The Uncaptured Bug

As I wrap up my work with TSM, there’s one bug that got away from me. If someone gave me three months with access to the source code for the TSM server, it’s the bug I’d try to track down.

TSM allows for different policies about how it fills tapes with client data. It can dedicate a tape to a specific file system from a specific client (“collocation by file system”). It can dedicate a tape to a specific client (“collocation by node”). It can dedicate a tape to a set of nodes defined by the TSM administrator (“collocation by group”). Or it can, by policy, put any client’s data on any tape (no collocation).

The more specific a tape’s policy is, the faster data for that group/node/file system can be restored. The counterpoint is that the more restrictive the policy is for a group of tapes (a “storage pool”), the more tapes you’ll need, and the emptier the average tape will be. If tapes are small, that’s no big deal, although your tape library, if physical, might have to be very large. If your tapes hold one terabyte each, you might not want one to be 98% empty. You might not even want it to be only 50% full. But remember: if a client’s data is spread over many tapes, a full restore of that client might be a lot slower than it has to be. This is why TSM has several levels of granularity, and also why a TSM system can have several storage pools, each with its own collocation policy.
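To make the trade-off concrete, here is a rough sketch that counts how many one-terabyte tapes a small, made-up set of backups would need under each policy. The node names, file systems, sizes, and groups are all invented for illustration, and the model assumes data packs perfectly within each collocation unit, which real tape usage won't do.

```python
from collections import defaultdict
from math import ceil

TAPE_CAPACITY_GB = 1000  # one-terabyte tapes, as in the example above

# Hypothetical backup data: (node, file system) -> size in GB.
DATA_GB = {
    ("alpha", "/home"): 20,
    ("alpha", "/var"): 30,
    ("beta", "/home"): 400,
    ("beta", "/data"): 700,
    ("gamma", "/"): 50,
}

# Hypothetical administrator-defined collocation groups.
GROUPS = {"alpha": "small", "gamma": "small", "beta": "big"}


def collocation_key(policy, node, filespace):
    """Return the unit that must not share a tape with other units."""
    if policy == "filespace":
        return (node, filespace)
    if policy == "node":
        return node
    if policy == "group":
        return GROUPS[node]
    if policy == "none":
        return None  # everything may share tapes
    raise ValueError(f"unknown policy: {policy}")


def tapes_needed(policy):
    """Count tapes, assuming perfect packing inside each collocation unit."""
    totals = defaultdict(int)
    for (node, filespace), size_gb in DATA_GB.items():
        totals[collocation_key(policy, node, filespace)] += size_gb
    return sum(ceil(total / TAPE_CAPACITY_GB) for total in totals.values())


def average_fill(policy):
    """Fraction of the allocated tape capacity that actually holds data."""
    return sum(DATA_GB.values()) / (tapes_needed(policy) * TAPE_CAPACITY_GB)


for policy in ("filespace", "node", "group", "none"):
    print(f"{policy:9s} tapes={tapes_needed(policy)} fill={average_fill(policy):.0%}")
```

With these invented numbers the tape count drops at every step from the strictest policy to the loosest, and the average fill rises to match. A restore of the hypothetical node "alpha" would touch the fewest unrelated bytes under the strictest policy, which is the other side of the bargain.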

TSM allows administrators to change a storage pool’s collocation policy. If an administrator decides the current collocation policy is too profligate with tapes, the administrator can adopt a more conservative policy for one or more storage pools. This is where I think my great white whale of a bug lives.

If a storage pool’s policy is not to collocate data, any tape with room on it can, in theory, receive any client’s data. None of the commands or queries against TSM’s internal database tables reveal the collocation policy under which a tape was first used, so there isn’t any documented reason why TSM shouldn’t use a tape as the new policy allows. Yet at least twice in my career, I’ve seen partially filled tapes remain only partially used after a collocation policy was changed to the least restrictive setting (no collocation at all), while new tapes were being added to the storage pool in question. It’s as if TSM “remembers” that a tape has to be used one specific way, even if no new tapes are being used that way.
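The difference between what I expected and what I saw can be put as a toy model. This is not TSM’s selection algorithm, which I have never seen. The `first_used_under` field is purely hypothetical (nothing documented exposes such a value), the volume names are made up, and preferring the fullest tape is an arbitrary choice for the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tape:
    name: str
    used_gb: int
    capacity_gb: int = 1000
    # Hypothetical field: nothing documented in TSM exposes such a value.
    first_used_under: str = "none"

    @property
    def free_gb(self) -> int:
        return self.capacity_gb - self.used_gb


def pick_expected(tapes: List[Tape], size_gb: int) -> Optional[Tape]:
    """Expected rule with collocation off: any partly filled tape with room.

    Returning None stands for "take a fresh tape instead".
    """
    fits = [t for t in tapes if t.used_gb > 0 and t.free_gb >= size_gb]
    # Fullest-first is an arbitrary preference chosen for this sketch.
    return max(fits, key=lambda t: t.used_gb, default=None)


def pick_suspected(tapes: List[Tape], size_gb: int) -> Optional[Tape]:
    """Suspected behavior: tapes first used under a stricter policy are skipped."""
    eligible = [t for t in tapes if t.first_used_under == "none"]
    return pick_expected(eligible, size_gb)


# Two partly filled tapes left over from a collocate-by-node era.
pool = [
    Tape("A00001", used_gb=300, first_used_under="node"),
    Tape("A00002", used_gb=100, first_used_under="node"),
]

expected = pick_expected(pool, size_gb=50)
suspected = pick_suspected(pool, size_gb=50)
print("expected:", expected.name if expected else "fresh tape")
print("suspected:", suspected.name if suspected else "fresh tape")
```

Under the expected rule the leftover tapes keep filling. Under the suspected rule they sit partly used forever while fresh tapes are consumed, which matches what I observed.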

I opened a case with IBM about this probably three years ago, but I couldn’t convince them that it was happening. Now that I’m leaving the world of TSM, I probably never will.

