It’s probably been fifteen years since I realized there’s a distinct difference between things I can do well and things that are good for me to do. It’s probably been almost thirty years since I realized I wasn’t like some of my IT buddies. Both of those realizations are coming home to roost this week.
IT has always been full of new things. Some are new ideas, some are new implementations of existing ideas. Some work out. Some are before their time. And some die a well-deserved death. Merit isn’t always a factor in what succeeds and what fails; just ask any Betamax owner.
Some IT workers are constantly chasing the next big thing, so they can be an expert in it when (if) it succeeds. They want the credit for being an early adopter. Sometimes, they merely want to get a proof of concept working for one test case and then move on to the next big thing after that, leaving someone else to work with the seventeen other test cases that should work but don’t.
I’m different. I’ll wait for the market to sort itself out and then let someone else tell me what to work on. I can build a data center from scratch, and sometimes it’s nice to do that so you don’t have to work around others’ design choices. (“3310s? What idiot chose 3310s over 3370s?”) The real fun comes for me, though, when a project moves out of design and implementation and into use. I love that last 1% of the optimization curve. It’s easy to get something to 90% or 95% of potential; you plug it in, you wire it up, and, boom! You’re at 90% of potential almost immediately. But then you start to find stuff. “I thought it was supposed to be able to…” “Gee, that’s not the way I expected to have to line things up for them to work.” “The salesman said that process would take 40 minutes, not 48 minutes. Why is that?” That’s where I excel. Why might that not work? How else could that sales promotion line be read if it doesn’t mean what we thought it meant? Was the salesman exaggerating, or do we have a configuration mistake? Why the heck do they have this option, and would we ever need it?
Sometimes, I’ve had to work with something that would eventually be abandoned for good cause. (Go ahead, ask me about the EMC 3D 4000 add-on to the Clariion Disk Libraries. Be ready to bribe me with lots of ice cream to get the full story.) Most of the time, though, I’ve been able to coax a little more out of something than people expected. When I joined the TSM team at NIST in 2006 or so, back around TSM 5.3 or 5.4, I asked if they had tried collocation by group. They hadn’t, and they weren’t using collocation at all, because even LTO-2 cartridges were too large to dedicate to one client at a time. I set out to work out which clients could be grouped so that each group would fill two or three cartridges at a time. As a result, any one client’s data was on relatively few cartridges, but most cartridges were closer to full than empty, whether they were filling for the first time or their data was aging off after the tape had been filled at one point.
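For anyone who wants the grouping idea in concrete form, here is a minimal Python sketch of one way to do it. It is not necessarily the method used at NIST, and it doesn’t talk to TSM at all. It assumes you already have an estimate of how much data each client has stored, treats an LTO-2 cartridge as 200 GB native (uncompressed), and picks a target of three cartridges per group; the client names and sizes are hypothetical. The heuristic is first-fit decreasing, a standard bin-packing approach swapped in here for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumptions for illustration: LTO-2 native capacity of 200 GB, and a
# target of roughly three cartridges' worth of data per collocation group.
CARTRIDGE_GB = 200
TARGET_CARTRIDGES = 3


@dataclass
class Group:
    """A hypothetical collocation group: member clients and their total data."""
    clients: list[str] = field(default_factory=list)
    total_gb: float = 0


def group_clients(sizes: dict[str, float],
                  capacity_gb: float = CARTRIDGE_GB * TARGET_CARTRIDGES) -> list[Group]:
    """First-fit decreasing: take clients largest first and put each one into
    the first existing group that still has room. If none has room, open a new
    group. A client larger than capacity_gb simply ends up in a group of its own."""
    groups: list[Group] = []
    for client, gb in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        for g in groups:
            if g.total_gb + gb <= capacity_gb:
                g.clients.append(client)
                g.total_gb += gb
                break
        else:  # no existing group could take this client
            groups.append(Group([client], gb))
    return groups


if __name__ == "__main__":
    # Hypothetical per-client stored data, in GB.
    sizes = {"file01": 500, "db01": 420, "web01": 350,
             "mail01": 150, "app01": 90, "app02": 60}
    for g in group_clients(sizes):
        print(g.clients, g.total_gb)
```

With those made-up numbers, the sketch produces three groups of 590, 570, and 410 GB, each at most three cartridges’ worth. Sorting largest first keeps the big clients from being stranded at the end with nowhere to go. The sketch only decides membership; defining the groups and setting the storage pool to collocate by group would still happen in TSM itself.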
That’s what I do: figure out how something can work better. Not working at all? Can it be fixed, and if so, how? Working, but not well? Why not? Poor configuration, or just a poor product? And when something settles down and seems to be working well, I watch for anomalies and chase loose ends.
I remember once noticing that a TSM server was calling for a two-year-old tape to be mounted. Why? And what was on that tape? DB2 backups from two years ago. Why did we have two-year-old DB2 backups, and how many other tapes had only database backups that no one in their right mind would ever load back into a production database server? Within three months, we had purged more than 75% of the backup data written by various database servers and put the fear of Nick into our DBAs so that they started managing their database backups. That in turn let us retire a physical tape library a year earlier than expected, because the amount of data we had to copy to a new virtual tape library was suddenly much, much smaller than it had been before. Nothing was overtly broken, but I noticed an anomalous behavior and followed it down the rabbit hole.
Right now, I’m supposed to be arranging proof-of-concept demonstrations of backup systems from three different vendors (or partnerships, in one case) in advance of choosing equipment for our next data center. I hate this. For one thing, at this point, it’s all paperwork. Design some tests. Work with the vendors to agree on some dates. Decide upon test criteria. Do we have performance requirements? OK, fine, now, do we have realistic performance goals? (Is our current system being blamed for not having met performance expectations that were never realistic?) Can I get this team to commit to testing whether their application is correctly backed up by this new method? Do I know how much time we need? Does their estimate expect me to multiply proposed durations by three, so I really should be multiplying by six? (Nothing ever goes as quickly as planned, and if you fudge your numbers so my adjustments bring me back to your “real” numbers, well, your real numbers still aren’t real, hence multiplying by six, not by three.) Do I really need to get three times as much rack space so I can have three sets of competing products racked at once, or are we going to do this in serial fashion and risk one product being held up because another needed “just a little more time”? For that matter, will I be available as projected, or will I be called away due to day-to-day surprises and thus be the reason we don’t complete each test?
I’m good at things when I understand their purpose and how the pieces fit together. I’m not nearly as good at looking at someone else’s concept and knowing how to proceed, unless that person and I have a lot of shared history. Someone else who has the same problem but doesn’t realize it might rush off and happily implement something that the original person wouldn’t even recognize as related to their concept. On the one hand, I don’t do that. On the other hand, I worry myself sick about ambiguities I recognize or requirements I don’t understand. (“Why does this have to accommodate left-handed Lithuanians? Why Lithuanians? Why left-handed? Or am I misreading someone’s cocktail napkin scribblings?”)
If I were still working for the big orange home improvement store, I’d turn to one of my colleagues and let him go blue-sky the PoCs while I took his turn at making sure the trains run on time and the freight all gets delivered. He loves the design phase but slogs through operational issues. I’m not at BOHIS, though. I’m not part of a team of three who only do one thing, backups. Do I trust anyone else to do this well? If not, why bother with a PoC?
So, I’m doing things because I’m the best person to do them, not because they’re the best things for me to do. Three months from now, or six months from now, the dust will settle and I’ll be back squeezing the last 10% out of whatever system we chose. But between now and then, too often I’ll look up from some spreadsheet or some test plan at a display of how the current backup system is running and yearn to be back chasing some anomaly or squeezing a little more performance out of it instead.
Things I’m good at. Things that are good for me. The latter is probably a subset of the former (or at least mostly), but too often I’m working in the part of the former that isn’t the latter. I guess that’s part of being a team player, but that doesn’t make me like it.