ITIL can suck, but shouldn't

The pattern of my career over the past five years or so has involved moving newish, smallish internet/software companies to a post-startup hosting infrastructure. My past three companies were all small companies that developed and hosted internet applications, either for clients (in one case) or their own products. My role in each case was to move things to a more mature infrastructure, with configuration management, monitoring, directory services, and the other pieces needed to be able to manage a growing sprawl of servers and applications.

My focus has been much more on the technical than on the people processes for running and supporting the infrastructures. In my current job, the team I've brought in and the infrastructure we've built have reached a decent level, although there's certainly plenty more to do technically. But looking at what we've done and what I want to do next, I've realized that rather than moving on and doing the same thing at another company, the more interesting challenge will be to take things to the next level.

The next level for me is going beyond the technical to focus on the people and processes. The technology infrastructure is going to grow in size and sophistication, including spreading to multiple data centres globally, but the technical challenges seem like more of the same to me. The challenges that seem newer and more intriguing to me personally are more along the lines of how the hell we're going to organize and coordinate people doing development, infrastructure, and support in three, four, or more countries.

So I went on a course in ITIL version 3. Yikes. ITIL is basically a blueprint for organizing a huge IT operation with lots of bureaucratic processes, forms, and signoffs that will make it nearly impossible to get anything done, and ensure that responsibilities are divided so that nobody who is doing anything productive sees the big picture.

I don't think it has to be this way. I actually did find the course useful, although not as useful as it could have been given that most of the people on it were more interested in ticking off the certification than getting ideas on how to improve the organisations they work at. There were some pretty interesting people there, some of whom were obviously interested in fixing real problems "back home". If the course had been more of a workshop where we shared war stories and ideas, it would have rocked.

A lot of the concepts in ITIL are useful; I think it's more a matter of using your head when applying them, making sure to adapt the ideas in ways that fit your needs and objectives. It's very easy to see how organisations, especially large ones, take the ITIL material and use it to build horribly inefficient IT structures. I've worked with companies that use ITIL this way, and the course shed light on how they got this way.

The biggest problem with ITIL is that it's presented with clearly defined "phases" of strategizing, designing, deploying ("transitioning"), operating, and improving IT services. This is an invitation to a waterfall model, where (as in at least one organization I know of) each phase can even be run by a completely different team of people.

So one group designs the service, hands it off to another that rolls it out (tests and installs it), and then hands off to a completely separate team that supports it. In the organisation I've encountered, the operations team hasn't got the vaguest clue about the service.

Of course the transition process involves "knowledge transfer" where the people who set up the service train the support team, but anybody who's done this stuff in the real world should know better.

Knowledge that is transferred in a handover process is never, ever, ever going to be learned as well as knowledge that comes from actually being involved throughout the whole process. Having some hands-off manager (ahem) overseeing things all the way through doesn't cut it. The people who will actually be diving into runtime problems with an application need to have gotten their hands dirty trying to install the application, and even to have pitched into the meetings where the details of how the application should be integrated into the infrastructure were worked out.

Otherwise, you're going to end up in the situation of my nameless organisation, one which is actually often held up as an exemplar of ITIL. They host an application on their servers, installed by the transition team, and their support team had training on how to log into the server and investigate problems with it. But when users call up with problems, the support people, who probably support dozens of applications, have forgotten all of this. They call up the software vendor - who have no access to the servers.

Can you imagine how incompetent your organisation looks when it's clear that your support people have no idea that the application they support is run by their own company?

But I do think it's possible to take many of the ideas of ITIL and apply them in a more agile manner. A bit of Googling shows I'm not the only one who thinks so, but there doesn't seem to have been much work done on the idea, at least publicly. It's certainly something that would take a bit of thought and work.

My first thought, clearly, is that an agile IT services process would have to embrace the lean management principle of empowerment by having the "workers" (for lack of a better word) involved throughout the process.

I've also thought that the kanban approach to agile is particularly suited to a sysadmin team, since it does away with the iterations/release cycle in favor of a queue of tasks that people pull from when they find they've got spare capacity.
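
To make the pull idea concrete, here is a minimal Python sketch of such a queue. Everything in it is an illustrative assumption rather than a description of any real tool: the `KanbanBoard` class, the per-person work-in-progress limit, and the task names are all made up for the example.

```python
from collections import deque


class KanbanBoard:
    """Toy pull-based task queue with a per-person work-in-progress limit.

    Hypothetical example: the class name, the wip_limit parameter, and the
    task names below are invented for illustration.
    """

    def __init__(self, wip_limit=2):
        self.backlog = deque()   # tasks waiting to be pulled, oldest first
        self.in_progress = {}    # person -> list of tasks they currently hold
        self.done = []           # finished tasks, in completion order
        self.wip_limit = wip_limit

    def add(self, task):
        """Put a new task at the end of the backlog."""
        self.backlog.append(task)

    def pull(self, person):
        """Give `person` the oldest task, but only if they have capacity."""
        current = self.in_progress.setdefault(person, [])
        if len(current) >= self.wip_limit or not self.backlog:
            return None
        task = self.backlog.popleft()
        current.append(task)
        return task

    def finish(self, person, task):
        """Mark a task done, which frees up capacity for that person."""
        self.in_progress[person].remove(task)
        self.done.append(task)


board = KanbanBoard(wip_limit=1)
for name in ["patch web servers", "add LDAP replica", "tune Nagios alerts"]:
    board.add(name)

first = board.pull("alice")     # alice has capacity, so she gets a task
blocked = board.pull("alice")   # None: alice is already at her limit
second = board.pull("bob")      # bob pulls the next task in the queue
board.finish("alice", first)    # finishing frees alice's capacity
third = board.pull("alice")     # she pulls again, with no iteration boundary
```

There is no sprint or release cycle anywhere in this: work only moves when somebody with spare capacity pulls it, and the limit keeps any one person from hoarding tasks.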

Anyway, I'm looking forward to thinking this stuff through and trying out ideas over the next year. Although I'm going to be far less hands-on technically, my focus does need to involve a thorough understanding of the technical aspects of what we're doing, so I don't think I'm going to become a total suit.

SAN Virtualization

Virtualization in the server farm relies heavily on storage, something I understand only to a certain level. This review by InfoStor discusses virtualization of the SAN itself to get the flexibility you really need to get the best value out of server virtualization. It's somewhat over my head at the moment - I haven't gotten to this point yet - so I'm putting this link here for future reference.

Found via virtualization.info.

Recommended blogs for virtualisation

I've added a few virtualization-oriented blogs to my feed reading. My favorite is VMUNIX Blues, by Mark Mayo. His posts aren't as frequent as those of the other two below (although much more frequent than mine!), but the quality is high and he's doing hands-on stuff. I first found him by Googling for thoughts on shared storage approaches, which turned up this post about using NFS with VMware Server.

Virtualization.info is by Alessandro Perilli, a consultant who specialises in this stuff.

VMblog seems to be driven by industry press-releases, but is nevertheless a worthwhile resource.

The hidden pitfalls of server virtualization

Where there's buzz, there's bullshit. Slashing hardware costs and the time I spend herding servers makes virtualization sound like a silver bullet, but I have no doubt that pulling it off is not as easy as the salespeople tell me. I've spent some time researching and thinking it through, and have come up with a few things that I need to keep in mind when planning to get into virtualization properly.

Will I really spend less on hardware?

What's cool about virtualization?

How would you like to cut your annual server farm budget to a fraction of its current cost, reaping the glory and gold due to a corporate IT champion? Lop your data center to a third of its current size, no, a fifth, maybe less! At the same time, you can improve resiliency, flexibility, and potency!

The Virtual Buzz

Virtualization is all the buzz these days, especially for server farms. As my own collection of server hardware heads towards 20 boxes and is still hard pressed to handle all the tasks I need it to do, I'm finding the lure of the herd hard to ignore.

Comments off

Grr. Turn your back on your blog for a minute, and it gets filled up with comment spam. Ok, maybe it's been longer than a minute.

Anyway, I've upgraded to a newer version of Drupal, and turned off comments for the time being. I had to switch off my old theme because it has issues. The current theme will probably stick around for a while. I do plan on tackling the comments though.

Whipping up a solid LDAP infrastructure

I've been much too quiet lately. I'm still hard at work putting together what I hope will be a very strong infrastructure for my company's application hosting operations, with about 15 servers for production, content management, and staging and testing.

One of the core components of this infrastructure is an OpenLDAP server, which I've been working on over the past week. Up until now it's been enough to have a couple of accounts which are created locally on all of the servers by Puppet. I've got a chunk of disk space on a SAN which is shared across the machines, which is handy for having a common home area for key accounts I use to log in and administer the machines, as well as the Puppet templates and manifests.

The cool kids talk about operations

Tim O'Reilly, the boss of O'Reilly publishing and a key booster of the Web 2.0 meme, recently posted an article about operations.

One of the big ideas I have about Web 2.0 [is] that once we move to software as a service, everything we thought we knew about competitive advantage has to be rethought. Operations becomes the elephant in the room.

O'Reilly laments that most of the tools for deploying systems and applications on open source platforms (i.e. Linux) are not themselves open source. Luke Kanies and others have commented on the article with examples of open source deployment and operations management tools, including Puppet, and others I've mentioned for system configuration and network monitoring.

Network Monitoring Tools

This section has links and information about network monitoring tools. I've used Nagios, which is open source and pretty mature, a lot in the past 4 or 5 years. It is mainly for detecting and reporting problems, however, so it's useful to add something like Munin for tracking and graphing system resources and performance.

Another tool I'd like to try out for this is OpenNMS, which is written in Java, and includes the graphing as well as detection and reporting, and also auto-detects devices and services on a network.
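
For a feel of what "detecting and reporting" looks like at the level of a single check, here is a rough Python sketch in the style of a Nagios plugin. The exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) follow the usual Nagios plugin convention; the function names, the choice of load average as the thing to check, and the thresholds are assumptions made up for illustration.

```python
import os

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_load(load, warn, crit):
    """Map a load average onto a (status code, one-line message) pair.

    The thresholds are hypothetical; pick values that suit the machine.
    """
    if load >= crit:
        return CRITICAL, f"LOAD CRITICAL - load average {load:.2f}"
    if load >= warn:
        return WARNING, f"LOAD WARNING - load average {load:.2f}"
    return OK, f"LOAD OK - load average {load:.2f}"


def run(warn=4.0, crit=8.0):
    """Read the 1-minute load average, print a status line, return the code."""
    try:
        load1 = os.getloadavg()[0]
    except (OSError, AttributeError):
        # Load average is not obtainable on this platform.
        print("LOAD UNKNOWN - could not read load average")
        return UNKNOWN
    code, message = check_load(load1, warn, crit)
    print(message)
    return code


# Used as an actual plugin, the script would end with: sys.exit(run())
```

The monitoring server only sees the exit code and the one-line message, which is why a check like this says nothing about trends over time. That gap is what a grapher such as Munin fills.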
