Playing with IPMIs and iLOMs, some impressions of.

Over the last few months I’ve had the dubious pleasure of playing with Integrated Lights-Out Management systems (ILOs, iLOMs and iDRACs).

I’ll admit that up until 3-4 years ago I used to think of ILOMs and IPMIs as little more than huge gaping security holes (from a similar POV as Bonkoski et al. (https://jhalderm.com/pub/papers/ipmi-woot13.pdf), and similarly the US-CERT advisory TA13-207A (https://www.us-cert.gov/ncas/alerts/TA13-207A)). In point of fact, I’ve actually used Metasploit’s IPMI-related modules and a few other tools to identify, map and track the use of the above in a couple of companies I worked in.

And I can tell you that unless you fence them in through layers and layers of DMZs, they are bloody dangerous, because they give an intruder complete and total control of all the hardware (and I do mean all) in your bare-metal infrastructure. Whoever gets in can shut your server down, start it back up, restart it, reconfigure interfaces, get a TTY console, and pretty much own the system at the hardware level.
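To make that list of powers a bit more concrete, here is a minimal sketch of how little it takes to drive the power state of a box over IPMI. It assumes the open-source ipmitool client and IPMI v2.0 over LAN; the function name, the BMC address and the account are hypothetical, and the script only builds the command line, it does not run anything.

```python
from typing import List

# A subset of the actions ipmitool accepts after "chassis power".
POWER_ACTIONS = ("status", "on", "off", "cycle", "reset", "soft")


def ipmitool_power_cmd(host: str, user: str, action: str) -> List[str]:
    """Build (but do not run) an ipmitool chassis power command line."""
    if action not in POWER_ACTIONS:
        raise ValueError(f"unsupported power action: {action!r}")
    # -I lanplus: IPMI v2.0 over LAN.
    # -E: read the password from the IPMI_PASSWORD environment variable
    #     rather than putting it on the command line.
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E",
            "chassis", "power", action]


if __name__ == "__main__":
    # Hypothetical BMC address and account, for illustration only.
    print(" ".join(ipmitool_power_cmd("10.0.0.42", "admin", "status")))
```

Swap "status" for "off" and that one line is a pulled plug, which is exactly why the management network needs fencing in.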

NetSec (justified) paranoia aside, though, they are a very important resource in a SysAdmin’s belt, especially if the data-centre is on the other side of the country and you have to either travel like crazy throughout the night to get to it, or submit an SR for an engineer to go do some maintenance. And since the engineer won’t have the authority to mess around with the OS, you’ll need to keep an eye on the system and help them shut down the system and bring it back up safely, otherwise your hard drive change will turn into a P1 incident of the most painful sort.

There are as many different IPMI implementations out there as there are hardware manufacturers, and nowadays they offer a number of connectivity options, from serial console to telnet, to ssh, to HTTPS web interfaces (which tend to be a PITA to sort SSL certificates for, believe me). But unless you work in an SME (which I don’t) or are damn lucky, you will usually have to ssh/telnet (yes, they still use telnet!) to one, or even get the engineer to plug in a serial cable and get access via putty/minicom.

Now, the commands for doing things in an iLO/iLOM/ALOM/etc depend on what the hardware is, and they are usually buried in the hardware manufacturers’ documentation.

What I’ve found to be a huge, huge help are the cheatsheets that exist out there, for example the GeekDiary ones, found at http://thegeekdiary.com/most-commonly-used-ilom-commands-cheat-sheet/. They are invaluable for those nights where you’re on-call and you have to do something dangerous without having to sift through hardware manufacturers’ documentation sites and KBs.

Now, the thing to remember here is that when doing anything more complicated than getting a console to the OS, you need to be aware of the state of the OS itself.

For example, if you have to do a hard-shutdown or cold-reboot of a system, it’s so, so damn easy to think you can just do a stop -force /SYS and then a start /SYS and still expect the OS to start nicely.

Reality is, however, somewhat different. A system crash or kernel panic can have one or more underlying causes, especially in systems with an uptime measured in years rather than weeks. And in such a case the problem will either be made worse by a hard-shutdown/cold-reboot or will result in a corrupt OS well beyond the abilities of an fsck to sort out.

A well-maintained OS on the other hand will, more often than not, come back just fine, or with just minor issues that an fsck will be able to fix, but again that’s debatable.
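One way to act on that caution is to ask for a graceful stop first and keep the forced one as an explicit, opt-in last resort. The sketch below is only that, a sketch: it assumes the Oracle-style ILOM syntax used above (stop /SYS for a graceful stop, stop -force /SYS for the hard one), it ignores the confirmation prompt the ILOM shell may show, and the send and is_powered_on callables are hypothetical hooks you would wire to your own ssh or serial session.

```python
import time
from typing import Callable


def shutdown_gracefully(
    send: Callable[[str], None],
    is_powered_on: Callable[[], bool],
    timeout_s: float = 300.0,
    poll_s: float = 10.0,
    allow_force: bool = False,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Request a graceful stop, wait, and force only if explicitly allowed.

    Returns "graceful", "forced" or "still-on".
    """
    send("stop /SYS")  # graceful request: the OS gets a chance to shut down
    waited = 0.0
    while waited < timeout_s:
        if not is_powered_on():
            return "graceful"
        sleep(poll_s)
        waited += poll_s
    # One final check after the timeout has elapsed.
    if not is_powered_on():
        return "graceful"
    if allow_force:
        send("stop -force /SYS")  # last resort: the OS gets no warning
        return "forced"
    return "still-on"
```

Leaving allow_force off by default is deliberate: if the box is still up after the timeout, that is usually the moment to get on the console and look at what the OS is doing, not to reach for the bigger hammer.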

So, if you start playing with IPMI-related systems, remember the “don’t try this at home, kids!” warning older TV show presenters used to give, and be careful what you do.

That’s all for now, folks, so g’night, and stay safe!
