After the Storm: Powering up the Virtual Data Center

There is something to be said for colocation and cloud services, where organizations keep their data, and maybe even physical servers, in remote facilities that share backup generators, redundant air handlers, multiple network paths, and distributed systems with failover redundancy.  But ultimately, unless you are a true road warrior who connects wirelessly from coffee shops and airline waiting areas to exclusively cloud-based or colocated resources, you will need to manage your own network in the face of long-term power outages.

Here at Chaos Central, we have the usual UPS units lurking under the desks, which we largely ignore until the power goes out and they start beeping.  The home office/small office versions of these units, which more or less promise Uninterruptible Power to your computers, are best for the momentary glitches that plague any power grid, whether from uncertain weather or simple human error at the control center: the lights blink, the power supplies beep, and computing goes on.  During winter ice storms and summer heat waves, when everything goes dark for minutes or hours, one of two things happens:

Ideally, you have a flashlight nearby so you can see the keyboards of the servers and workstations well enough to save your current work and run through shutdown cycles (for those machines that don’t have software connected to the UPS to do this automatically) before the batteries run down.  If you have done your system design correctly, you have purchased enough capacity to run the systems for five to fifteen minutes on battery, long enough to shut them down gracefully.
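For a rough check against that five-to-fifteen-minute target, runtime is about the battery's usable energy divided by the load. The shell function below is a minimal sketch, not a vendor sizing tool: it assumes you read the battery's watt-hours off the label (volts times amp-hours) and guesses an 85 percent inverter efficiency. Real runtime shrinks at heavy loads and as batteries age, so treat the answer as an optimistic upper bound.

```shell
#!/usr/bin/env bash
# Rough UPS runtime estimate (a sketch, not a vendor sizing tool).
# Assumptions: battery_wh = battery volts x amp-hours from the label;
# efficiency is a guessed inverter efficiency (default 0.85).
# Real runtime is shorter at high load and with aging batteries.
ups_runtime_minutes() {
    local battery_wh="$1"
    local load_w="$2"
    local efficiency="${3:-0.85}"
    # minutes = usable watt-hours / load watts * 60, rounded to whole minutes
    awk -v wh="$battery_wh" -v w="$load_w" -v eff="$efficiency" \
        'BEGIN { printf "%.0f\n", wh * eff / w * 60 }'
}
```

For a hypothetical unit with a single 12 V, 9 Ah battery (108 Wh), ups_runtime_minutes 108 300 prints 18 for a 300 W load, and doubling the load to 600 W cuts that to 9.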

On the other hand, if you haven’t provisioned properly, or if you haven’t paid attention to the age of that black or beige box under your desk, the systems will shut down ungracefully, sometimes mid-keystroke.  Batteries need to be replaced every three to five years, but the units themselves keep improving, and their electronics have been known to fail, so replacing the entire unit is sometimes easier and not that much more expensive.  A good rule of thumb is to get a new UPS when you buy a new computer.  Laptops have a built-in UPS, the battery, so all you need is a surge protector.

Either way, orderly or disorderly, once an extended outage has shut everything down, the UPS units need to be turned off and everything goes quiet.  When the power is restored, it all needs to be turned back on manually.  (Tip: if you work from a home office, travel often during “the season,” and need remote access to your network, it is a good idea to have a “network sitter,” someone you trust who can go to your house and turn the critical systems back on after an outage.  We’ve been left “out in the cold” several times over the years, and, yes, we have had a network sitter from time to time, and it has paid off.)

At Chaos Central, some systems come on with the line power, but others need to be started manually.  Our main server runs Citrix XenServer and hosts a variety of virtual machines (VMs), some used for network services and some for experiments and development projects, so we start those manually from the XenServer console.  However, the NFS shares and system image shares need to be connected from the XenCenter GUI, which runs only on Windows.  We keep a Microsoft Windows XP image (converted to a VM from an old PC that now runs Linux) in the virtual machine stack for that purpose, but to use it, we have to attach to it from a Linux system that runs XVP.  Finally, after all the network shares, the DNS server, and the DHCP server are up, we can boot the rest of the VMs and physical machines.
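Once storage is attached, the rest of that startup order (infrastructure VMs such as DNS and DHCP first, then everything else) lends itself to a small script. This is a minimal sketch, assuming it runs where XenServer's xe command-line tool is available, such as the host console; it does not reattach the NFS or image shares, and the VM name-labels in the usage line are hypothetical. Setting XE=echo gives a dry run that only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: start XenServer VMs one at a time, in dependency order.
# Assumptions: the xe CLI is available, storage is already attached,
# and the VM name-labels passed in are hypothetical examples.
XE="${XE:-xe}"        # set XE=echo for a dry run that only prints commands
PAUSE="${PAUSE:-30}"  # seconds to let each VM's services come up

# Start one VM by name-label and report the outcome.
start_vm() {
    local name="$1"
    # $XE is left unquoted on purpose so XE can hold a command with arguments
    if $XE vm-start vm="$name"; then
        echo "started: $name"
    else
        echo "FAILED: $name" >&2
        return 1
    fi
}

# Start each VM in the order given, pausing after each one, because
# xe vm-start returns once the VM is running, not once its services are ready.
start_in_order() {
    local vm
    for vm in "$@"; do
        start_vm "$vm" || return 1
        sleep "$PAUSE"
    done
}
```

For example, start_in_order dns-vm dhcp-vm followed by PAUSE=0 start_in_order web-vm dev-vm (all hypothetical names). Stopping at the first failure is a deliberate choice: there is little point booting clients if DNS or DHCP did not come up.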

The next step in the process is to prime the backup system.  Here at Chaos Central, we use rsnapshot for backups, with an SSH agent that lets the backup server log in to the clients.  The agent needs to be started and primed with our key passphrases.  The agent’s environment settings are kept in a file, which the cron jobs that run the backups read.

ssh-agent > my_agent_env; source my_agent_env; ssh-add

starts the agent and saves its socket path and process ID in a file, then loads those settings into the current shell and adds our credentials to the agent.  Beforehand, each backup client must have had our public key installed in its .ssh/authorized_keys file.  And, of course, since we use inexpensive USB drives instead of expensive tape drives, we need to mount the drives manually: they never seem to come up fast enough to be mounted from /etc/fstab.  There is a setting in /etc/rsnapshot.conf, no_create_root, that keeps rsnapshot from creating a missing snapshot directory, and so from writing to the root drive, in case you forget this step…
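A cron job that depends on a hand-primed agent and a hand-mounted drive can check both before it runs. Here is a minimal sketch of such a wrapper, assuming the agent file lives at $HOME/my_agent_env and the USB drive mounts at /media/backup (both hypothetical paths, overridable via the environment). It relies on ssh-add -l, which exits non-zero when the agent is unreachable or holds no keys, and on mountpoint from util-linux; it complements rather than replaces the rsnapshot.conf setting.

```shell
#!/usr/bin/env bash
# Hypothetical cron wrapper around rsnapshot (a sketch, not the author's script).
# Assumed paths for the agent env file and backup mount point; override via environment.
AGENT_ENV="${AGENT_ENV:-$HOME/my_agent_env}"
BACKUP_MNT="${BACKUP_MNT:-/media/backup}"

# Load SSH_AUTH_SOCK and SSH_AGENT_PID from the file written by
# `ssh-agent > my_agent_env`. The file also runs `echo Agent pid NNN;`,
# so its output is discarded.
load_agent_env() {
    if [ ! -r "$1" ]; then
        echo "cannot read agent env file: $1" >&2
        return 1
    fi
    . "$1" >/dev/null
    [ -n "${SSH_AUTH_SOCK:-}" ]
}

# Run one rsnapshot interval (e.g. daily) only if the agent holds a key
# and the USB drive is actually mounted.
run_backup() {
    local interval="$1"
    load_agent_env "$AGENT_ENV" || return 1
    # ssh-add -l exits non-zero if the agent is unreachable or has no keys
    if ! ssh-add -l >/dev/null 2>&1; then
        echo "ssh-agent not primed; start it and run ssh-add first" >&2
        return 1
    fi
    if ! mountpoint -q "$BACKUP_MNT"; then
        echo "$BACKUP_MNT is not mounted; skipping backup" >&2
        return 1
    fi
    rsnapshot "$interval"
}

# Act only when given an interval, so the functions can be sourced on their own.
if [ "$#" -gt 0 ]; then
    run_backup "$1"
fi
```

A crontab entry might then read 0 3 * * * /usr/local/bin/backup-wrapper daily, where the path is hypothetical and the interval name must match one defined in rsnapshot.conf.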

We also permit remote logins to our bastion server so we can access our files while traveling.  After the system has been offline for a while, we can’t count on our provider giving us the same IP address, so we have a cron job that periodically queries the router for the current IP address and posts any change to a file on our external web server shell account.  This also requires setting up an SSH agent.
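The address-tracking job follows a simple pattern: fetch the current address, compare it with the last one recorded, and copy a new file to the web host only when it changes. How to query a particular router varies, so get_wan_ip below is a hypothetical placeholder that must be filled in; the state file and remote destination are also made-up examples. In this sketch the state file is updated only after the copy succeeds, so a failed upload is retried on the next run.

```shell
#!/usr/bin/env bash
# Sketch of a cron job that publishes our WAN address when it changes.
# Hypothetical: get_wan_ip (must be written for your router), STATE_FILE,
# and REMOTE (a made-up shell account on the external web server).
STATE_FILE="${STATE_FILE:-$HOME/.wan_ip}"
REMOTE="${REMOTE:-user@web.example.com:public_html/wan_ip.txt}"
SCP="${SCP:-scp}"   # overridable, e.g. for testing

# Placeholder: replace with whatever your router offers (status page, SNMP, ...).
get_wan_ip() {
    echo "get_wan_ip: not implemented for this router" >&2
    return 1
}

# Copy the address to the web host only if it differs from the last one sent.
publish_if_changed() {
    local current previous="" tmp
    current=$(get_wan_ip) || return 1
    [ -n "$current" ] || return 1
    if [ -r "$STATE_FILE" ]; then
        previous=$(cat "$STATE_FILE")
    fi
    [ "$current" = "$previous" ] && return 0
    tmp=$(mktemp) || return 1
    echo "$current" > "$tmp"
    # scp uses the primed ssh-agent (SSH_AUTH_SOCK must be set in cron's environment)
    if "$SCP" -q "$tmp" "$REMOTE"; then
        # Record the address only after a successful upload, so failures retry
        mv "$tmp" "$STATE_FILE"
        echo "published $current"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Act only when invoked as `wan-ip-check run`, so the functions can be sourced.
if [ "${1:-}" = "run" ]; then
    publish_if_changed
fi
```

Like the backups, the cron line has to load the agent settings first, for example */15 * * * * . $HOME/my_agent_env >/dev/null && /usr/local/bin/wan-ip-check run (hypothetical script path).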

Now we are all up and running, usually just in time for the next power outage: here in the Pacific Northwest, winter ice storms usually bring several days of rolling blackouts.  Yesterday the power was up and down; today has been stable except for a few blinks that didn’t take down the network, so the backups are running and all the services are up.  Our son and grandsons have arrived with a pile of dead computers, cell phones, and rechargeable lighting, having endured two days of continuous blackout in their larger city.  The network is abuzz as everyone jacks in to catch up on work, news, mail, etc.

It’s a lot of work to run a multi-platform network in a home office, but it is worth it when you consider that we didn’t have to scrape snow and ice off the car, shovel the driveway, or brave icy streets and snarled traffic to get to “the office.”

The benefits of telecommuting outweigh the chore of keeping up the network.

Why Snow Paralyzes Puget Sound

Note:  This was written on battery, in the dark, during the obligatory post-snowfall power outage at Chaos Central.

The folks who live on the Great Plains and along the Great Lakes snicker a bit at news reports of mass closures in Seattle when snow comes to the Puget Sound region. Of course, the standard excuse is, “It doesn’t snow that much here, so we don’t have enough snow removal equipment.” But, wait, there’s more…

Most of us live in the Pacific Northwest because of the geography and the climate. The climate is temperate, with few days below freezing in the winter and few days in the 90s in the summer. Most of us live within 20 miles of either salt water or a fresh-water lake. If you want snow, you can find it in the nearby Olympic or Cascade mountain ranges 12 months out of the year. Below the snow line, the lush green mountains invite backpackers, the flat river valleys attract bicyclists, lakes and estuaries fill with kayakers and canoeists, rivers with rafters, and the larger lakes and the Salish Sea (the accepted name for the inland sea that stretches from Olympia, Washington, to Desolation Sound, BC, bounded by the Olympic Peninsula and Vancouver Island) fill with sailboats and motor craft of all sizes.

For all of this, we are willing to put up with the moisture that is a natural consequence of air warmed by the Pacific Ocean currents flowing up the coast from the south. Rainwear is a fashion statement, and umbrellas are considered a mostly ineffective and cumbersome nuisance in a land where the average day ranges from swirling mist that clings to everything to fire-hose-like sideways blasts, with downpours so heavy that motorists have to brake for migrating salmon. And, for a few days every odd year or so, cold and wet coincide in a heavy snowfall like those that come in waves all winter long in places like Minneapolis and Chicago. But this is not the Midwest.

Double-digit grades and ice do not mix: the street in front of Chaos Central

First, because of the geography: the region is characterized by the foothills of the mountain ranges that frame the inland sea. The foothills give way to steep bluffs of glacial till that fall off sharply into the river valleys and the sea, and up those bluffs climb city grids on double-digit grades better suited to Olympic ski jumps than to walking, driving, or pedaling. Second, because of the climate: its temperate nature dictates that, when the temperature does dip to the low end of the range, it hovers back and forth across the freezing line. The mass of the inland sea also helps moderate the temperature: when the wind is calm, the snow level stays at least a couple hundred feet above sea level. But the wind is never that calm, which means snow does reach the lower elevations, and the rain that follows, itself barely melted snow, falls on frozen snow. Therein lies the root of the problem.

Our camellia bends over under the weight of clear ice on top of snow

Steep slopes plus ice plus a preponderance of evergreen foliage make for interesting times. The foliage includes fir, pine, and cedar, which are better adapted to snowy conditions but less so to strong winds, since the wet climate promotes shallow root structures. Other temperate evergreens include broad-leaf plants like the native rhododendron, holly, and other ornamental shrubs found in cities. Deciduous trees in cities tend to become infested with English ivy and other climbing evergreen vines, which make them vulnerable to wind and snow load as well.

So, in the aftermath of one of these snow episodes, the snowplowing efforts, inadequate to begin with, leave many streets unplowed and packed down by adventuresome traffic. Trees are heavy with snow. And then it rains. At first, since snow is an insulator, the rain simply combines with the snow to form a heavy crust of ice on top of a fluffy layer of loosely packed flakes. This has two effects. First, the leaves and needles of the trees hold the snowballs in place until they become ice balls, at which point the tree either breaks apart or falls over, usually on a power line. Buildings designed for “normal” snow loads may also collapse as the ice builds up. Second, on steep slopes where there are no trees (because this happens often enough to discourage their growth), the weight of the ice overcomes the integrity of the fluff beneath, and the whole mass heads downhill, picking up speed and mass as it goes, sweeping everything and everyone before it, and often ending up as a solid mass of lumpy ice and boulders across a major highway.

While major power transmission lines have usually been built well clear of historic avalanche zones, the metropolitan and rural power grids are not so fortunate, running as they do through so many snow- and ice-laden trees: power outages soon follow, lasting anywhere from several hours to several days, depending on the distance from the substation to the consumer. Getting to the site of an outage to repair the damage, or getting to somewhere that still has power, is also problematic. Where the roads have been plowed and the walkways shoveled, the rain freezes directly on the pavement, creating a virtually frictionless surface unsuitable for either walking or driving. New outages crop up where cars skid off icy roads into power poles or trees.

Meanwhile, the temperature continues to rise, and the snow level moves higher, leaving behind a slushy mixture that is soon more water than ice. The water moves downhill: the ice holds it back. Water backs up into basements and garages and fills dips in streets and roadways. Then it all ends up suddenly in streams and rivers, which quickly overflow onto low-lying areas that may have been spared the heavy snow and ice loads but now find themselves underwater.

If the snow load was large enough and the rains continue, the clay bluffs keep absorbing water until they, too, become too heavy and slippery, and the hillsides collapse into the bays, taking trees, houses, and roads with them. Fortunately, this combination of wet and cold winters happens only once every decade or two in this area, so people rebuild and forget their troubles by the next sunny day when the “mountains are out” and they head for the beaches, marinas, backroads, or trails.

Teaching New Dogs Old Tricks: Ubuntu for Unix hacks

Here at Chaos Central, we’ve been wildly excited about our new crop of Ubuntu 11.10 Linux computers from Zareason, whose arrival was discussed here a couple of weeks ago.  But, thrilled as we are with the speed and capacity of these boxes, there is a learning curve: for the box, not the user.  Unix, and by extension Linux, has been evolving for more than 40 years now; it has accumulated a huge library of Useful Things and a history of competing distributions, each with its own “flavor” and set of favorite tools.

In the beginning, Unix was a toolbox for scientists and engineers to build things quickly and cheaply: initially document processors and utility software, and later the Internet itself.  The growing popularity of Linux, and particularly of the Ubuntu distribution, has driven the need to make it useful for the things “ordinary” people (i.e., those who don’t make a living tweaking the innards of Big Iron computing) like to do, such as surfing the net and managing their music, video, and image collections.  And, since Linux is still what to do with your old PC when it gets too bloated with malware and spyware, the popular distributions still need to fit on a CD.

Most industrial-strength distros now come on DVDs, sometimes more than one, containing the entire Linux collection of free software.  But for the masses, armed with CD-only PCs, something has to give, and what gives is often the venerable legacy Unix features that most home users will never need.  Many of those, however, are what we’ve been lugging around on our hard drives for 20 years or more: venerable editors like emacs and vi, superseded by simple menu-driven editors and integrated graphical development environments but still more powerful, with more features than anyone will ever use.  The features I have learned are extremely useful and practiced enough to be second nature, so I keep using these editors, even if they don’t come with the system anymore.  The list also includes more recent enterprise-level tools for building massive compute clusters, like GridEngine and MPICH2, along with software development libraries and specialized utility libraries for science and engineering.  We also need many more development tools than come with a standard desktop, since we develop software for the web and for high-performance computing clusters.

Fortunately, the Ubuntu software repositories have many of those tools packaged up and loadable from the Software Center application, so we don’t need to go through the old ritual of downloading, unpacking, configuring, compiling, and installing nested sets of dependent programs.  But our old computers have accumulated a unique set of software over the four or five years of their busy lives, so the new ones have a lot to learn: we first had to load the no-longer-included Synaptic Package Manager to grab some of the software libraries and utilities not available in the Software Center catalog.  And, of course, we had to get rid of that silly Unity desktop, which works well only for folks who do one thing at a time with their computers.  We need lots of visible toolbars and lots of workspaces we can jump to with a single click, which Gnome gives us.
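One way to see what a new machine still has to learn is to compare installed-package lists between the old and new boxes. This is a minimal sketch, assuming both machines use dpkg (Debian/Ubuntu); list_installed and missing_packages are hypothetical helper names, not standard tools. Package names can change between releases, so the output is a list to review, not something to install blindly.

```shell
#!/usr/bin/env bash
# Sketch: find packages installed on an old machine but missing on a new one.
# Assumptions: both machines are Debian/Ubuntu (dpkg); helper names are hypothetical.

# Print names of packages whose dpkg status is "<want> ok installed".
list_installed() {
    dpkg-query -W -f='${Status} ${Package}\n' |
        awk '$3 == "installed" { print $4 }' | sort -u
}

# Print package names listed in file $1 (old machine) but not in file $2 (new).
missing_packages() {
    local have
    have=$(mktemp) || return 1
    sort -u "$2" > "$have"
    # comm -23 keeps only lines unique to the first (stdin) input
    sort -u "$1" | comm -23 - "$have"
    rm -f "$have"
}
```

Run list_installed > old.txt on the old machine and list_installed > new.txt on the new one, then missing_packages old.txt new.txt shows candidates to pass to sudo apt-get install after review.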

Not surprisingly, some of the more esoteric and least-used packaged software still has a few surprise unresolved dependency issues.  To my delight, GridEngine, a distributed job control system for compute clusters created by Sun Microsystems a dozen or more years ago, was available in the Software Center.  Since Oracle bought Sun a couple of years ago, many of these tools have disappeared from Oracle’s free download list, folded back into the supported product lines, and the old packages are sometimes hard to find.

GridEngine is one of those transitional systems that, unlike the newer applications designed to run under the Gnome or KDE desktop environments, was born and developed during the days of OpenWindows (contemporary with Microsoft Windows 2) and the Common Desktop Environment (CDE, which predates Windows 95), both high-end X Window System desktop managers in their day.  X11 programs used to be a lot harder to write, and they were designed for networked use more than for running the client and server on the same workstation, so they tended to follow the Unix philosophy of lots of little programs, each doing one thing well and working together, much more than today’s more integrated and abstracted desktop applications do.

GridEngine is more likely to be installed on an Ubuntu machine as a client or, at most, as an execution node in an ad hoc cluster than as a master host, so it would not usually run the graphical grid manager, qmon.  But, Open Source being what it is, the whole package is available.  However, the “just works” philosophy of Ubuntu breaks down here: the dependencies of the graphical component, with its archaic and arcane OpenWindows flavor, aren’t checked very thoroughly, and that causes a bit of a problem.  The application depends on the X11 font server, a client-server application designed to make it possible to run X11 clients on one machine and display them on a different X11 server that might not have all of the requisite fonts loaded.  Also, because CDE relied heavily on licensed Adobe fonts, the chain of dependency breaks when it comes to fitting old non-GPL’d software into a Linux distribution.

When you get a GNU/Linux system distribution, everything in it is free software, much of it licensed under the GNU General Public License.  You can install anything you want in addition to that, but you can’t package the non-free extras and redistribute them.  This also extends to the packaging system.  The Ubuntu Software Center has provisions for adding non-free (i.e., not freely licensed) software repositories, but they aren’t going to interact with each other, so complex packages like GridEngine, which depend on non-free components, come with “some assembly required.”  In my case, someone had already solved the problem for Ubuntu, so a Google search turned up a list of the missing packages and how to install them.  I’ve used GridEngine for years, but on Solaris and Red Hat Linux systems.  Solaris, of course, was licensed from Sun (now Oracle) and had full support.  The old Sun GridEngine for Linux packages came with the non-free fonts and dependent packages integrated, because you got them from Sun; they weren’t on any of the five or six CDs (now two DVDs) that make up the full Red Hat Enterprise Linux (or CentOS, which many of us who don’t need hand-holding support from Red Hat use instead).

So, at Chaos Central, the new dogs are gradually getting housebroken and have largely quit chewing on the furniture.  That is, to switch analogies, they have learned enough that they no longer respond like HAL 9000 in “2001: A Space Odyssey,” with “I’m afraid I can’t do that…” when asked to perform tasks the other computers have been doing for years.  They’ve even learned, with larger memory and faster processors, to do new things.