SUNset: The End of an Era

The last of our SUN SPARC machines heads for the recycle center

August 15, 2012 — The last of our SUN SPARC machines headed for the recycle center today. Woodrow, the Ultra 30, had been “cold iron” since we moved to Chaos Central three years ago, but Xavier, the Sun Blade 100, had been running until recently, when we extracted the disk drives, one of which lives on as an external drive on Zara, our main Linux desktop workstation. The two machines, along with Xavier’s CRT monitor (also still operable, but at a resolution only Solaris supported), were accompanied by five defunct UPS units, deemed either too obsolete to be worth new batteries or done in by failed electronics.

This weekend, the Unix Curmudgeon is scheduled to migrate the last Solaris SPARC production system to a new Linux virtual machine at our major client site, which will effectively end 22 years of computing on the venerable RISC platform. While the SPARC line of CPUs lives on under Oracle’s mantle, and the Solaris operating system (Intel version) still lurks as a guest on one or more virtual machine hosts, SUN Microsystems is now only a footnote in the history of computing. Our association with SUN, one of the early Silicon Valley workstation startups, which emerged from the Stanford University Network (SUN) project in 1982, began in the spring of 1990, with a SPARCstation 2 running SunOS 4 in a project room at Seattle University. The machine was part of the development of Proteus, an early compute-cluster project, for which our Master of Software Engineering project team was building the operating environment.

After our portion of the Proteus project was completed (a spectacular demonstration of parallel programming in a simulation of the hypercube-connected 1000-CPU cluster), my next encounter with SunOS and SPARC, and my introduction to Solaris, came when I took a contract assignment as system administrator for the U.S. Army Corps of Engineers, managing a SPARCstation 20 running SunOS 4 and Sun Netra and Ultra 2 systems running Solaris 2.5.1. When I moved on to Concurrent Technologies, which had similar hardware, including a Solaris 2.6 system, I picked up a refurbished SPARCstation 20 for my home office and loaded Solaris 7 on it. At the end of my assignment with CTC, I installed a pair of Ultra 10 workstations as servers in a co-location site in San Diego. At the University of Montana, I installed a Sun Enterprise 250, then moved on to the NIH’s Rocky Mountain Laboratories, which had a pair of Sun Enterprise 450 servers. Those were supplanted by a Sun Enterprise 880V, which I later replaced with a Sun SPARC Enterprise T5220, the machine now being taken out of production, displaced by Linux clusters and VMs.

Meanwhile, in my home office, I had added the Sun Blade 100 and replaced the SPARCstation 20 with the Ultra 30. At “the Lab,” I had also used a Blade 100 as a workstation, replaced it with an Ultra 20 (the new generation, not to be confused with the old Ultra line), and tended a group of servers and workstations, an Enterprise 220R and several Ultra 2000 and 3000 workstations, that had been “donated” from projects at the main NIH campus in Bethesda. Most of the SPARC machines were simply “retired”: they were replaced with larger systems, passed beyond their supported life spans and gave way to newer models, or outlived their projects.

The S20 and U30 in my own office were the only ones that succumbed to “natural causes,” having expired from running too long in un-air-conditioned spaces (the S20) or sitting powered down too long (the U30). It was hard to let any of these faithful workhorses go, but eventually one must make room for new technology and move on. It is doubly sad that the iconic SUN logo will no longer grace any of the systems in the stable, either at Chaos Central or at the client sites. After 22 years of seeing the familiar blue-and-beige boxes at home and at work, we will miss them.

Drivers Wanted — Call 1-888-GOT-UNIX*

*not a real phone number

The English language has been characterized as one that does not so much borrow from other languages as outright steal, remake, and repurpose words to suit its needs. So, the “Drivers Wanted” signs on the backs of 18-wheeler trailers have a different meaning for Unix system administrators. But, just as the long-haul trucker operates the tractor that hauls your load to where it is needed, a Unix device driver is the piece of software that gets your data to and from an external device, like a hard disk, DVD drive, or printer.

Recently, Chaos Central experienced several close encounters of the frustrating kind with device drivers. Unix device drivers have always been a challenge, but the growth of open-source software has made them even more so. Device drivers have traditionally been provided by the manufacturers of computer equipment, either as source code or operational specifications for the device, shared under iron-clad non-disclosure agreements with commercial Unix vendors (like Oracle and Apple), or in binary form in the case of Windows computers. The problem arises with Linux: to be distributed under GNU open-source licensing, all components must be open source and freely redistributable, and many device drivers are not. At best, drivers come as object files and header files that can be linked against the kernel version at the current patch level.
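
For the curious, here is roughly what that last case looks like in practice: the vendor ships objects and headers, and the administrator links them into a loadable module against the running kernel’s build tree. This is a generic sketch, not any particular vendor’s procedure; the “acmenic” package and module names are made up, and it assumes the matching kernel-devel package is installed and that the package provides the usual kbuild Makefile.

    # Link a vendor-supplied object into a module for the running kernel
    # ("acmenic" is a hypothetical driver package; kernel-devel must match `uname -r`)
    uname -r                                            # the kernel the module must match
    rpm -q kernel-devel                                 # build tree/headers for that kernel
    cd acmenic-1.2.3
    make -C /lib/modules/$(uname -r)/build M=$(pwd) modules
    insmod ./acmenic.ko                                 # or install under /lib/modules and run depmod -a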

Device manufacturers develop and deliver driver software in binary form for Microsoft Windows, because they sell most of their products for use with computers running Windows. However, since the hardware is proprietary and the software required to interface with it reveals much of the inner workings and design, vendors are reluctant to release the source code to Linux developers. And, because the Linux market is relatively small, they are also reluctant to produce binary versions compatible with Linux. Where binary drivers for Linux do exist, they are specific to a particular kernel revision. Since the Linux kernel is frequently patched and has at least a minor revision release every six months, keeping a complete configuration inventory is problematic, even for systems that are supported, which brings us to the first pothole in the information highway… And then there is the problem of keeping drivers current with firmware changes in the hardware. The devices themselves are often programmable circuitry with flash memory, and changes to the firmware usually require changes in the host driver as well.
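
Keeping that inventory straight starts with knowing exactly what the running system has. A few commands cover most of it; this is just a sketch, and the “e1000e” module is only an example, not a reference to any particular hardware here at Chaos Central.

    # Quick configuration inventory: kernel, modules, and device firmware
    uname -r                          # running kernel release a binary driver must match
    lsmod                             # modules currently loaded
    modinfo -F version e1000e         # version of a given driver ("e1000e" is just an example)
    modinfo -F vermagic e1000e        # kernel the module was built against
    dmesg | grep -i firmware          # firmware versions devices reported at load time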

Fiber Optic/Multipath Disk Drivers

It is no secret by now that the majority of Internet servers and almost all high-performance computing systems run Linux. High-end server hardware is most often attached to high-end storage systems as well, most of which achieve high throughput by using fiber optic connections. So, most Fibre Channel host bus adapters do have device driver support for Linux, either available directly from the vendor or, in the case of supported Linux distributions, as part of a paid service package from the Linux provider. But these choices also present what amounts to a fork in the configuration management of the drivers: the Red Hat version, for instance, does not always match the QLogic version. The reasons are complex, but usually the hardware vendor wants to keep the drivers and firmware as up to date as possible, to correct problems and support newer platforms and storage devices, while the operating system vendor wants to keep older systems running smoothly as well as supporting newer ones. Both need to balance the cost of maintenance against the number of installed systems. Inevitably, combinations of hardware and software exist that are not supportable, or that require some additional work by the system administrator to maintain a working system.
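
When the in-box driver and the vendor’s driver diverge, the first step is simply seeing which one is actually in play. Something along these lines works, with the QLogic qla2xxx driver used here only as an example (Emulex’s lpfc behaves much the same way):

    # Which Fibre Channel driver is on disk, which is loaded, and what does the HBA report?
    modinfo -F version qla2xxx                      # version of the module installed for this kernel
    cat /sys/module/qla2xxx/version 2>/dev/null     # version of the module actually loaded
    cat /sys/class/fc_host/host*/symbolic_name      # adapter model, firmware, and driver as the HBA reports them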

Thus it came to pass that, during a routine patch-level upgrade to bring a pair of systems to the same Red Hat 5.8 level, a warning popped up on the console announcing that components of the fiber-optic driver were missing. Fortunately, Linux systems are built in such a way that the system software libraries maintain compatibility with all patch levels of a particular kernel release: Red Hat does not change the kernel release within a major distribution release, so it is possible to update the system software without updating the kernel patch level, and the drivers do not need to be rebuilt. It was a simple matter of editing the grub.conf file to reboot using the old kernel version (which is why Linux retains the older kernels in the GRUB menu). Even though most device drivers are now loadable at run time, if they are needed during the boot process they must be integrated into the boot image. In the case of these machines, the boot disk appears to be local, but the driver is needed to mount the data drives for the server application. Being able to build a boot image with the driver is usually a pretty good indication that the driver will load when needed.

Also, many high-end systems use multiple paths between the host bus adapter and the storage array, both for fail-over and to increase throughput. Multipathing needs to know which devices support it, so even if the system will boot without the device driver, multipathing may not work. In most Linux systems, the device mapping differs between multipath and non-multipath configurations, so the correct devices will not be available if the drivers are loaded in the wrong order.
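
Falling back to the previous kernel is just a matter of pointing GRUB’s default at the older stanza. A sketch of how that looks on RHEL 5; the kernel version numbers here are illustrative, not the client’s actual ones.

    # Boot the previous kernel (which still has the driver in its initrd) on the next reboot.
    # "default" in grub.conf counts title stanzas from 0.
    grep -n ^title /boot/grub/grub.conf                   # see which stanza is which
    sed -i 's/^default=0/default=1/' /boot/grub/grub.conf
    # or, equivalently, point GRUB at the known-good kernel by name:
    grubby --set-default=/boot/vmlinuz-2.6.18-274.el5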

But it is desirable to have the kernel patches installed as well, so the problem of building a boot image with the driver included had to be solved. There was a utility package for rebuilding the current image, but that presupposes the system would actually reboot without the driver installed. A bit more research turned up a manual procedure for building a boot image from any kernel, which appeared to work. To complicate matters, this upgrade was for a remote client, so standing at the console to select a fall-back kernel in case the reboot failed was not an option; in cases like this, someone has to be standing by at the physical machine to intervene during the reboot. So, final resolution awaits the next maintenance window for the system in question.
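
On RHEL 5, the usual way to build such an image by hand is mkinitrd with the needed modules forced in, followed by a check that they actually landed in the image. Presumably something like the sketch below; the kernel version and module names are illustrative, not the client’s actual configuration.

    # Rebuild the boot image for the new kernel with the HBA and multipath modules included
    KVER=2.6.18-308.el5
    mkinitrd -f --with=qla2xxx --with=dm-multipath /boot/initrd-${KVER}.img ${KVER}
    # Verify the modules made it in (the RHEL 5 initrd is a gzipped cpio archive)
    zcat /boot/initrd-${KVER}.img | cpio -it | grep -E 'qla2xxx|multipath'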

Printing

Meanwhile, in our own office at Chaos Central, we have been struggling with printing problems for months. Our aging Xerox color laser printer has been printing slowly or simply hanging in mid-print for some time. Canceling the job, then turning the printer off for 30 seconds to clear the queue and back on again, worked, sometimes. And, when the printer went into “Power Save” mode, it wouldn’t wake up when we sent it a print job. We had recently replaced the expensive print cartridges, so the budget wouldn’t support a new printer just yet. But the symptoms pointed to a driver problem more than anything else, since the trouble seemed to start with a recent system upgrade.
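
Most of the early troubleshooting was on the CUPS side of the connection; the commands below are roughly what that looks like, with a made-up queue name standing in for our printer.

    # Checking the CUPS side when jobs hang (the queue name "phaser" is hypothetical)
    lpstat -p -o                        # printer state and any stuck jobs
    tail -n 50 /var/log/cups/error_log  # set LogLevel to debug in cupsd.conf for more detail
    cancel -a phaser                    # clear the wedged queue before power-cycling the printer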

Printing from Unix has always been an issue. From early on, PostScript has been the de facto common printer language, so we were faced with buying PostScript-capable printers in the 1990s. With the advent of CUPS, the Common Unix Printing System, it is possible to use almost any printer for which a CUPS PPD file is available, and to use the job-control features of various PostScript printers. Again, vendors defer to Windows for their proprietary print drivers, and send the Linux user elsewhere for Linux “drivers.” However, not all PPD files are created equal, and they change from time to time.
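
For reference, wiring a PPD to a queue in CUPS is a one-liner with lpadmin. The queue name, printer address, and PPD path below are illustrative, not our actual configuration.

    # Define a CUPS queue using a specific PPD (all names and addresses here are made up)
    lpadmin -p phaser -E -v socket://192.168.1.50:9100 -P /usr/share/cups/model/Xerox_Phaser.ppd
    lpoptions -d phaser                 # make it the default destination
    lp -d phaser testpage.ps            # send a PostScript job through CUPS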

Before throwing out the old printer, we decided to try bypassing CUPS entirely. Most network printers support multiple printing protocols, one of which is FTP, the venerable File Transfer Protocol, used since the days of dial-up modems but deprecated in the Internet age for its lack of encryption. Normally, the Network Police (i.e., the network security administrators) insist that this feature be turned off. But, with it turned on, it is a simple matter of uploading a PostScript file to the printer using anonymous FTP. Which we did; and, voilà, it printed.
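
The test itself is nothing more than an anonymous FTP upload of a PostScript file straight to the printer; something along these lines, with a made-up printer address (many printers will also accept raw jobs on TCP port 9100):

    # Bypass CUPS: push a PostScript file at the printer over anonymous FTP
    printf 'user anonymous guest\nbinary\nput testpage.ps\nbye\n' | ftp -n 192.168.1.50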

So, a search for a new printer driver ensued. We did find one at the Xerox site, with a PPD file several times larger than the generic one that came with the latest CUPS package, which we had been using. And the printer now wakes up from a sound sleep and prints! Yes, it is still very slow, but it does print. The slow printing seems to be associated with the different pieces of software that render the print selection to PostScript, but we can at least produce output more or less on demand.

Network

It has been some time now since we’ve had network problems, but getting connected “on the road” was a major problem until recently, as Linux drivers were not available for many of the wireless cards bundled with laptops. For a while, the main workaround was to run a utility that mapped the calls made by Windows drivers onto Linux system calls, with variable results. Fortunately, many vendors now provide support for Linux, and it is even possible to buy laptops made to run Linux. But, as recently as 2011, using our old (ex-Windows) systems, we had a problem with the layer between the driver and the networking stack that reset the connection every couple of minutes in an attempt to stay associated with the strongest signal. In hotels or congested areas where multiple access points were broadcasting on the same channels, this effectively precluded any kind of meaningful network experience and certainly made persistent connections (like file transfers or remote administration) impossible. The issue seems to have been corrected in later versions of the wireless software, but there is always the next surprise waiting in the world of Linux device drivers.
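
The usual tool for that Windows-driver workaround was ndiswrapper, which wraps the Windows NDIS driver so the Linux kernel can load it. A sketch of how it typically went; the .inf file name here is just an example, not necessarily the card we had.

    # Load a Windows wireless driver under Linux with ndiswrapper (example .inf name)
    ndiswrapper -i bcmwl5.inf        # install the Windows driver's .inf/.sys pair
    ndiswrapper -l                   # should report the driver installed and hardware present
    modprobe ndiswrapper             # load the shim kernel module
    iwconfig                         # the wireless interface should now appear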