Category Archives: Linux

Living with Linux — Keeping the home fires burning (for baking Pi’s)

It’s been a while since our dour post on the folly of Congress and the personal impact of political maneuvering.  The end of October brought a quick flurry to catch up on projects, and also the end of a contract we had been on continuously since the summer of 2001, through three changes in contract management, the last four years as an independent contractor.  We did expect a follow-on contract from a new prime contractor: this is the way many government service contracts go: the management changes, but the workers get picked up by the new company, so there is continuity of service, and a bit of job security, despite the lack of “brand loyalty.”  During the 1990s, I seemed to change jobs about every 18 months on average, moving to a new client and a different employer, so keeping the same client is a bonus.

Unfortunately, the chaos resulting from the shutdown meant that the contract transition did not go smoothly.  We departed on a scheduled vacation trip to Kauai with nothing settled, though it was the first time since 1989 that we had gone on vacation without the prospect of work interrupting. Midway through the week, the contract negotiations got underway in earnest, complete with a five-hour timezone shift. When we returned, there was a flurry of activity and then it was back to the grind, continuing the ongoing projects that had been interrupted by both the shutdown and the contract turnover.

While we were gone, the usual Pacific Northwest November storms came early, knocking out power to our network, so there was much to do to get things running again.  Several machines needed a rather lengthy disk maintenance check, and the backup system was full, as usual.  So, we took advantage of 1) the possibility of future earnings from the contract renewal and 2) delays in getting the paperwork actually signed and work started, to do some system maintenance and planning, starting with acquiring a bigger backup disk.

Secondly, our office Linux workstation had suffered a bad update while we were trying to install some experimental software from a questionable repository, to the extent that the video driver crashed and could not be restored.  Upgrading the system to a newer distribution didn’t help, as upgrades depend on a working configuration.  Now, we’ve been using Ubuntu as our primary desktop environment since 2007, through several major upgrades, one of our longest tenures with any distro (though we did use Solaris for a decade or more, along with various Linux versions).  Ubuntu has one of the best repositories of useful software, with easy updates and add-ons for a lot of things we use from day to day.  But, in the last couple of years, the shift has been to a simpler interface, targeting an audience of mostly web-surfers who use computers for entertainment and communication, but little else.  Consequently, the development and productivity support has suffered.  The new interfaces on personal workstations, like Ubuntu’s Unity and Microsoft’s latest fiasco, Windows 8, have turned desktop computers and laptops into giant smart phones, without the phone (unless you have Skype installed).  One of the other items to suffer was the ability to build system install disks from a DVD download.

So, after failing to restore the system with a newer version of Ubuntu, on which it is getting more and more difficult to configure the older, busy Gnome desktop model we’ve been accustomed to for the last decade or more, we decided to reinstall from scratch.  Ubuntu had also somehow lost the ability to reliably create install disks: we tried several times to create a bootable CD, DVD, or memory stick, to no avail.  So, since we primarily use Red Hat or its free cousin, CentOS, as the basis for the workhorse science servers at work and to drive our own virtual machine host, I installed CentOS version 6 on the workstation.  All is well, except that CentOS (the Community ENTerprise Operating System) is really intended for servers and engineering workstations, so it has been a slow process of installing the productivity software for photo and movie editing and building up the other, less common programming tools, including a raft of custom modules.  Since Red Hat and its spin-offs evolve more slowly than Ubuntu’s six-month update cycle, there has been some version regression, and some things we’re used to using daily aren’t well-supported anymore as we get closer to the next major release from Red Hat.  Since restoring all my data from backup took most of a day and night, and adding software on the fly as needed has been tedious, I’m a bit reluctant to go back. Besides, I need to integrate a physical desktop system with the cluster of virtual machines I’m building on our big server for an upcoming project, so there we are.

The main development/travel machine, a quad-core laptop with a powerful GPU and lots of RAM, is still running Ubuntu 12.04 (with Gnome grafted on as the desktop manager), but has had its own issues with overheating. So, this morning I opened it up for a general checkup.  Everything seemed to be working in the fan department, but I did get a lot of dust out of the radiator on the liquid cooling system, and the machine has been running a lot cooler today.

Because of the power outage, and promises of more to come as the winter progresses, we’ve been looking at a more robust solution to our network services and incoming gateway: up until now, we’ve been using old desktop machines retired from workstation status and revamping them as firewalls and network information servers, which does extend their useful life, but at the expense of being power hungry and a bit unstable.  But, the proliferation of tiny hobby computers has made the prospect of low-power appliances very doable.  So, we are now in the process of configuring a clutch of Raspberry Pi computers, about the size of a deck of playing cards, into the core network services.  These can run for days on the type of battery packs that keep the big servers up for 10 minutes to give them time to shut down, and, if they do lose power, they are “instant on” when the power comes back.  And, they run either Linux or FreeBSD, so the transition is relatively painless.  The new backup disk is running fine, and the old one will soon be re-purposed for archiving data or holding system images for building virtual machines, or extending the backups even further.

So it goes: the system remains a work in progress, but there does finally seem to be progress.  We even have caught up enough to do some actual billable work, following three really lean months of travel and contract lapses.

Sleuthing the Wiley Thermals

Yesterday, we were hit with a thermal shutdown on the big laptop.  Installing psensor and the coretemp module helped get a handle on the issue, which centered on the Nvidia GeForce 540M GPU.  Hardware drivers have always been an issue for Linux, since the Open Source software model conflicts with the need for peripheral vendors to keep the internals of their hardware secret, which they do by not releasing the source code for the software that links the hardware with the operating system.  That’s fine for a closed, proprietary system like Microsoft Windows, the primary market.  As Linux users, not in the business of redistributing systems, we would be happy with an add-on driver that works.  But, since Linux is a small portion of the market, there is little incentive for hardware vendors to write Linux-specific driver software.  And, the software that is available is not always optimized for the Linux kernel, with the result that it is either buggy or skimps on reliability features.
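For the curious, the readings psensor graphs come from the same sysfs files the sensors(1) tool reads; a minimal sketch using the kernel’s generic thermal zones (coretemp adds per-core sensors as well, and zone names and counts vary by machine, so some systems may report nothing at all):

```shell
# Print whatever thermal zones the kernel exposes; psensor and
# sensors(1) read the same data.  Raw values are in millidegrees C.
for z in /sys/class/thermal/thermal_zone*/; do
    [ -e "${z}temp" ] || continue
    printf '%s: %s C\n' "$(cat "${z}type")" "$(( $(cat "${z}temp") / 1000 ))"
done
```

The loop degrades gracefully: on a machine with no thermal zones it simply prints nothing, which is itself a useful diagnostic.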

Nvidia has, in the past year, incurred the ire of Linux founder Linus Torvalds for just these issues.  The Ubuntu Linux distribution that we run on our systems comes with a more or less generic Nvidia driver.  While users can download an updated driver from Nvidia, installation is a bit daunting, requiring the system to be reconfigured for text-mode login in order to rebuild the X11 graphics links.  Those of us who have been around long enough to remember hand-tuning the X-Window system configuration files, fingers poised over the “kill” keyboard sequence, ready to shut down X11 in an instant to avoid burning the monitor if the settings were wrong, have grave misgivings about tinkering with the graphics.  Plus, the forums on the ‘Net seemed to show that users were having overheating problems no matter what combinations of driver and distribution versions were used.  Custom configurations seem to be less than desirable for production machines, so we elected to look for another solution.

A bit more searching on the ‘Net came across the fwts (FirmWare Test Suite) utility package.  Once installed, it ran its checks, pointing out compatibility issues between the computer’s BIOS and the kernel/driver configuration.  One of the automatic corrective actions performed by fwts was to switch the operating mode from “performance” to “normal,” which immediately lowered the operating temperature of all the components.  The GPU still shows a temperature increase under load, but the fan hardly runs anymore, whereas earlier in the week it was running on high most of the time.
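A related knob worth checking by hand after any such change is the cpufreq scaling governor; a hedged sketch (the sysfs paths are standard, though not every kernel exposes them, and this is adjacent to, not identical with, the BIOS setting fwts adjusted):

```shell
# Show the frequency-scaling governor on each CPU core; a machine left
# in "performance" mode runs hotter than one in "ondemand"/"powersave".
for g in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
    [ -e "$g" ] || continue
    printf '%s: %s\n' "$g" "$(cat "$g")"
done
```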

The take-away message here is, updating kernels and/or drivers can and will sometimes result in conflicts with your hardware.  Linux has come a long way toward a plug-and-play, run-out-of-the-box installation, but it still pays to test and evaluate hardware configurations, just like the old days of Unix.  Actually, in the “bad old days” of a few commercial Unix systems, the range of hardware combinations was often very limited, so compatibility issues had been carefully tuned out by the system vendor.  But, those systems were expensive.  In the Open Source world of Linux, where the system is expected to run on any combination of hardware on the commodity PC market, some outliers are to be expected.  For the average desktop Linux user, converting an old Windows machine to Linux will work just fine.  But, for “power users” and server applications, some engineering and testing may be required for optimum performance.  Certainly, the psensor and fwts software will be an important part of the Linux toolkit from now on.

Upgrade Woes: Changing the Way We’ve Always Done Things

A while back, a wave of learning curve frustration swept through Chaos Central.  One of the tenets of the computing life is that the march of time brings change at a mind-boggling rate.  Most Linux distributions have settled on a six-month update cycle, with almost daily patch-level updates in between.  The patch-level updates go mostly unnoticed by the average user, though quirks sometimes creep in.  But, the major distribution updates bring changes to the desktop decor ranging from the equivalent of new curtains and furniture slipcovers to knocking down walls and repainting.

Since we use our computers for productive work, we tend to keep the furniture and paint to basic office cubicle mode.  Since about 2007, we’ve tended toward running Ubuntu as our primary desktop system.  Unless there is some very good reason, we also tend to use the Long-Term Support versions.  However, our last new computers (December 2011) arrived with Ubuntu 11.10, which was OK even though it defaulted to Unity, since the previous LTS, 10.04, had some deficiencies in the WiFi area.  Being resistant to change, we quickly reverted to the Gnome desktop: the new but not necessarily improved Unity desktop seemed to us a dumbing-down of the desktop, and it got in the way of our usual cluttered, multi-tasking way of doing business (we don’t call our office Chaos Central for no reason).

Of course, in the spring of 2013, we suddenly were faced with the End-of-Life clock on Ubuntu 11.10.  The obvious choice, then, was to upgrade to 12.04 LTS instead of the newly-released 13.04 version.  The default on reboot is the Unity desktop, though Ubuntu now does provide Gnome (Ubuntu Classic) as a choice.  Having recently acquired an Android phone, our first “smart phone,” we decided to give Unity a try, at least for a while.

OK, it isn’t so bad, once you get used to the idea that you can have multiple screens, and that clicking on the launcher icon of a running process switches to it (unless there are multiple copies, but we also discovered how to display everything, a la OS X).  However, we recently discovered the dark side of Unity: stability, or rather the lack of it.

I had noticed that the fan seemed to be running on high on my main laptop, a quad-core machine with Nvidia GPU from Zareason, one of the few Linux-only system vendors.  Then, while watching a 30-minute video in full-screen, the machine suddenly powered down, due to overtemperature.  This has only happened before when I inadvertently blocked the air intakes.  Not so, this time.  It was blowing very hot air.

A bit of research into system temperature monitors led me to the psensor package, along with the coretemp kernel module and supporting software.  Yow!  The graph showed the primary culprit to be the Nvidia card, which spiked between 80 and 90°C whenever a new graphics window was opened.  More research showed the most likely cause to be Unity.  Reverting to Gnome helped reduce the overall temperature, but the spikes are still there.  One of the issues here is the conflict between Ubuntu and the Nvidia native drivers.  We did have the Nvidia drivers installed until the 12.04 upgrade, but, since they don’t support the 3-D extensions in Unity, we left the Ubuntu drivers in place.

All of this brings to the fore the fact that, despite the attempt to make computers (all of them, including Apple, Microsoft, and Linux) look and feel like a big smart phone, the power desktop is not user-friendly.  The sealed-panel model employed by Apple and Microsoft means you have to live with what came out of the box, but Linux, with its open-source model, means that you can (and, by inference, must) tinker under the hood.  The upside is that, with diligence and perseverance, you can fix it and decorate it to suit your own tastes and workstyle.

 

Drivers Wanted — Call 1-888-GOT-UNIX*

*not a real phone number

The English language has been characterized as one that does not so much borrow from other languages as outright steal, remake, and repurpose words to suit one or many purposes. So, the “Drivers Wanted” signs on the back of 18-wheeler trailers have a different meaning to Unix system administrators. But, just as the long-haul trucker operates the tractor that gets your load hauled to where it is needed, a Unix device driver is a piece of software that gets your data to and from an external device, like a hard disk, DVD drive, or printer.

Recently, Chaos Central experienced several close encounters of the frustrating kind with device drivers. Unix device drivers have always been a challenge, but the growth of open source software has made them even more so. Device drivers have traditionally been provided by the manufacturers of computer equipment, either as source code or operational specifications under iron-clad non-disclosure agreements with commercial Unix vendors (like Oracle and Apple), or in binary form in the case of Windows computers. The problem arises with Linux: in order to be distributed under the GNU open-source licensing, all components must be open source and freely redistributable, and many device drivers are not.  At best, drivers come as object files and header files that can be linked with the kernel version at the current patch level.

Device manufacturers develop and deliver driver software in binary form for Microsoft Windows, because they sell most of their products for use with computers running Windows. However, since the hardware is proprietary and the software required to interface with it reveals much of the inner workings and design, vendors are reluctant to release the source code to Linux developers. Also, because the Linux market is relatively small, they are also reluctant to produce binary versions compatible with Linux. In cases where there are binary drivers for Linux, the drivers are specific to a particular kernel revision. Since the Linux kernel is frequently patched and has at least a minor revision release every six months, keeping a complete configuration inventory is problematic, even for systems that are supported, which brings us to the first pothole in the information highway…  And then, there is the problem of keeping drivers current with firmware changes in the hardware.  The devices themselves are often programmable circuitry, with flash memory.  Changes to firmware usually also require changes in the host driver.

Fiber Optic/Multipath Disk Drivers

It is no secret by now that the majority of Internet servers and almost all high-performance computing systems run Linux. High-end server hardware is most often attached to high-end storage systems as well, most of which achieve high throughput by using fiber optic connections. So, most Fibre Channel Host Bus Adapters do have device driver support for Linux, either available directly from the vendor or, in the case of supported Linux distributions, as part of a paid service package from the Linux provider. But, these choices also present what is basically a fork in the configuration management of the drivers: for instance, the Red Hat version does not always match the QLogic version. The reason for this is complex, but usually the hardware vendor wants to keep the drivers and firmware as up-to-date as possible to correct problems and support newer platforms and storage devices. The operating system vendor wants to keep older systems running smoothly, as well as supporting newer systems. Both need to balance the cost of maintenance against the number of installed systems. Inevitably, combinations of hardware and software exist that are not supportable, or that require some additional work by the system administrator to maintain a working system.

Thus, it came to pass that, during a routine patch-level upgrade to bring a pair of systems to the same Red Hat 5.8 level, a warning popped out on the console, announcing that components of the fiber-optic driver were missing. Fortunately, Linux systems are built in such a way that the system software libraries maintain compatibility with all patch levels of a particular kernel release: Red Hat does not change the kernel release within a major distribution release, so it is possible to update the system software without updating the kernel patch level, and the drivers do not need to be rebuilt. It was a simple matter of editing the grub.conf file to reboot using the old kernel version (which is why Linux retains the older kernels in the Grub menu). Even though most device drivers are now loadable at run time, if they are needed during the boot process, they must be integrated into the boot image. In the case of these machines, the boot disk appears to be local, but the driver is needed to mount the data drives for the server application. Being able to build a boot image with the driver is usually a pretty good indication that the driver will load when needed.  Also, many high-end systems use multiple paths between the host bus adapter and the storage array, both for fail-over and to increase throughput.  Multipathing needs to know which devices support it, so even if the system will boot without the device driver, multipathing may not work.  In most Linux systems, the device mapping will differ between multipath and non-multipath configurations, so the correct devices will not be available if the drivers are loaded in the wrong order.
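The grub.conf edit itself is a one-line change; a sketch with illustrative RHEL 5 kernel versions (the `default` index counts title entries from zero, so `default=1` selects the second, older entry):

```
# /boot/grub/grub.conf (excerpt) -- kernel versions here are illustrative
default=1          # boot the second entry (the older, known-good kernel)
timeout=5
title Red Hat Enterprise Linux Server (2.6.18-308.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-308.el5 ro root=/dev/VolGroup00/LogVol00
        initrd /initrd-2.6.18-308.el5.img
title Red Hat Enterprise Linux Server (2.6.18-274.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-274.el5 ro root=/dev/VolGroup00/LogVol00
        initrd /initrd-2.6.18-274.el5.img
```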

But, it is desirable to have the kernel patches installed as well, so the problem of building a boot image with the driver included had to be solved. There was a utility package for rebuilding the current image, but that presupposes that the system would actually reboot without the driver installed. A bit more research turned up a manual procedure for building a boot image from any kernel, which appeared to work. To complicate matters, this upgrade was for a remote client, so standing at the console to select a fall-back kernel in case the reboot was unsuccessful was not an option. In such cases, it is necessary to have someone standing by at the physical machine to intervene if necessary during the reboot. So, final resolution is awaiting the next maintenance window for the system in question.
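On Red Hat 5, the manual procedure reduces to something like the following; the module and version names here are illustrative, not taken from the actual systems:

```
# Build a new boot image with the fibre-channel module forced in
# (qla2xxx is a stand-in -- substitute the actual HBA driver module)
mkinitrd --with=qla2xxx -f /boot/initrd-2.6.18-308.el5.fc.img 2.6.18-308.el5
# then add a grub.conf entry pointing at the new initrd, keeping the
# existing entries as fall-backs
```

Writing to a new file name, rather than overwriting the running initrd, preserves the fall-back path and leaves something bootable if the rebuilt image is bad.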

Printing

Meanwhile, in our own office at Chaos Central, we have been struggling with printing problems for months. Our aging Xerox color laser printer has been printing slowly or simply hanging in mid-print for some time. Canceling the job, then turning the printer off for 30 seconds to clear the queue and back on again worked, sometimes. And, when the printer went into “Power Save” mode, it wouldn’t wake up when we sent it a print job. We had recently replaced the expensive print cartridges, so the budget wouldn’t support a new printer right now. But, the symptoms pointed to a driver problem more than anything else, since they seemed to crop up with a recent system upgrade.

Printing from Unix has always been an issue. From early on, Postscript has been the de facto common printer language, so we were faced with buying Postscript-capable printers in the 1990s. With the advent of CUPS, the Common Unix Printing System, it is possible to use almost any printer for which a CUPS PPD file is available, and to use job control features in various Postscript printers. Again, vendors defer to Windows for their proprietary print drivers, and send the Linux user elsewhere for Linux “drivers.” However, not all PPD files are created equal, and they change from time to time.

Before throwing out the old printer, we decided to try bypassing CUPS entirely. Most network printers support multiple printing protocols, one of which is FTP, the venerable File Transfer Protocol, used since the days of dial-up modems but deprecated in the Internet age due to its lack of encryption. Normally, the Network Police (i.e., the network security administrators) insist that this feature be turned off. But, with it turned on, it is a simple matter of uploading a Postscript file to the printer using anonymous FTP. Which we did; and, voilà!, it printed.
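For the record, the test amounts to a one-minute session; the address below is a placeholder from the documentation range, and report.ps stands in for any PostScript file:

```
# 192.0.2.10 is a placeholder -- substitute the printer's address
ftp -n 192.0.2.10 <<'EOF'
user anonymous guest
binary
put report.ps
quit
EOF
```

If the file prints, the printer and the network path are fine, and the fault lies somewhere in the CUPS/driver stack.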

So, a search for a new printer driver ensued. We did find one at the Xerox site, with a PPD file several times larger than the generic one that came with the latest CUPS package. And, the printer now wakes up from a sound sleep and prints! Yes, it is still very slow, but it does print. The slow printing seems to be associated with the different pieces of software that render the print selection to Postscript, but we can at least produce output more or less on demand.
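Pointing CUPS at the downloaded PPD is a single lpadmin invocation; the queue name, address, and file path here are all hypothetical:

```
# Define (or redefine) the print queue using the vendor PPD
lpadmin -p xerox-color -E -v socket://192.0.2.10:9100 -P /usr/local/ppd/xerox.ppd
lpstat -p xerox-color    # confirm the queue exists and is enabled
```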

Network

It has been some time now since we’ve had network problems, but getting connected “on the road” was a major problem until recently, as Linux drivers were not available for a lot of wireless cards bundled with laptops.  The choice for a while was to run a utility (ndiswrapper) that mapped the system calls in Windows drivers to Linux system calls, with variable results.  Fortunately, many vendors now provide support for Linux, and it is now possible to buy laptops made to run Linux.  But, as recently as 2011, using our old (ex-Windows) systems, we had a problem with the connecting layer between the driver and the I/O that reset the network connection every couple of minutes in an attempt to stay connected to the strongest signal.  In hotels or congested areas with multiple access points broadcasting on the same channels, this effectively precluded any kind of meaningful network experience and certainly made persistent connections (like file transfers or remote administration) impossible.  This issue seems to have been corrected in later versions of the wireless software, but there is always the next surprise waiting in the world of Linux device drivers.

Virtues of Virtualization

Well, it finally happened.  We keep a secure login to our home network open just in case we need files that aren’t on the laptop, or to use as a relay point when locked behind a customer’s firewall, or, well, because we can.  While on travel, our gateway machine at home ceased responding.  Uh-oh.  When we got home, our fears were confirmed: the power supply had failed.  It just so happens that the gateway machine hosted some virtual machines that we need for vital business, so we had to recover it quickly.  It’s not a good plan to keep valuable data on your gateway machine: that, too, has been corrected; we moved the gateway to another box.

In most such cases, we might be faced with installing the software (which, unfortunately, was a Microsoft Windows application, so there are licensing and compatibility issues) on another machine, then restoring the data from backup, and so on.  Furthermore, we discovered that, while the user files and configurations were backed up, the volume containing the virtual machine images (along with the data) was not.  Oops, again. Although we do manually back up the data from time to time, it was a bit overdue.  But, the disk was still good, so we popped the hard drive out of the failed unit, opened up another machine, plugged it in, and transferred the virtual machine image and configuration files to the second machine.

Voilà! In the time it took to copy a 20GB file (we build virtual machines with hard drives as small as practical, as they are usually special-purpose machines anyway), we had recovered the application and the data.  However, it wasn’t convenient to run it from the new host, so a few days later we made room on another system and transferred the virtual machine image once again, this time across the network.  Of course, we immediately included the virtual disk images in the backup scheme, and changed our procedures to shut down the virtual machine when not in use so we get a clean backup image.

One of the reasons we hadn’t been backing up the virtual disk image is that, when the system is running, the image is inconsistent and might not be recoverable anyway, unless we can snapshot it (i.e., capture a static image that can be backed up and recovered). With most systems, we simply run a backup client on the virtual machine (which, for rsnapshot, is simply the SSH daemon that is usually on anyway).  But, we don’t usually run an SSH service on Windows, so a different backup system needs to be implemented for Windows guests.  There are a number of operating-system-agnostic backup software systems available, even several open-source ones, but they aren’t as convenient as what we use.  However, losing valuable data is extremely inconvenient, so we need a different approach.
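For the Unix guests, the rsnapshot side really is just a few configuration lines; an excerpt with a hypothetical host name (rsnapshot.conf fields must be separated by actual tab characters, a classic gotcha):

```
# /etc/rsnapshot.conf (excerpt) -- fields separated by TABs, host name is hypothetical
retain	daily	7
retain	weekly	4
# pull the guest's filesystems over SSH; sshd is the only "client" the guest needs
backup	root@vmguest:/etc/	vmguest/
backup	root@vmguest:/home/	vmguest/
```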

On our portable systems, we use Oracle’s VirtualBox to run our virtual machines. VirtualBox is intended for desktop use: the VM is the property of a single user and can be easily migrated from system to system, or hosted on networked storage and launched from any of several different workstations.  The most frequent use in this case is to virtualize Microsoft Windows systems for running those few applications for which we do not have an equivalent Linux application or which will not run under Wine (a Windows compatibility layer, despite the recursive name: Wine Is Not an Emulator).  For training, we often use VMware appliances, which are also easy to install and migrate.  Within our network, the network services (such as DNS, web servers, and file servers) and development environments for multiple Linux, BSD, and Solaris distributions are virtualized on Citrix XenServer, running on a server-class machine dedicated to hosting virtual machines.

Recently, we attended a seminar on Ganeti, Google’s answer to keeping virtual machines running all the time: a cluster management system that keeps mirrored copies of virtual machines on multiple servers in a cluster, so that a VM remains available even if a single node fails.  And, if a VM is hosted on three or more machines, any two nodes can fail without loss of data or downtime.  We are thinking of migrating our systems to it.  This will solve both the issue of backing up non-Unix VMs and the several hours of downtime needed to restore a backup to another system.

Virtualization is the future of computing, where we depend on having all our data available all the time, or need multiple systems but have desktop space for only one.  There are some performance and technical issues, such as enabling audio and accessing optical and flash drives, but using local virtualization like VirtualBox and VMware appliances on a workstation helps solve that problem, as long as the VMs get regular snapshots or are shut down during system backup times.