Those of us who use Open Source tools for computing and eschew Microsoft products, for whatever reason, have long been annoyed and inconvenienced by that huge segment of the computing world that assumes that whatever they see on their computer, you will see on yours if they send it to you. So it is that we are inundated with Microsoft Office files: Word documents that we may or may not need to edit, PowerPoint presentations, and Excel spreadsheets. Some of the senders are innocents, who truly think that what they see is what there is, on everyone’s computer. Others are just plain arrogant: “If you want to do business with us, you will have the current version of Microsoft Office.”
Excuse me, but we are a Linux/Unix shop, and Microsoft Office does not run on our computers. Microsoft Office requires Windows and a license. We would not use it in any case, because we already have a perfectly good integrated productivity environment; and if we did, we would have to move files around in our network and do a lot of extra work just to accommodate your lack of sensitivity. OK, we do have a copy of Windows XP installed in a virtual machine and unused copies of Windows Vista and Windows 7 stashed “somewhere.” But no Office, and we only use XP when we absolutely must (like testing our web creations on Internet Explorer or running TurboTax, a painful annual chore). Our normal and customary work environment is a network of interconnected and interoperating Linux, BSD, and Solaris machines and virtual machines. It has LibreOffice, which automatically opens any attachments in our email (which we read on the web or with Thunderbird, and, when necessary, Enlightenment, but never Outlook: thank you, but no thank you), generates all of the various types of productivity documents in an ISO standard format, and can export them directly to PDF for printing or viewing.
LibreOffice is the latest fork in a family of Open Source products that comply with the Open Document Format (ODF), an international standard (ISO/IEC 26300:2006). As a convenience, LibreOffice can also read and write most documents written in older Microsoft Office formats and in various other document formats that were in common use in prior decades. But the one that defies reasonable translation and interoperability is the Microsoft Office Open XML (OOXML) format, which is not particularly open, nor stable, nor a widely accepted standard (it was published by ECMA as ECMA-376 and later pushed through ISO as ISO/IEC 29500 amid considerable controversy). LibreOffice can read most OOXML documents, but so far has not been able to write them in a manner that Microsoft Office can read. When we receive a document for editing that is in OOXML format, we send it back in Microsoft Office 2003 (.DOC) format [although a much-simplified rendering, through the magic of reverse engineering–not sanctioned by Microsoft]. It is our policy that, if you require us to edit Microsoft documents in their native format, you will need to provide us with the equipment and software to do so, or we will need to purchase it and invoice you. Period. On the other hand, there are freely available converters that users of Microsoft Office can install that will export to and import from Open Document Format, enabling them to exchange files with anyone, using an international standard that does not require a license or special systems to open. But they don’t. We have become a contentious society, where compromise, cooperation, and accommodation are difficult to non-existent.
Our contention is that OOXML files should be considered for internal use only, i.e., within an organization: anything that goes to an external organization should be in an open standard. There are also practical reasons for this:
1) Microsoft Office, though widely used in business, is not a universal standard: it runs only under Microsoft Windows and Mac OS X, and requires a separate license. OOXML is a relatively new format, first published as an ECMA standard in December 2006 and shipped in its present form with Office 2007. Though a “standard,” it is not portable, being virtually inseparable from Windows despite having been ported by Microsoft to Mac OS X, and it cannot be considered truly open, as it has hooks into prior Microsoft products protected by patents and licensing agreements that make it impossible to replicate on other systems.
2) Microsoft Office documents frequently retain deleted information that may be damaging to the company and that can be easily extracted. I once received a letter that had been composed by opening a different document, deleting the contents, and typing the new text. Since I did not at the time have a file converter, I dumped the raw text of the file using the Unix ‘strings’ utility, which revealed not only the text intended for me, but also the other, private data. Please don’t send me your company secrets and confidential client data: I don’t want it, don’t need it, and it creates ethical problems for all of us. There is one simple rule to avoid this issue:
If a document is intended to be read-only, it should be sent in an open-standard page-description format, like Adobe’s Portable Document Format (PDF). There are many advantages to this:
a) The document will display on my monitor and print on my printer exactly as it does on your monitor and printer. Office documents (even ones converted to a different format) will not do this, since the presentation is system-dependent. Not all of the typefaces on your system may be installed on my system, for instance. The document in an office productivity tool is formatted for the current print device, and changes (although slightly, in most cases) when a different output device is selected.
b) All of the editing and quality control markings (red-line/strike-out), notes, etc. will be stripped out of the document–only the final intended output is stored in the display format.
c) The document can be used by any system, since the standards are supported universally–the recipient does not have to have the same system and software that you do in order to read your document.
d) Any different versions of the document that turn up later will be provably derivative. It is more difficult to extract parts of the document or edit it, and there is little danger that a modified version will be passed on as if it originated with you.
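The ‘strings’ dump mentioned in point 2 is easy to approximate, which is exactly why leftover text in an office file is so exposed. What follows is only a minimal sketch in Python, not the real utility: the function name printable_strings and the four-character minimum are our own illustrative choices, and it looks only for plain 8-bit ASCII runs, so text stored in other encodings would be missed.

```python
import re
import sys


def printable_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII at least min_len bytes long.

    A rough approximation of strings(1); the name and the default
    min_len are illustrative assumptions, not part of any standard tool.
    """
    # Bytes 0x20-0x7e are the printable ASCII characters (space to tilde).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [run.decode("ascii") for run in pattern.findall(data)]


if __name__ == "__main__":
    # Usage: python dump_strings.py letter.doc
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            for text in printable_strings(fh.read()):
                print(text)
```

Run against an old binary Word file, something like this will typically print the visible text along with whatever else happens to be sitting in the file as plain bytes.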
One issue that came up recently showcases why OOXML is not suitable for sending documents, even ones that need to be edited. Microsoft (and others) use a technology called OLE (Object Linking and Embedding) to enable users of their products to link to or embed live computation objects from one document into another document. This is most commonly used to place a graph or chart generated by an Excel spreadsheet inside a Word document or a PowerPoint presentation. The premise here is that the object is interactive–if you make changes to the spreadsheet, the changes are automatically updated in the Word document or PowerPoint slide. This is very useful if you are going to make the presentation from your own computer or print the Word document.
However, if you send the file to someone outside your organization, any links in your document will be broken, since they do not have access to your disk drive or network share to read the linked file. And, if the object is embedded, reproducing it requires that the program that created it be installed on the recipient’s computer, or at least something that will correctly interpret it. OLE inclusions in OOXML files have proven to be particularly difficult to interpret correctly when opened in a different system, and may simply be ignored if the file is opened in a non-Microsoft system. To date, none of the productivity suites that implement ODF are natively capable of rendering OLE inclusions in OOXML files. I was recently tasked with merging documents from several contributors into a web presentation, which I was forced to retract and then work overtime to reword and republish, because one of the documents contained eight OLE charts that were totally invisible when imported into LibreOffice on my web development system.
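One way to catch this kind of surprise before publishing is to look inside the file. An OOXML document is a ZIP package, and Microsoft Office conventionally stores embedded objects in an “embeddings” folder within it. The Python sketch below relies on that convention, which is an assumption: the folder names are what Office customarily writes, the package format does not strictly require them, and the function name embedded_objects is ours. An empty result is therefore not proof that the document is clean.

```python
import zipfile

# Folders where Microsoft Office customarily places embedded objects
# (assumed convention; a package is free to use other part names).
EMBED_DIRS = ("word/embeddings/", "ppt/embeddings/", "xl/embeddings/")


def embedded_objects(path_or_file) -> list[str]:
    """List the parts of an OOXML package that look like embedded objects."""
    with zipfile.ZipFile(path_or_file) as package:
        return sorted(
            name
            for name in package.namelist()
            if name.startswith(EMBED_DIRS) and not name.endswith("/")
        )
```

Calling embedded_objects("contribution.docx") on a suspect file (the file name here is only a placeholder) gives a quick list of things to ask the sender to resupply as images.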
The OLE problem predates some of the other OOXML issues by a decade or more: in the mid-1990s, when standards for documentation exchange were being formulated, the problems of the then Microsoft Office formats versus proposed open standards were very clear, and the situation has not only not improved, it has gotten worse: Microsoft continues to carry forward some legacy baggage in their conventions and formats which is and always will be incompatible with international standards. These incompatibilities date from the beginning of Windows in the 1980s and continue to accumulate.
Meanwhile, we continue to receive native OOXML files that we cannot render faithfully on our systems: Documents that contain invisible charts, PowerPoint presentations that run off the edge of the screen and have hidden elements, spreadsheets that don’t compute, and other issues that cause misunderstandings, miscommunications, and errors. It is bad for business. The solution is simple: Use standard formats for communications with others: PowerPoint presentations look better as PDFs, as do Word documents that are meant to be read. ODF is an accepted international standard for document exchange for editing: OOXML is not. It takes a few minutes to install an ODF converter plugin in a Microsoft system, and a few seconds to export in ODF format. Importing just happens, if the plugin is installed. It takes hours sometimes to examine an OOXML document to make sure there are no hidden features in it, or to have to ask for a PDF version (which some correspondents regard as an unacceptable imposition, possibly because they haven’t learned how to generate one–if it isn’t an option on the file menu, it can be with another plugin that should be standard).
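On our side of the fence, producing the PDF is not even a manual step. LibreOffice can convert documents from the command line in headless mode, and a short script can wrap that. The Python sketch below assumes the soffice program is installed and on the PATH; the function names and the example file name are ours, for illustration only.

```python
import subprocess
from pathlib import Path


def pdf_export_command(source: str, outdir: str = ".") -> list[str]:
    """Build the LibreOffice command line that converts one document to PDF."""
    return [
        "soffice", "--headless",
        "--convert-to", "pdf",
        "--outdir", outdir,
        source,
    ]


def export_to_pdf(source: str, outdir: str = ".") -> Path:
    """Run the conversion; assumes soffice is installed and on the PATH."""
    subprocess.run(pdf_export_command(source, outdir), check=True)
    # LibreOffice names the output after the source file's base name.
    return Path(outdir) / (Path(source).stem + ".pdf")
```

For example, export_to_pdf("proposal.odt", "outbox") should leave the PDF in the outbox folder, ready to attach in place of the editable original.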
In the case of the missing charts above, it was necessary to obtain a Windows-based OOXML converter, run it in Wine on Linux, and take screenshots of the charts to paste into the merged document as images. That is the format in which they should have been delivered in the first place, not as OLE live objects, especially since the final output was a downloadable PDF. But, reasonably, how many computer users think about the formats and nature of the objects in their documents? Most aren’t consciously aware that there is a vast difference between documents on their computer and documents on the Web. Much of today’s information technology falls under Clarke’s Law* for people in other lines of work, who use computers as a tool and are then frustrated when the magic spell they cast on their own computer isn’t repeatable on someone else’s. The coming Age of Cloud Computing may simplify the issues, or enormously complicate them, or may simply move the same old standards conflicts into different cloud banks.
*Science Fiction legend Arthur C. Clarke stated that: “Any sufficiently advanced technology is indistinguishable from magic.” Commonly referred to as Clarke’s Third Law.