Web Evolution

One of the conundrums of the past month at Chaos Central has been the problem of making changes to a web site for which the Unix Curmudgeon is the content editor but not the programmer. The site, a few years old, was a set of custom pages with a simple content editor that allowed the web editor to create, update, or delete some of the entries on some of the pages. The tool allowed photos to be inserted in some of the forms, and some of them were in calendar format, with input boxes for dates. The problem was updating documents in the download section, for which there was no editing form. This was becoming an acute problem because the documents in question tend to change year to year.

At first, the solution to the problem seemed to be to modify the source code to add the required functionality, which involved getting the source code from the author and permission to modify it, something we had done before to change a static list page to read from a tab delimited file.  But, the types of changes didn’t always fit with the format of the administration forms and still required sending files to the server administrator, as we didn’t have FTP or SSH access to the site.  Then, we noticed that the web hosting service had recently added WordPress to the stable of offerings.  The solution was obvious: convert the entire site to WordPress.  Of course, the convenience of fill-in-the-blank forms would be gone, but we would have the ability to create new pages, add member accounts, create limited-access content, and upload both documents and photos.

The process was fairly simple:  using the stock, standard WordPress template, the content of the current site was simply copied and pasted into new pages, and the site configured as a web site with a blog rather than the default blog with pages format.  Some editing of the content to fit with the standard WordPress theme style models, and juggling the background and header to fit with the color scheme and appearance of the old site, and incorporate the organization’s logo in the header, and it was done: the system administrator replaced the old site with the new, with appropriate redirection mapping from the old PHP URLs to the corresponding WordPress pages. This migration represented yet another step in the evolution of the web, or, more properly, in our experience with on-line content.

In the beginning, there was the concept of markup languages.  My first encounter with such was in the mid 1980s with the formatting tags in Ventura Publisher, which was the first desktop publishing tool, introduced in GEM (Graphical Environment Manager), a user interface orginally developed for CP/M, the first microcomputer operating system, preceding MS-DOS by a few years (MS-DOS evolved from a 16-bit port of the 8-bit CP/M).  Markup tags grew from the penciled editing marks used in the typewriter age, by which editors indicated changes to retype copy:  Capitalize this, underline (bold) this, start a new paragraph, etc.  In typesetting, markups indicated the composition element, i.e., chapter heading, subparagraph heading, bullet list, etc, rather than specific indent, typeface and size, etc.  In electronic documents, tags were inserted as part of the text, like <tag>this</tag>.  Where the tag delimiters needed to be installed in the text, they were described as special characters, like &gt;tag&lt; to print “<tag>” (which, if you look at the page source for this document, you will see nested several layers deep, since printing “&” requires yet another “escape,” &amp;).

One of the reasons for the rise of markup languages as plain-text tags in documents was the proliferation of software systems, all of which were incompatible, and for which the markup tags were generally binary, i.e., not human readable.  Gradually, the tags became standardized.  When the World Wide Web was conceived, an augmented subset of the newly-minted Standardized Generalized Markup Language (SGML) was christened  HyperText Markup Language (HTML).  HTML used markup tags primarily to facilitate linking different parts of text out of order, or even enable jumping to different documents.  Later, the anchor (<A>) tag  and its variants were expanded to allow insertion of images and other elements.

Of course, these early web documents were static, and editors and authors had to memorized and type dozens of different markup tags.  To make matters worse, the primary advantage of markup tags, the identification of composition elements independent of style, became subverted as browser software proliferated.  In the beginning, the interpretation of tags was controlled by the Document Type Definition (DTD), the “back end” part of the markup language concept.  The DTD is a fairly standard description of how to render the tags in a particular typesetting or display system.  Each HTML document is supposed to include a tag that identifies the DTD for the set of tags used in the rest of the document.  But, since different browsers might display a particular tag with different fonts–size, typeface, color, etc.–the HTML tags allowed style modifiers to specify how the particular element enclosed by that one tag would be displayed.  This not only invites chaos, i.e., allows every instance of the same tag to be displayed differently, but most word processors, when converting from internal format to HTML, surround every text element with the precise style that applies to that text element, making it virtually impossible to edit for style in HTML form.  To combat this proclivity toward creative styling, the Cascading Style Sheet (CSS) was invented, allowing authors and editors to globally define a specific style for a tag, or locally define styles within a cascade, by using the <DIV> and <SPAN> tags to define a block of text or subsection within a tag.

It quickly became obvious that HTML, as a static markup, was not adequate for the dynamic growth of information on the Web.  Web pages could interact with users, by transmitting snippets of programming script code (between <SCRIPT></SCRIPT> tags) bound to markup tags by modifiers to create effects like changing text color when the mouse is over the enclosed text, or preprocess form input to validate data entry before transmitting it to the server.  The most popular scripting language for this is Javascript.  Of course, in the beginning, competing browsers interpreted the code differently or even supported different dialects of the language, which caused no end of grief for web programmers.  Fortunately, the popularity and proliferation of non-Microsoft browsers in recent years has caused Microsoft to tone down that aspect of their quest for world domination, so Javascript is now more or less standardized.

In order to use the Web as an interactive tool, and an interface for applications running on the server, it was necessary to augment the HyperText Transmission Protocol (HTTP), the language that the server uses to process requests from the browser, to pass requests to internal programs on the server.  This was implemented through the Computer Gateway Interface (CGI — not to be confused with Computer Generated Imagery used in movie-making animation and special effects).  Originally, it was necessary to write all of the code to generate HTML documents from the CGI code and to parse the input from the browser. But, thanks to a Perl module, CGI.pm, written by renowned scientist Dr. Lincoln Stein, this became a lot easier and established the Perl scripting language as the de facto web programming language.

But, as the Web became ubiquitous, most content on the web was still static HTML, created by individuals using simple HTML tags in a text editor or saving their word processing documents as HTML, or using PC-based HTML editors like Homesite or Dreamweaver.  Adding interactive elements to these pages required them to be rewritten as CGI programs that emitted (programmer-eze for “printed”) the now-dynamic content.  By now, however, web servers had incorporated internal modules that could run CGI programs directly without calling the external interpreter software and incurring extra memory overhead.  By adding special HTML tags interpreted by the server, snippets of script code could be added in-line with the page content, making it much easier to convert static pages to dynamic ones.  Since the primary need was to add the ability to process form input, this led to the development of specialized server-side scripting languages, such as Rasmus Lerdorf’s PHP.

But, now that creating web pages was becoming a programming task more than a word-smithing task, there was a need for better authoring tools.  The proliferation of PHP and the spread of high-speed Internet access made it more feasible to actually put interactive user applications on web servers.  Web editing moved from the personal computer to the Web itself, as the concept of content management systems took hold.  Early forms were Wikis, where users could enter data with Yet Another Markup Standard, that would be stored on the server and displayed as HTML.  More free-form text form processors followed, making possible forums of dialogue and whole web formats for group interaction, using engines like PHP-Nuke and others, that used a database back-end to store input and PHP to render the stored content and collect new content. The expansion of the forums into tools for diarists and essayists in the form of blogs (from weB LOG) led to development of even more powerful content management systems, like Joomla and WordPress, capable of developing powerful web sites without programming.

So, we have progressed in evolution from desktop publishing to the Web, to interactive applications, to converting static sites to dynamic ones, and finally, to converting custom programs to templates for generalized site-building engines.  The Web, through new web forums for social interaction between friends and relatives who have never seen raw HTML code, allows ordinary folks to converse with friends and relatives across the world, to post photos, videos, and links to other sites of interest, just as the original hypertext designers intended.  What seemed arcane and innovative thinking 30 years ago is now just another form of natural human interaction.

But, for those of us who make our living interpreting dreams in current technology, the bar moves up again.  As we no longer think about the double newline needed at the beginning of every HTML document after the “Content-type” line and before the <HTML> tag, which is the first code emitted from a CGI program or from the server itself, we no longer need to write CSS files from scratch or PHP functions to perform common actions.  But, we need to learn the new tools and still remember how to tweak the code for those distinctive touches that separate the ordinary from the special.  And, there are still lots of sites to upgrade…