IEEE TRANSACTIONS ON PROFESSIONAL COMMUNICATION, VOL. 40, NO. 2, JUNE 1997
0361-1434/97$10.00 © 1997 IEEE

Jonathan Price, Editor

Manuscript received November 1996; revised February 1997.
The author is with Lightbinders, Inc., San Francisco, CA 94107 USA.
IEEE PII: S0361-1434(97)04360-9.

Using SGML to Create Complex Interactive Documents for Electronic Publishing

Commentary

---PETER GOLDIE


Abstract---In creating complex interactive documents, some technical communicators use software products that emphasize format and style in displaying pages. This approach limits the communicator's ability to repackage the information presented in electronic versions and to increase its interactive use, both key benefits of the structure-based approach offered by Standard Generalized Markup Language (SGML). In a number of projects that render mathematical, scientific, and engineering texts electronically, using SGML allows the technical communicator to make equations interactive and to automate links to references. The author sketches out problems associated with page description approaches to displaying electronic pages and discusses the comparative benefits of SGML.


Index Terms---Electronic publishing, complex documents, SGML, page description.




As one encounters new media products, it is unusual when a product is NOT advertised as interactive. Interactive? How could anything used by humans not be interactive? Books are interactive, TV is interactive, radio is interactive, as their designs all require human interaction. Do the publishers really mean non-linear? Do they mean integrated with multimedia? Do they mean the contents can be modified?

Complex, on the other hand, is a word we rarely see in any new media product advertisement. The word complex is a kiss of death when describing any computer product intended for the general public. In the "new world" of the Information Age, everything is supposed to be easy to use and understand, especially if in the "old world" it was complex.

Let us begin with defining the terms complex document and interactive document in the context of electronic publishing:

A complex document is a highly evolved means of textual communication, coming in a wide variety of formats agreed upon by convention and guided by numerous style manuals. A complex document has a hierarchical structure that results from methodical analysis of the contents, as applied through the conventions of presentation style.

Inherent in any complex document structure is navigation, or knowing where you, as the reader, are in relation to the total contents. In the simplest of navigation schemes, pages are numbered; in more complex documents, tables of contents (TOCs) outline the structure of frontmatter, chapters, subchapters, and backmatter. The large structural features of a complex document are further subdivided into sections, subsections, section titles, sublevel headers, paragraphs, lists, tables, figures, appendices, indices, etc. Many major scholarly fields claim that unique structural features are necessary to properly communicate their specific type of information.

Interactive documents have some or all of the following characteristics:

Document Structure vs. Style

The majority of electronically published products fall far short of being able to deliver complicated documents in all of their richness, let alone documents that are both complex and interactive. The reason behind this limitation is directly related to the evolution of word processing on personal computers (PCs, be they Mac or "IBM"). The technology of word processors arose as a replacement for typewriters (specifically, IMHO, the IBM Selectric II), with the design goal of achieving hardcopy output of 'manuscript quality'. At the same time, printers had already adopted very complex, very expensive, very proprietary computerized typographic composition systems. As PC hardware and software capabilities grew, the desktop publishing (DTP) 'industry' arose, first presenting itself as a naïve challenge to printing. Ironically, while never a real challenge to large press operations, DTP provided de facto standards upon which designers and production editors using PCs could create a formatted page that printers could use to produce the final hardcopy results.

When a complex document is delivered as print, the style alone indicates the complex nature of the document structure. Style conventions such as typeface, font, point size, bold/italic/underscore emphasis, layout, tabs, indents and rules give the reader the visual clues necessary to convey the document structure. Style is in fact a secondary characteristic of the underlying document structure. Because style conventions are traditionally predicated upon the layout of a printed page, electronic publication systems that rely upon page descriptive languages are designed to communicate structure through presentation style, giving page layout priority over document structure. The most common example of this approach to rendering electronic documents is the Adobe Postscript language, and its electronic publishing descendant, Adobe PDF/Acrobat.

[Acrobat ScreenShot]


Figure 1. An IEEE journal article rendered through Adobe Acrobat, using a page descriptive language, Postscript/PDF. Presentation and navigation are inherently limited by the layout of the printed page.

The underlying, and unfixable, problem with Postscript/PDF is that it is a page descriptive language, not a document structure descriptive language. Style and page layout appearance have taken precedence over document structure. Any complicated document can be rendered in Postscript, but the structure must first be interpreted by a human operator into the appropriate style, thus communicating the underlying structure through presentation style. Unfortunately, once a complex document has been prepared as a Postscript/PDF file, there is no reliable, accurate, and/or automated process of applying a logical structure back to the document, despite some rather ingenious attempts [1].

If a document could contain structural information encoded to be machine readable, style could then be applied automatically. If an author writes his manuscript within a structured framework, the process of organizing, editing, revising, indexing/referencing, and, of course, hardcopy generation, can be greatly simplified. This is the fundamental reasoning behind Standard Generalized Markup Language (SGML) markup of complex documents.

Many other benefits also come from structured document encoding. Although SGML originally began as a proprietary IBM format called GML, its authors recognized its widespread potential and proposed the development of an international, platform-independent, open, "standardized" GML. This it became, as the ISO 8879 standard. (For more of the history behind SGML and other online SGML resources, several excellent web sites provide considerable information [2-6].) An open text encoding standard allows publishers to set common document structure conventions and lets any software developer create applications that can apply the standard. Repackaging contents into new titles is simplified because source materials are organized by document structure, not by page. Archiving the contents is aided by platform-independence and a delimited data structure readily adaptable to fielded databases. Rendering SGML structure becomes a straightforward matter of assigning style to a structure element. Changing document appearance (style) becomes simple and instantaneous; one simply applies a new set of styles (stylesheet) to the underlying structure.
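To make the stylesheet idea concrete, here is a minimal Python sketch, not DynaText or any SGML product. It assumes the content is held as simple (element, text) pairs and defines two hypothetical stylesheets that map the same element names to different fonts and sizes. Swapping stylesheets changes the rendered appearance while the content itself is never edited. The element names, fonts, and sizes are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical content: an ordered list of (element name, text) pairs.
Document = List[Tuple[str, str]]
# A stylesheet maps each element name to its style properties.
Stylesheet = Dict[str, Dict[str, str]]

DOC: Document = [
    ("title", "Article Title"),
    ("para", "Body text of the first paragraph."),
]

SCREEN: Stylesheet = {
    "title": {"font": "Helvetica", "size": "18pt", "weight": "bold"},
    "para": {"font": "Times", "size": "12pt", "weight": "normal"},
}

LARGE_PRINT: Stylesheet = {
    "title": {"font": "Helvetica", "size": "28pt", "weight": "bold"},
    "para": {"font": "Helvetica", "size": "18pt", "weight": "normal"},
}


def render(doc: Document, sheet: Stylesheet) -> List[str]:
    """Look up each element's style in the stylesheet; the content is never edited."""
    lines = []
    for element, text in doc:
        style = sheet[element]
        lines.append(f"[{style['font']} {style['size']} {style['weight']}] {text}")
    return lines


# Same content, two appearances.
for line in render(DOC, SCREEN) + render(DOC, LARGE_PRINT):
    print(line)
```

Because style is looked up by element name, adding a third stylesheet (for print, say) requires no change to the document at all.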

SGML is a language for describing the structure of types of documents and the text encoding to be used in creating content consistent with those document types. Any individual, publisher or industry is free to create their own document structures, called document type definitions (DTDs), and these can be either public domain or proprietary. Thus, in SGML publishers are given a logical, open, extensible, customizable, interconvertible, indexable and archival text encoding system. Certain federal government agencies, especially those which must exchange large volumes of technical documentation with industry (e.g., military and transportation departments), have mandated SGML formatting. Publishers, for the most part, have all endorsed SGML in concept, but only a relative few have committed their production to SGML. The reasons for the slow adoption of SGML are economic and technical. More on this later...

Why We Chose Structured Document Encoding

In 1988, I created Lightbinders to produce CD-ROM versions of basic research publications for both non-profit professional societies and commercial STM (science, technology & medical) publishers. The initial technical demands we were required to solve included the following:

These minimal requests were difficult to fulfill in the late '80s and early '90s, especially considering the level of installed hardware. The hardware, operating systems, and user knowledge changed rapidly, and as the readers of the electronic versions became more sophisticated, they demanded:

It was clear we needed a better cross-platform delivery system, and one that would use an underlying encoding system that would survive migration of the contents into the next generation of software. This we found in the DynaText Publishing System created by Electronic Book Technologies (now a division of INSO Corporation [7]). At the time, DynaText was the only software interface using native SGML encoding that met the technical demands of the scholarly publishers we work for.

Even though we were a small organization, the prospect of training the staff accustomed to flat ASCII to code and edit SGML appeared to be a huge challenge. There were almost no editing tools available, SGML consultants were expensive and lived in distant cities, and DTDs appeared incomprehensible in their brevity. Fortunately, looks were deceiving.

SGML tagging proved to be entirely logical and easy to learn. All document contents are contained within elements, which are delimited by tags written as simple abbreviations inside angle brackets. Paragraphs are nested within subsections, subsections within sections, sections within chapters, etc. Whereas a typesetting composer needed to take a structural element, such as an article title, and apply many aspects of style to it (font, point size, color, emphasis, positioning, etc.), an SGML editor simply has to code <title>Article Title....</>, and confirm the title element is located correctly within the larger structure of the document (i.e., inside the <fm>frontmatter</>). Separation of style from structure actually improved our editing productivity.
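The nesting described above can be checked mechanically. The following is a minimal Python sketch, not a real SGML parser: it assumes start tags carry no attributes and handles only named end tags and SGML's empty end tag `</>`, which closes the most recently opened element. A production parser would also validate against the DTD and handle omitted tags, attributes, and entities. The `find_all` and `inside` helpers are hypothetical names introduced for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Union

# Start tags without attributes (<title>), named end tags (</title>),
# and SGML's empty end tag (</>), which closes the most recently opened element.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)?>")


@dataclass
class Element:
    name: str
    children: List[Union["Element", str]] = field(default_factory=list)
    parent: Optional["Element"] = field(default=None, repr=False, compare=False)


def parse(markup: str) -> Element:
    """Build an element tree, rejecting end tags that do not match the open element."""
    root = Element("#root")
    stack = [root]
    pos = 0
    for m in TAG.finditer(markup):
        text = markup[pos:m.start()].strip()
        if text:
            stack[-1].children.append(text)
        is_end, name = m.group(1) == "/", m.group(2)
        if is_end:
            if len(stack) == 1:
                raise ValueError("end tag with no open element")
            if name is not None and name != stack[-1].name:
                raise ValueError(f"</{name}> does not close <{stack[-1].name}>")
            stack.pop()
        else:
            if name is None:
                raise ValueError("empty start tag <> is not supported here")
            element = Element(name, parent=stack[-1])
            stack[-1].children.append(element)
            stack.append(element)
        pos = m.end()
    tail = markup[pos:].strip()
    if tail:
        stack[-1].children.append(tail)
    if len(stack) > 1:
        raise ValueError(f"unclosed element <{stack[-1].name}>")
    return root


def find_all(element: Element, name: str) -> List[Element]:
    """Depth-first search for every descendant element with the given name."""
    found = []
    for child in element.children:
        if isinstance(child, Element):
            if child.name == name:
                found.append(child)
            found.extend(find_all(child, name))
    return found


def inside(element: Element, ancestor: str) -> bool:
    """True if some ancestor of the element has the given name."""
    node = element.parent
    while node is not None:
        if node.name == ancestor:
            return True
        node = node.parent
    return False


doc = parse("<article><fm><title>Article Title</></fm><body><para>Text</></></>")
title = find_all(doc, "title")[0]
print(title.children, inside(title, "fm"))  # ['Article Title'] True
```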

[Darwin 2nd Ed ScreenShot]


Figure 2. A simple SGML application, as viewed through the DynaText interface. The multilevel table of contents is automatically created from the SGML document structure, allowing dynamic navigation to the text. Scalable text automatically wraps to any window size; alternative presentation styles (fonts, colors, layout, etc.) can be selected by the user. Hyperlinks and hypertext embedded within the SGML source are indicated by colors and icons.


One of the software tools that has enabled DynaText to be the premier SGML browser is the stylesheet editor, called Insted. This software application presents the contents organized within the document's structural hierarchy, and allows style to be individually applied to each element type. The style of each element is described by a set of style properties, which can be set by property value functions. The variable properties are such things as font type, font weight, font slant, font size, color, horizontal and vertical spacing, and preceding and following text for each instance of a given element. Property value functions can be used to vary the values of properties programmatically, based on operating system information, mathematical operations, string operations, and information about the contents, attributes, and hierarchical position of elements within the document. A set of these style definitions is referred to as a stylesheet. Insted applies property value functions to properties interactively.
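As an illustration of property value functions, here is a minimal Python sketch, not Insted's own scripting language. It assumes that a property is either a fixed value or a function of the element's context, here its hierarchical depth and the operating system. The property names, the depth-to-size rule, and the per-platform font choices are hypothetical. In a real program the platform string might come from `sys.platform`.

```python
from typing import Callable, Dict, Union

Value = Union[str, int]
Context = Dict[str, Union[str, int]]
# A property is either a fixed value or a function of the element's context.
Property = Union[Value, Callable[[Context], Value]]


def heading_size(ctx: Context) -> int:
    """Deeper headings get smaller type: depth 1 -> 18pt, depth 2 -> 16pt, never below 10pt."""
    return max(10, 20 - 2 * int(ctx["depth"]))


def platform_font(ctx: Context) -> str:
    """Choose a font expected on each platform (font names are illustrative)."""
    fonts = {"win32": "Times New Roman", "darwin": "Times"}
    return fonts.get(str(ctx["platform"]), "serif")


STYLESHEET: Dict[str, Dict[str, Property]] = {
    "head": {"font-family": platform_font, "font-size": heading_size, "font-weight": "bold"},
    "para": {"font-family": platform_font, "font-size": 12, "font-weight": "normal"},
}


def resolve(element: str, ctx: Context) -> Dict[str, Value]:
    """Evaluate every property of one element instance in its own context."""
    return {prop: value(ctx) if callable(value) else value
            for prop, value in STYLESHEET[element].items()}


print(resolve("head", {"depth": 2, "platform": "win32"}))
# {'font-family': 'Times New Roman', 'font-size': 16, 'font-weight': 'bold'}
```

Because functions are evaluated per element instance, the same `head` rule yields different sizes at different depths and different fonts on different systems, without separate stylesheets.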

By using Insted you can easily tune your stylesheets while judging their effect on screen, functionally somewhat similar to formatting onscreen with advanced word processors. Stylesheet editing produces the computer display appearance, tables of contents are automatically extracted, and hardcopy options are created. Stylesheets can also be taught to monitor their environment, to automatically select the correct and best font for each operating system, to scale the type correctly for different monitor resolutions, and to automatically render illustrations and tables to fit the window dimensions.

Stylesheet editing through Insted, in addition to styling, gives the content developer control of *actions* related to an element. Through a simple script, one can automatically generate page numbers and chapter numbers, as digits, letters, Roman numerals, or combinations thereof. Illustration tags are scripted to instruct the DynaText browser to open the graphic files. Elements for external functions can be embedded anywhere in a document, and through scripting can launch external programs. We created element tags such as <video>A Video Clip</>, which, when clicked on, plays a video segment. Any tagged element can be directed to invoke an action.
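The numbering styles mentioned above are simple to generate. Here is a minimal Python sketch of the formatting step only; it is not an Insted script, and the style names (`digits`, `letters`, `roman`, `lower-roman`) and helper functions are hypothetical. Letters use bijective base 26, so 27 becomes AA.

```python
from typing import Callable, Dict, List, Tuple


def to_roman(n: int) -> str:
    """Standard subtractive Roman numerals for 1..3999."""
    if not 1 <= n <= 3999:
        raise ValueError("Roman numerals are defined here for 1..3999")
    numerals = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
                (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
                (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in numerals:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def to_letters(n: int) -> str:
    """1 -> A, 26 -> Z, 27 -> AA (bijective base 26)."""
    if n < 1:
        raise ValueError("letter numbering starts at 1")
    out = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        out = chr(ord("A") + rem) + out
    return out


FORMATS: Dict[str, Callable[[int], str]] = {
    "digits": str,
    "letters": to_letters,
    "roman": to_roman,
    "lower-roman": lambda n: to_roman(n).lower(),
}


def number(titles: List[str], style: str) -> List[str]:
    """Prefix each title with its position, formatted in the requested style."""
    fmt = FORMATS[style]
    return [f"{fmt(i)}. {t}" for i, t in enumerate(titles, start=1)]


def compound(chapter: int, section: int, styles: Tuple[str, str] = ("roman", "digits")) -> str:
    """Combine formats, e.g. chapter 4, section 2 -> 'IV.2'."""
    return f"{FORMATS[styles[0]](chapter)}.{FORMATS[styles[1]](section)}"


print(number(["Introduction", "Methods"], "letters"))  # ['A. Introduction', 'B. Methods']
print(compound(4, 2))  # IV.2
```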

In our work for outside publishers, we are often directed to use their "standard" DTD. Ideally, their DTD has been created with consideration for all document structure and functionality found within the content they wish to publish. It is not uncommon for us to encounter limitations in their DTD, and we are usually able to quickly modify the DTD to include a new element definition. In addition, DynaText is somewhat tolerant of violations of strict SGML structure, and this we occasionally use to our advantage. When a needed tag does not exist in the DTD we are directed to use, we can often simply use a new tag name without modifying the DTD.

SGML and Mathematics

EBT has wisely incorporated a public domain version of TeX (emTeX) into their publishing system, as a means of rendering mathematics. SGML still lacks a universally agreed upon method of encoding mathematics [8], whereas TeX on its own (actually, thanks to the huge efforts of Donald Knuth [9]) has become a de facto standard for rendering math notations. The ability to integrate TeX, which is a programming language that has no direct relationship with SGML, provides a clear demonstration of the versatility and extensibility of SGML. TeX can be contained inside an element we called <tmath>, as in this example: <tmath>$E=mc^{2}$</>. During the process of compiling the SGML into a book, we direct any raw TeX within <tmath> to be run through the emTeX compiler, resulting in a renderable DVI (device independent) object. The DVI object is embedded in the compiled SGML, and through an Insted program script, we direct embedded <tmath> DVIs to be displayed as rendered DVIs. The emTeX DVI calls Computer Modern math fonts (the fonts are part of the DynaText browser installation). The result is dynamic rendering of any math formula we have encountered, and it has enabled us to produce the CD-ROM version of the classic math text Table of Integrals, Series, and Products, by Gradshteyn and Ryzhik (Academic Press; ISBN 0-12-294756-8).
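The first step of that compile process, pulling the raw TeX out of each <tmath> element, can be sketched in Python as follows. This is an illustration under stated assumptions, not EBT's build tool. It assumes <tmath> elements are not nested and end with `</>` or `</tmath>`. It wraps each formula as a minimal plain-TeX job, and the `tmath0001.tex` file names are hypothetical. Each job would then be run through TeX (emTeX's `tex`, or any plain-TeX installation) to produce a DVI file; that step is not shown.

```python
import re
from typing import Dict, List

# <tmath> content ends at the first </> or </tmath>; non-greedy so elements stay separate.
TMATH = re.compile(r"<tmath>(.*?)</(?:tmath)?>", re.DOTALL)


def extract_tmath(sgml: str) -> List[str]:
    """Return the raw TeX source of every <tmath> element, in document order."""
    return [m.group(1) for m in TMATH.finditer(sgml)]


def tex_job(formula: str) -> str:
    """Wrap one formula as a minimal plain-TeX file that typesets a single DVI page."""
    return "\n".join([r"\nopagenumbers", formula, r"\bye"])


def build_jobs(sgml: str) -> Dict[str, str]:
    """Map a hypothetical file name to the TeX source for each formula."""
    return {f"tmath{i:04d}.tex": tex_job(f)
            for i, f in enumerate(extract_tmath(sgml), start=1)}


jobs = build_jobs("<para>Energy: <tmath>$E=mc^{2}$</>.</para>")
for name, source in jobs.items():
    print(name)
    print(source)
```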

This is not the limit of how we make mathematics more interactive through SGML and TeX. Frequently the TeX source would be valuable to readers who write in LaTeX, AMS-TeX or one of the other TeXs. We duplicate the source in a nearby element called <rawtex>$a^2 + b^2 = c^2$</>, and suppress default display of <rawtex> behind an icon or through a stylesheet element hide command. When revealed by the reader, the source TeX can be cut and pasted into his TeX application, saving considerable time rewriting the source notation.
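The duplication step can be automated at conversion time. Here is a minimal Python sketch of one way to do it, not necessarily how Lightbinders did it. It assumes non-nested <tmath> elements ending in `</>` or `</tmath>`, and inserts a <rawtex> copy immediately after each one. Hiding <rawtex> by default is left to the stylesheet, as described above.

```python
import re

TMATH = re.compile(r"<tmath>(.*?)</(?:tmath)?>", re.DOTALL)


def add_rawtex(sgml: str) -> str:
    """Follow every <tmath> element with a <rawtex> copy of its TeX source."""
    def duplicate(m):
        # m.group(0) is the whole <tmath> element, m.group(1) its raw TeX.
        return f"{m.group(0)}<rawtex>{m.group(1)}</>"
    # A function replacement is inserted literally, so TeX backslashes are safe.
    return TMATH.sub(duplicate, sgml)


print(add_rawtex("<tmath>$a^2 + b^2 = c^2$</>"))
# <tmath>$a^2 + b^2 = c^2$</><rawtex>$a^2 + b^2 = c^2$</>
```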

We also use TeX displays to launch from the SGML content into external interactive mathematics programs, such as Mathematica, MATLAB, Mathcad, or Maple. Here, the rendered TeX math element is scripted to represent a hot spot; clicking on the math formula would start the math program, pass the formula to be interpreted, and await input of new variables from the reader. This we demonstrated in a handbook of acoustical research: a TeX formula illustrated the basis for calculating the sound frequency of a vibrating object, and clicking on the formula launched the external program, allowed the user to change the physical shape of the object and automatically regenerate the new frequency... and hear the new frequency as well!

[Mathematica ScreenShot]


Figure 3. A math formula in DynaText is linked to launch a Mathematica notebook by clicking on the red arrow icon. The Mathematica notebook application can allow the reader to interactively alter the formula parameters and view the results.

SGML and Hyperlinks

Although the use of SGML provides a means to represent the structure of a complex document, and textual documents tend to be read in a linear manner, SGML can easily be adapted for non-linear or alternative pathways through contents. In fact, traditional documents already employ some non-linear aspects, such as footnotes and reference citations. SGML enables tagging of these cross-reference points, and program scripting directs the hyperlinking of the spatially separated sections. Clicking on a reference citation instantly brings up a citation window from the bibliography. This does not mean every reference must have the citation embedded at that spot (hypertext), just that a simple but unique code is added to associate the reference with its citation. Here is an example of the simplicity of hyperlink coding in SGML:

reference coding:

... and these results were confirmed by <ref id=R1234>Smith et al., 1984</>. In 1993, we..

citation coding:

<bib id=R1234>Smith, A.R., Jones, R.T., and Royce, P.M. The Evolution of Diacritical Thought. 1984, W.H. Nueroth & Sons.</>

Through simple programming scripts associated with the <ref> and <bib> element tags, the DynaText application can match the unique ID codes and reveal the linked text within a new window.
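The ID matching can be sketched in a few lines of Python. This is an illustration, not DynaText's scripting. It assumes the markup shown above, with an unquoted `id` attribute and the empty end tag `</>`, and pairs each reference's visible text with its citation. It also reports references whose ID has no citation, which is the kind of check that makes large-scale link coding practical.

```python
import re
from typing import Dict, List, Tuple

# <ref id=...>text</> and <bib id=...>text</>, with SGML's unquoted attribute
# value and empty end tag; (?:\1)? also accepts </ref> or </bib>.
LINK = re.compile(r"<(ref|bib) id=([A-Za-z0-9]+)>(.*?)</(?:\1)?>", re.DOTALL)


def resolve_links(sgml: str) -> List[Tuple[str, str]]:
    """Pair each <ref>'s visible text with the <bib> entry carrying the same ID."""
    refs: List[Tuple[str, str]] = []
    bibs: Dict[str, str] = {}
    for kind, ident, text in LINK.findall(sgml):
        if kind == "ref":
            refs.append((ident, text))
        elif ident in bibs:
            raise ValueError(f"duplicate citation ID {ident}")
        else:
            bibs[ident] = text
    missing = sorted({ident for ident, _ in refs if ident not in bibs})
    if missing:
        raise KeyError(f"references with no citation: {missing}")
    return [(text, bibs[ident]) for ident, text in refs]


SAMPLE = (
    "... and these results were confirmed by <ref id=R1234>Smith et al., 1984</>. "
    "<bib id=R1234>Smith, A.R., Jones, R.T., and Royce, P.M. The Evolution of "
    "Diacritical Thought. 1984, W.H. Nueroth & Sons.</>"
)
for shown, citation in resolve_links(SAMPLE):
    print(shown, "->", citation)
```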

The addition of hyperlinks has become so routine in our SGML production that we now link all bibliographic references, as well as author, footnote, figure, table, sidebar, and appendix references. Because of the structured nature of SGML, the task of locating link sources and targets can be automated: the target of a link is a defined element, to the exclusion of all other elements. The hierarchical structure allows you to unambiguously address elements within the hierarchy, such as the parent's left sibling. This allows for such things as restricted scrolled popup views. One CD-ROM we produced (Methods in Enzymology Index CD-ROM 1955-1994, Academic Press, ISBN 0-12-000101-2) contains over 320,000 hyperlinks, demonstrating that hyperlink coding can be performed economically and can frequently be automated. Embedding such large numbers of hyperlinks in Acrobat titles has not proved practical.
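Addressing "the parent's left sibling" is straightforward once the document is a tree. Here is a minimal Python sketch using the standard library's ElementTree. It assumes the markup is written in XML syntax (quoted attributes, named end tags) so that parser can read it; SGML minimization such as `</>` would need a full SGML parser. The document and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# XML syntax (quoted attributes, named end tags) so the standard-library parser can read it.
DOC = """<chapter>
  <section id="s1"><title>Background</title><para id="p1">Earlier work.</para></section>
  <section id="s2"><title>Results</title><para id="p2">New findings.</para></section>
</chapter>"""


def parent_map(root: ET.Element) -> Dict[ET.Element, ET.Element]:
    """ElementTree stores no parent pointers, so build a child -> parent index once."""
    return {child: parent for parent in root.iter() for child in parent}


def left_sibling(el: ET.Element, parents: Dict[ET.Element, ET.Element]) -> Optional[ET.Element]:
    """The element immediately before el under the same parent, if any."""
    parent = parents.get(el)
    if parent is None:
        return None
    siblings = list(parent)
    i = siblings.index(el)
    return siblings[i - 1] if i > 0 else None


def parents_left_sibling(el: ET.Element, parents: Dict[ET.Element, ET.Element]) -> Optional[ET.Element]:
    parent = parents.get(el)
    return None if parent is None else left_sibling(parent, parents)


root = ET.fromstring(DOC)
parents = parent_map(root)
p2 = root.find(".//para[@id='p2']")
target = parents_left_sibling(p2, parents)
print(target.get("id"), target.find("title").text)  # s1 Background
```

A popup view restricted to `target` would show only the preceding section, which is the kind of precise addressing a page-based format cannot offer.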

More elaborate examples of non-linear pathways through SGML can be achieved through stylesheet editing. Sections of contents can be exposed or hidden selectively, giving the impression to a reader of re-ordered contents. Finally, a feature of the DynaText browser lets the reader record his own movement through the contents, forward, backward, or across different titles. The individual path the reader has followed can be saved, edited, and played back. This is an excellent method for teachers to prepare course lessons from within a large textbook. This feature relies upon the underlying SGML structure of the document for navigation, as the software interprets each structural element. This precision is not possible using a page descriptive language, where the structural unit is an arbitrary page. Adobe Acrobat is only able to achieve a certain level of navigation by creating text-based indices in parallel to the PDF representations of each page. The added cost of integrating searchable and navigable text into PDF files is significant.

The ultimate in complex document interactivity comes when the contents can actually be changed by the "reader" (obviously, the definition of "reader" would now encompass "writer/editor"). When a document is structured, and presentation of the contents is dynamically rendered each time it is displayed, any changes made will automatically be seen. In a page descriptive language, each page is dependent upon the prior page. Addition or deletion of a paragraph on page 1 would require all subsequent pages to be recreated. The distillation process of Postscript to PDF produces output pages that are fixed and unalterable. The latest release of DynaText ver. 3.0.1 is capable of rendering raw SGML, thus allowing a "reader" to change raw contents and immediately see the results.

SGML's Missing Links

All of the benefits of SGML are dependent upon SGML-aware software tools: tools for authoring, editing, styling, indexing, presentation, and archiving. Development of these tools has been slow, and development of applications for the two most critical junctures of the publication process, authoring and printing, has been slowest of all. Only two significant word processors have been created using SGML as their primary format, SoftQuad Author/Editor and ArborText Adept Editor. Both of these applications cost many times what common word processing applications cost. There are add-on SGML modules for the popular word processors Microsoft Word and Novell WordPerfect, but these add-on modules must function within the constraints of the parent application and its proprietary format, and thus have limitations.

Without an inexpensive SGML authoring word processor, publishers continue to receive manuscripts in any word processing format and must convert the author source to their internal editorial format. Following editorial review and approval, the accepted manuscript often must be converted into the typographic composition format, copy edited for style, and sent to press. As producers of the SGML-based electronic version, we must convert whatever format we receive from the publisher or printer to SGML. When SGML is introduced "downstream" into the publication process, the conversion/translation costs can be high. A commitment to SGML throughout the entire publication process would result in better products, substantial savings and improved speed of publication.

On the print side, a publisher that can generate SGML finds few press systems able to receive SGML encoding. Typesetters and printers have major infrastructure investments in hardware and software designed to produce hardcopy. They also recognize that SGML is a serious threat to their businesses. Those printers using Postscript to drive their presses can produce PDF files for their publisher customers at little added cost. By providing an inexpensive and expeditious electronic product in Adobe Acrobat to their customers, printers have given publishers a compelling reason to delay more widespread adoption of SGML.

SGML on the Web

Despite a logical developmental path to include web browsing, DynaText remains a CD-ROM/LAN browser. EBT has elected to develop the SGML server side of the Internet publishing solution, avoiding the contentious "browser wars". EBT's SGML server, called DynaWeb ver. 3.0, accepts the SGML content prepared for CD-ROM/LAN delivery and serves it as HTML acceptable to existing web browsers such as Netscape Navigator and Internet Explorer. The result is a quite respectable interim solution, even with the limited capabilities of existing web browsers. SoftQuad [10] has developed Panorama Pro ver. 1.5, an SGML browser that commonly functions as a helper application launched from within a web browser. The SoftQuad system requires all SGML source files (SGML content, DTD, stylesheets, entities, graphics, etc.) from each document to be sent before rendering online. Viewing, linking and searching are then limited to the one document being viewed. This system does not include a means of indexing, searching and viewing a collection of documents as a whole.
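Serving SGML as HTML amounts, at its simplest, to renaming elements and writing out end tags that SGML allowed to be abbreviated. The following Python sketch illustrates that idea only; it is not DynaWeb's actual conversion. It assumes attribute-free start tags, and the element-to-HTML mapping is hypothetical. Unknown elements fall back to `span`, and the original element name is kept in a `class` attribute so structure is not entirely lost.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical mapping from publisher elements to HTML tags.
HTML_FOR: Dict[str, str] = {
    "article": "div", "fm": "div", "title": "h1",
    "sec": "div", "head": "h2", "para": "p",
}
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)?>")


def to_html(sgml: str) -> str:
    """Rename each element to its HTML counterpart, expanding </> into explicit end tags."""
    out: List[str] = []
    open_elements: List[Tuple[str, str]] = []  # (SGML name, HTML name)
    pos = 0
    for m in TAG.finditer(sgml):
        out.append(sgml[pos:m.start()])
        is_end, name = m.group(1) == "/", m.group(2)
        if is_end:
            if not open_elements:
                raise ValueError("end tag with no open element")
            sgml_name, html_name = open_elements.pop()
            if name is not None and name != sgml_name:
                raise ValueError(f"</{name}> does not close <{sgml_name}>")
            out.append(f"</{html_name}>")
        else:
            if name is None:
                raise ValueError("empty start tag <> is not supported here")
            html_name = HTML_FOR.get(name, "span")  # unknown elements become spans
            open_elements.append((name, html_name))
            out.append(f'<{html_name} class="{name}">')
        pos = m.end()
    out.append(sgml[pos:])
    if open_elements:
        raise ValueError(f"unclosed element <{open_elements[-1][0]}>")
    return "".join(out)


print(to_html("<fm><title>SGML on the Web</></fm><para>Text</>"))
# <div class="fm"><h1 class="title">SGML on the Web</h1></div><p class="para">Text</p>
```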

[Dynaweb ScreenShot]


Figure 4. An example of an SGML-based IEEE journal delivered through an HTML Internet browser (Netscape Navigator), via the DynaWeb server. This system provides structured searching, hyperlinks and other interactivity made possible by SGML encoding.

The World Wide Web has provided readers a huge sampling of different electronic publication schemes, including thousands of examples of Acrobat. Most readers browsing online, however, only encounter HTML, the "language" of the Web. Many people readily recognize a compelling, easy to program, and simple to use format in HTML, yet never realize that HTML is a simplified application of SGML. In his original design of the World Wide Web, CERN scientist Tim Berners-Lee saw the benefits of adopting an established, non-proprietary, platform-independent text encoding system, readily capable of extensive hyperlinking. HTML, as a simple, stripped-down application of SGML, can deliver some document structure. Newer versions of HTML are now in the process of reconstituting the richness of its SGML origins. A more logical approach to improving document delivery over the Web would be to stop improving HTML (to function within existing web browser capabilities) and instead advance web servers and browsers to full SGML capabilities.

Recent developments help illustrate why SGML will likely be the next logical step in the evolution of the World Wide Web. The ability to modify and customize DTDs to suit individual publishers' requirements has resulted in many types of valid SGML. Common Web browsers (Internet Explorer, Netscape Navigator, etc.) have been designed for one simplified SGML application, HTML, and were not created to handle the variety of DTDs. One possible solution is the proposed XML (eXtensible Markup Language) standard, which will enable publishers to quickly port their existing SGML content to the web, without "dumbing" it down to the level of HTML. XML was developed by an SGML Editorial Review Board formed under the auspices of the World Wide Web Consortium (W3C) in 1996 and chaired by Jon Bosak of Sun Microsystems, with the very active participation of an SGML Working Group also organized by the W3C [11].
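The practical difference XML makes can be shown in a short Python sketch. It uses Python's `xml.etree.ElementTree`, which implements the final XML 1.0 recommendation rather than the November 1996 draft cited in [11]. The point is that a generic parser, with no DTD, reads publisher-specific element names as long as the document is well-formed, while SGML minimization such as the empty end tag `</>` or unquoted attribute values is rejected. The sample markup is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Publisher-specific element names, written as well-formed XML:
# every end tag is named and attribute values are quoted.
FULL = ('<article><fm><title>Article Title</title></fm>'
        '<para>Energy: <tmath>$E=mc^{2}$</tmath></para></article>')
# The same markup written with SGML minimization (empty end tag).
MINIMIZED = '<article><fm><title>Article Title</></fm></article>'


def element_names(xml_text: str) -> List[str]:
    """Parse without any DTD and list the distinct element names found."""
    root = ET.fromstring(xml_text)
    return sorted({el.tag for el in root.iter()})


def is_well_formed(xml_text: str) -> bool:
    """True if a generic XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(element_names(FULL))        # ['article', 'fm', 'para', 'title', 'tmath']
print(is_well_formed(MINIMIZED))  # False
```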

Smoothing the road to getting SGML online improves access to contents, but not awareness of them. The explosion of Web contents points to the most important reason to use a structured markup format. The process of online publishing provides access to materials, but without cross-collection searching, most users cannot locate what they need to find. While the popular Web searching tools (Yahoo, Alta Vista, Lycos, etc.) provide some means of narrowing a search, they offer rather poor accuracy compared to traditional library search methodology. Bruce Schatz, Principal Investigator of the Digital Library Initiative project [12] at the University of Illinois, and his colleagues have addressed the related problems of searching collections distributed across the network, and relating the jargon of specialized fields through semantic indexing. Toward this end, they have created a testbed of a major engineering digital library, based upon SGML documents [13]. Complex and structured searching aimed at accurate retrieval from large data repositories will require structured documents.
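Structured (fielded) searching is the simplest case of this advantage. The Python sketch below is an illustration, not the Illinois testbed's method. It assumes a toy collection in XML syntax and shows that restricting a query to one element type, here `title`, separates articles *about* a topic from those that merely mention it. The collection contents are made up.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# A toy two-article collection; titles and text are made up for the example.
COLLECTION = """<collection>
  <article id="a1"><title>Acoustic resonance of plates</title>
    <para>We compare with earlier work on optics.</para></article>
  <article id="a2"><title>Optical fibre design</title>
    <para>Resonance effects are ignored here.</para></article>
</collection>"""


def search(xml_text: str, term: str, field: Optional[str] = None) -> List[str]:
    """Return IDs of articles containing term, optionally only inside one element type."""
    root = ET.fromstring(xml_text)
    hits = []
    for article in root.iter("article"):
        # Full-text search covers the whole article; fielded search covers only
        # the named elements within it.
        scope = list(article.iter(field)) if field else [article]
        text = " ".join("".join(el.itertext()) for el in scope).lower()
        if term.lower() in text:
            hits.append(article.get("id"))
    return hits


print(search(COLLECTION, "resonance"))                 # ['a1', 'a2']
print(search(COLLECTION, "resonance", field="title"))  # ['a1']
```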

Summary

The lack of critical SGML software applications has enabled Adobe Acrobat to become one of the most common formats for electronic publication online. The low cost of PDF production, however, is becoming less important as publishers recognize the greater long-term benefits of SGML-centric production. More important than publisher recognition of SGML is reader (market) acceptance. Acrobat's origins as a page layout format place severe restrictions on its appearance onscreen, the quality of illustrations and the speed of transmission. Online readers have come to expect text wrapping to fit the available window, powerful searching, extensive hyperlinking and high quality graphics. Acrobat fails to deliver these critical features.

If the goal of electronic publishing is complex and interactive documents capable of multimedia, hypertext, hyperlinks, dynamic rendering, unlimited choice of fonts and characters, control over presentation online and print, advanced searching, repackaging/republishing potential, and a non-proprietary archive format, SGML provides a logical and very efficient solution. Adobe PDF/Acrobat advocates will argue that some of these features are possible in Acrobat; however, once one attempts to maximize the functionality of an Acrobat document with indexing, navigation and hyperlinks, the short-term cost advantage quickly disappears.

Simply put, SGML is the "acid-free paper" of the electronic world. Despite its clear benefits, the limited acceptance of SGML in electronic publications is no mystery. The high cost of post-compositional translation of text into SGML and the resistance of typesetters and printers to retool their considerable infrastructure are real disincentives to change. Adobe Acrobat has provided an alternative that is cheap and "good enough" for the moment, but the fundamental problems of being based on a page descriptive language prevent it from becoming a comprehensive long-term electronic publishing solution.

What I do find hard to accept are software developers who fear to challenge the de facto publishing standards being established by major corporations. This will change, and all it will take is one brave and innovative company with an SGML-based "killer app". I can describe several killer SGML applications in detail today; all I wonder is who will build them, and when...

References

[1] W.S. Lovegrove and D.F. Brailsford, "Document Analysis of PDF files: Methods, Results and Implications," Electronic Publishing, vol. 8, no. 3, pp. 1-14, 1995.

[2] SGML Open is a non-profit, international consortium of suppliers whose products and services support the Standard Generalized Markup Language: http://www.sgmlopen.org/

[3] The NCSA/SoftQuad SGML on the Web Page is a joint production of NCSA, the National Center for Supercomputing Applications, and SoftQuad Inc. It is maintained by Lucy Ventresca and the staff of SoftQuad: http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/WebSGML.html

[4] The SGML Centre provides consultancy and advice on the application of the Standard Generalized Markup Language (SGML) and related standards/applications: http://www.u-net.com/~sgml/

[5] The Whirlwind Guide to SGML Tools and Vendors, maintained by Steve Pepper: http://www.falch.no/people/pepper/sgmltool/

[6] The SGML Web Page maintained by Robin Cover: http://www.sil.org/sgml/sgml.html

[7] Electronic Book Technologies (EBT), a division of INSO Corporation, is located at One Richmond Square, Providence, RI 02906. Tel: (401) 421-9550, Fax: (401) 421-9551, Email: info@ebt.com, Web: http://www.ebt.com

[8] Roy Pike (Clerk Maxwell Professor of Theoretical Physics, King's College, Strand, London, U.K.), Stephen Buswell, Stephen Healey, and Martin Pike (Stilo Technology Ltd., Empire House, Mount Stuart Square, Cardiff, U.K.), "SGML and the Semantic Representation of Mathematics," 11 April 1996: http://mish161.cern.ch/sc4wg6/math/pike.htm

[9] D. E. Knuth, Computers & Typesetting:
--- Volume A, The TeXbook (Reading, Massachusetts: Addison-Wesley, 1984), x+483pp. ISBN 0-201-13447-0
--- Volume B, TeX: The Program (Reading, Massachusetts: Addison-Wesley, 1986), xviii+600pp. ISBN 0-201-13437-3
--- Volume C, The METAFONTbook (Reading, Massachusetts: Addison-Wesley, 1986), xvi+451pp. ISBN 0-201-13445-4
--- Volume D, METAFONT: The Program (Reading, Massachusetts: Addison-Wesley, 1986), xviii+566pp. ISBN 0-201-13438-1
--- Volume E, Computer Modern Typefaces (Reading, Massachusetts: Addison-Wesley, 1986), xvi+588pp. ISBN 0-201-13446-2

[10] SoftQuad is located at 20 Eglinton Ave. West, 12th Floor, P.O. Box 2025, Toronto, Ontario Canada M4R 1K8, Tel: (416) 544-9000, Fax: (416) 544-0300, Email: mail@softquad.com, Web: http://www.softquad.com

[11] Extensible Markup Language (XML): W3C Working Draft 14-Nov-96, http://www.textuality.com/sgml-erb/WD-xml.html

[12] The Digital Libraries Initiative (DLI) project at the University of Illinois at Urbana-Champaign is developing the information infrastructure to effectively search technical documents on the Internet. Their testbed digital library is based in Standard Generalized Markup Language (SGML) from engineering and science publishers: http://dli.grainger.uiuc.edu/

[13] B. Schatz, "Information Retrieval in Digital Libraries: Bringing Search to the Net," Science, vol. 275, pp.327-334, 1997.

 



Author:

Pete Goldie, Ph.D.
President
Lightbinders, Inc.
2325 Third St. - Suite 324
San Francisco, CA   94107
415-621-5746 voice
415-621-5898 fax
<pg@lbin.com>
http://www.lbin.com/

Bio:

Dr. Peter Goldie received his doctorate from the Sackler Institute of New York University, specializing in the biochemistry and immunology of human malaria. In 1988, Dr. Goldie founded Lightbinders, with the implicit goal of developing electronic publication methods for scientific and technical titles. Lightbinders has produced dozens of academic CD-ROM and online titles, including Journal of Biological Chemistry, Protein Science, Optics Letters, Methods in Enzymology, and IEEE/Computer Society Magazines and Transactions. He is most proud of the Darwin Multimedia CD-ROM, 2nd Edition, which is the result of an ongoing international collaborative project now in its 8th year.