Archived at Pineapplesoft
 ananas.org 
  The Pineapplesoft Link newsletter covered a wide range of technical topics, see the archived issues.
The newsletter was first emailed in 1998. In 2001 Benoît discontinued it in favour of professional writing for magazines.
The “February 1998” page was archived in 2003 to preserve the original content of February 1998.
 


 

This article was awarded a Cool Resource by developer.com.

What you need to know about XML (February 1998)

Welcome to the second issue of Pineapplesoft Link. This month is a special issue because, in addition to the featured article on XML, I've included a short analysis of Netscape's announcement to support the GNU Public License. In the pipeline for the coming months, there is an article on rapid application development (RAD) with Java and a code-free introduction to CORBA.
I'd like to hear from you. Your opinions will help me improve the newsletter, so please send your comments or suggestions to [address removed: the newsletter is no longer published. Thank you for your support.]

A digression on Netscape

I have no intention of competing with news services. The ZDNet, c|net, developer.com and ABCNews of this world are doing a great job. Still, Netscape's decision to distribute Communicator 5 under the GNU Public License is a major announcement, and I felt this newsletter would have been incomplete without at least minimal coverage. You can read the press release at http://www.netscape.com/newsref/pr/newsrelease558.html.

Essentially, Netscape is making the source code of Communicator 5 freely available for anybody to modify. In practice, Netscape is turning to the community that builds Linux, arguably one of the best Unix systems, and Apache, the most popular Web server on the Internet, for the development of Communicator. This does not prevent Netscape from selling a professional version of Communicator. Indeed, in the Linux world, several commercial companies make a living by selling services (paid support, packaged installation, etc.) for the freely available Linux.

Netscape could win on three main fronts. Firstly, they can save on the development of their most visible, but least profitable, product. Secondly, this move will no doubt make them even more popular with programmers and engineers. Communicator is already the favorite browser in that market. Lastly, Netscape hopes to gain share in the application market. Many successful applications like Intuit Quicken, AOL and Homesite have integrated Internet Explorer. None uses Communicator because it currently lacks the hooks that make this integration possible. By offering source code, Netscape could threaten Microsoft's domination of that market. If their strategy succeeds, expect a flood of Communicator-based applications and customized versions of Communicator.

What you need to know about XML

Let's turn to this month's featured article on XML.

XML is the new companion to HTML. I believe that it has the potential to radically improve many Internet, Intranet and Extranet applications. This article will answer the two most common questions about XML: what is it and what is an XML parser?
For your convenience, there's a glossary at the end of the article.

What is XML?

XML is a new Web standard published by the W3C, the organization in charge of developing and promoting the Web. Amongst other things, the W3C also publishes the HTML standard.

XML is a standard for creating electronic documents on the Internet. The first application of XML is to create Web pages, similar to existing ones but smarter. XML is not limited to Web pages though; potential documents include forms, EDI messages, channel definitions (for push technology), application descriptions, etc. As we will see, XML is so flexible that the list of applications is bounded only by imagination.

To understand XML, it's helpful to study HTML first. HTML is the language of Web pages. It is spoken by Web servers and understood by Web browsers.

HTML defines all the tags (HTML jargon for document elements) that compose Web pages. These include titles, paragraphs, hyperlinks, bullet lists, Java applets, tables, images, etc. Indeed HTML has defined a very large list of tags but, let me stress this, the list is finite and completely specified in the standard.

However, in practice, HTML never defines enough tags. Even though the competition between Netscape and Microsoft has resulted in many new tags, Web designers are always looking for more.

Let's take a directory of employees as an example. The directory lists employees with their name, extension number and email address. Unfortunately HTML has no tags for names, extension numbers or email addresses; therefore the directory designer has to fit his data into the limited HTML format. In this particular case, he could use a bullet list or a table.

Unlike HTML, XML does not define standard tags. It's the designer who declares the tags he needs in his document. For the directory, he would declare a name tag, an extension tag and an email tag. In other words, instead of struggling against the limitations of HTML, the Web designer tailors XML to his needs.
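As a rough illustration, the directory could be marked up along the following lines. The tag names (directory, employee, name, extension, email) and the sample data are invented for this sketch, and formal tag declarations are left out. Python's standard xml.etree.ElementTree module is used here only as a convenient way to read the document back.

```python
import xml.etree.ElementTree as ET

# Hypothetical directory document: tag names and data are made up for this sketch.
DIRECTORY_XML = """\
<directory>
  <employee>
    <name>Jane Doe</name>
    <extension>1234</extension>
    <email>jane.doe@example.com</email>
  </employee>
  <employee>
    <name>John Smith</name>
    <extension>5678</extension>
    <email>john.smith@example.com</email>
  </employee>
</directory>
"""


def read_directory(xml_text):
    """Return one dict per employee, keyed by the designer's own tag names."""
    root = ET.fromstring(xml_text)
    return [
        {field.tag: field.text for field in employee}
        for employee in root.findall("employee")
    ]


entries = read_directory(DIRECTORY_XML)
for entry in entries:
    print(entry["name"], entry["extension"], entry["email"])
```

Each piece of data is labelled by what it means rather than by how it should be displayed, which is what the bullet list or table in HTML cannot express.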

Why would one want to do that? What does one gain by declaring one's own tags? Obviously, since XML tags are crafted specifically for the application at hand, they are more expressive than HTML tags. This results in smarter documents, documents that can be browsed in more efficient ways. For example, search engines can be more efficient with the XML directory than with the HTML one.

Don't expect XML to replace HTML quickly. Declaring tags is no easy task and is only justified for those cases where one wants to use the advanced features of smart documents. Still, there are several applications that could benefit from XML. Besides directories, Web catalogs and document repositories are good examples. With specific tags for pricing and description, software agents could search Web catalogs for the best buy. Likewise, specific tags for authors, keywords and title would enable smarter browsing of large repositories.
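To make the catalog idea more concrete, here is a minimal sketch of what such a software agent might do. The catalog layout, the tag names (catalog, product, description, price), the currency attribute and the prices are all assumptions made up for the example; a real catalog would define its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: tag names, attribute and prices are invented for this sketch.
CATALOG_XML = """\
<catalog>
  <product>
    <description>Laser printer</description>
    <price currency="USD">349.00</price>
  </product>
  <product>
    <description>Inkjet printer</description>
    <price currency="USD">199.50</price>
  </product>
  <product>
    <description>Scanner</description>
    <price currency="USD">249.99</price>
  </product>
</catalog>
"""


def best_buy(xml_text, keyword):
    """Return (description, price) of the cheapest product matching keyword, or None."""
    root = ET.fromstring(xml_text)
    matches = [
        product
        for product in root.findall("product")
        if keyword.lower() in product.findtext("description", default="").lower()
    ]
    if not matches:
        return None
    # The price tag tells the agent exactly which number to compare.
    cheapest = min(matches, key=lambda product: float(product.findtext("price")))
    return cheapest.findtext("description"), float(cheapest.findtext("price"))


print(best_buy(CATALOG_XML, "printer"))
```

The agent never has to guess which number on the page is the price, because the document says so.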

What is an XML parser?

The good thing about XML being a standard is that it is supported by tools already on the market. Microsoft, for example, has a line of XML tools available at http://www.microsoft.com/xml/. Also Internet Explorer 4 ships with an XML parser. Likewise Netscape will include an XML parser in future versions of Communicator.

The parser is omnipresent in XML. Still, in my experience, it is one of the least understood tools. Unless you are familiar with SGML (HTML and XML are based on SGML), you may have never heard of a parser.

A parser is a tool for programmers or Web designers only. It's not intended for end-users. Consequently you may browse XML documents and never see a parser even though one is working in the background.

A parser is a library (or a component in object-oriented terminology) that reads and interprets XML documents for applications. Reading XML documents is no easy task. For one thing, the designer defines his own tags and the application must interpret these definitions correctly. Also the application must deal with errors. The parser takes care of the complexity, effectively shielding applications from the idiosyncrasies of XML.
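The error handling mentioned above can be sketched as follows. This example uses the parser from Python's standard library, which is simply one parser among many and not one of the products discussed in this article; load_document is a hypothetical helper name. The application hands the text to the parser and only has to decide what to do when the parser reports a problem.

```python
import xml.etree.ElementTree as ET


def load_document(xml_text):
    """Return the root element, or None if the parser rejects the text."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError as error:
        # The parser reports where the problem is; the application only reacts.
        line, column = error.position
        print(f"Document rejected at line {line}, column {column}: {error}")
        return None


good = load_document("<employee><name>Jane Doe</name></employee>")
bad = load_document("<employee><name>Jane Doe</employee>")  # <name> is never closed

print(good.tag if good is not None else "no document")
print(bad.tag if bad is not None else "no document")
```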

Thanks to the parser, XML applications are easier and faster to develop. It takes several weeks for an experienced programmer to develop an XML parser but once it's done other programmers and Web designers can reuse it effortlessly. For example, the XML parser that ships with Internet Explorer 4 means faster development of XML applications.

Parsers really shine in combination with scripting languages like JavaScript (also known as ECMAScript). With scripting languages it's easy to write sophisticated XML applications fast. In the directory example, the designer can write a search function in a hundred lines of JavaScript or less. That will take him less than a day, versus several weeks if he couldn't benefit from the parser.
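The search function for the directory might look something like the sketch below. It is written in Python rather than JavaScript, purely for illustration, and it assumes a hypothetical directory format with employee, name, extension and email tags. Because the parser does the reading, the search itself is only a few lines.

```python
import xml.etree.ElementTree as ET

# Hypothetical directory document: tag names and data are made up for this sketch.
DIRECTORY_XML = """\
<directory>
  <employee>
    <name>Jane Doe</name>
    <extension>1234</extension>
    <email>jane.doe@example.com</email>
  </employee>
  <employee>
    <name>John Smith</name>
    <extension>5678</extension>
    <email>john.smith@example.com</email>
  </employee>
</directory>
"""


def search(xml_text, query):
    """Return (name, extension, email) for each employee whose name contains query."""
    root = ET.fromstring(xml_text)  # the parser does the hard work here
    query = query.casefold()
    results = []
    for employee in root.iter("employee"):
        name = employee.findtext("name", default="")
        if query in name.casefold():
            results.append(
                (name, employee.findtext("extension"), employee.findtext("email"))
            )
    return results


for name, extension, email in search(DIRECTORY_XML, "smith"):
    print(f"{name}: extension {extension}, {email}")
```

The search looks only at the name tag, so a query can never match an email address or an extension number by accident.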

Conclusion

XML is an important addition to the Web. It enables smarter documents and a new generation of Web applications.

Glossary

EDI: Electronic Data Interchange
GPL: GNU Public License
HTML: HyperText Markup Language
SGML: Standard Generalized Markup Language
W3C: World Wide Web Consortium
XML: Extensible Markup Language

Self-promotion department

The subscription to the newsletter has grown by more than 300% in just one month!

While we are on the subject of XML, don't miss the Documentation 98 conference in March. I am a speaker there and I will present XML/EDI, one of those exciting XML applications for Extranets. You can find more information on XML/EDI at http://www.xmledi.net. For more information on Documentation 98, contact info@technoforum.fr. Please note that the conference is in French.

Please help us spread the word on Pineapplesoft Link. For example, you can vote for us as one of the best 500 Belgian Web sites. Visit http://www.best.be/top500/bestsites.cfm and vote for http://www.pineapplesoft.com in the business category. You don't have to live in Belgium to vote for us.

About Pineapplesoft Link

Pineapplesoft Link is a free email magazine. Each month, it discusses technologies, trends and facts of interest to web developers.

The information and design of this issue of Pineapplesoft Link are owned by Benoit Marchal and Pineapplesoft. Permission to copy or forward it is hereby granted provided it is prefaced with the words: "As appeared in Pineapplesoft Link - http://www.pineapplesoft.com."

Editor: Benoit Marchal
Publisher: Pineapplesoft www.psol.be

Acknowledgments: thanks to Sean McLoughlin MBA for helping me with this issue.

Back issues are available at http://www.psol.be/old/1/newsletter/.

Although the editor and the publisher have used reasonable endeavors to ensure accuracy of the contents, they assume no responsibility for any error or omission that may appear in the document.

Pineapplesoft is a registered trademark of Pineapplesoft sprl in the Benelux.

Last update: February 1998.
© 1998, Benoît Marchal. All rights reserved.
Design, XSL coding & photo: PineappleSoft OnLine.