Adobe Plunges PDF into XML

At the XML 2006 conference in Boston this week, Adobe Systems Inc. plans to unveil a preliminary design for what may become the next version of its Portable Document Format (PDF). The new version would be built entirely on the Extensible Markup Language (XML).

The current version of PDF lets a document's creator package an XML transcript of the text alongside the original document. Documents in the new format, code-named Mars, would instead be created entirely in XML, said Joel Geraci, Adobe's PDF developer spokesperson.

Adobe's research lab has released the software for public review. If the feedback proves useful and Adobe's executives approve the format, Mars could become the next version of PDF and could ship as early as the next release of Acrobat.

“PDF is over 15 years old. It predates XML. The technology is not at the same level of compatibility as XML, where there are a lot of tools and knowledge about how to work with XML,” explained Phillip Levy, an Adobe PDF and XML architect who helped develop Mars. “So moving the PDF technology onto an XML base gives us a lot better integration with the rest of the world,” he said.

Like documents in Microsoft Office's new XML formats, a Mars document will be a zipped collection of individual files. A plain-text Scalable Vector Graphics (SVG) file, for instance, will hold both the document's text and explicit instructions for rendering its layout and style. The archive will also contain any images used in the document.
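To make the packaging idea concrete, here is a minimal sketch that uses Python's standard zipfile module to bundle one SVG page and one image into a single zip archive. The part names (pages/page1.svg, images/figure1.png) and the function build_package are hypothetical; Adobe's actual internal layout for Mars is not described here.

```python
import io
import zipfile

# Hypothetical page: the SVG carries both the text and where/how it is drawn.
PAGE_SVG = """<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" width="612" height="792">
  <text x="72" y="72" font-family="Helvetica" font-size="12">Hello World</text>
</svg>
"""


def build_package(image_bytes: bytes) -> bytes:
    """Zip one SVG page and one image into a single package (hypothetical layout)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Part names are illustrative assumptions, not Adobe's actual Mars layout.
        archive.writestr("pages/page1.svg", PAGE_SVG)
        archive.writestr("images/figure1.png", image_bytes)
    return buffer.getvalue()


if __name__ == "__main__":
    package = build_package(b"placeholder image data")
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        print(archive.namelist())  # ['pages/page1.svg', 'images/figure1.png']
```

Deflate compression is used here because verbose XML text typically compresses well, which is the point Levy makes about file size.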

Adobe's adoption of SVG could mark a major step forward for that XML-based format, Levy said. Although SVG is still being developed, it defines how visual elements are drawn and gives precise control over where each element appears in the layout. Adobe will also write its own XML-based extensions to cover visual elements that standard SVG tags cannot handle, Levy said.

In performance, the new format should be comparable to current PDF, Levy said. Although XML encoding can be verbose, zip compression should keep file sizes manageable. Rendering the XML should also require about as much processing power as rendering current PDFs, Levy explained. Eventually, the new format will support all of the advanced features that current PDFs offer, such as content security.

According to Levy, moving PDF entirely to XML will make it easier for businesses to build PDF generation and information extraction into their workflows.
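As a rough illustration of that kind of extraction, the following sketch opens a zipped, SVG-based package with Python's standard zipfile and xml.etree.ElementTree modules and collects the text of every SVG text element. It assumes pages are stored as .svg parts in the archive; the function name extract_text and the part names are hypothetical, not part of any Adobe API.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def extract_text(package_bytes: bytes) -> list[str]:
    """Collect the text of every SVG <text> element in every .svg part of a zip package."""
    lines = []
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        # Sort part names so pages are read in a predictable order (an assumption).
        for name in sorted(archive.namelist()):
            if not name.endswith(".svg"):
                continue  # skip images and other non-SVG parts
            root = ET.fromstring(archive.read(name))
            for element in root.iter(f"{{{SVG_NS}}}text"):
                # itertext() also picks up text inside nested <tspan> elements.
                lines.append("".join(element.itertext()))
    return lines


if __name__ == "__main__":
    page = ('<svg xmlns="http://www.w3.org/2000/svg">'
            '<text x="72" y="72">Hello World</text></svg>')
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("pages/page1.svg", page)  # hypothetical part name
    print(extract_text(buffer.getvalue()))  # ['Hello World']
```

No PDF-specific library is involved: a general-purpose XML parser is enough, which is the integration benefit Levy describes.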

Geraci demonstrated this simplicity with a classic “Hello World” example. He showed a Mars document that contained a single line of text, “Hello World.” He then opened the document's SVG file and copied the “Hello World” line, including its surrounding SVG tags, onto a new line below the original, changing the offset value so that the copy would appear just below the first line. After saving the SVG file, Geraci reopened the document in a viewer to show the audience that the second line had been added.
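The same edit can be done programmatically. This minimal sketch, using Python's standard xml.etree.ElementTree, copies an SVG text element and increases its y coordinate so the copy is drawn below the original (in SVG, y grows downward). The sample markup, the function name duplicate_line_below, and the 14-unit line spacing are assumptions for illustration, not what Adobe showed on stage.

```python
import copy
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Serialize SVG elements with a default namespace instead of an "ns0:" prefix.
ET.register_namespace("", SVG_NS)

# Hypothetical single-line page, similar in spirit to the demo document.
HELLO_SVG = """<svg xmlns="http://www.w3.org/2000/svg" width="612" height="792">
  <text x="72" y="72" font-size="12">Hello World</text>
</svg>"""


def duplicate_line_below(svg_text: str, line_spacing: float = 14.0) -> str:
    """Copy the first <text> element and place the copy one line lower."""
    root = ET.fromstring(svg_text)
    original = root.find(f"{{{SVG_NS}}}text")
    if original is None:
        raise ValueError("no <text> element found")
    clone = copy.deepcopy(original)  # keeps the text and all styling attributes
    # Shift the vertical offset; larger y means lower on the page in SVG.
    clone.set("y", str(float(original.get("y", "0")) + line_spacing))
    # Insert the copy directly after the original element.
    root.insert(list(root).index(original) + 1, clone)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(duplicate_line_below(HELLO_SVG))
```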

Geraci explained that although the SVG files in most documents will be too complex to edit by hand, the demonstration showed how easily an XML-parsing application could manipulate a PDF file.

Today, developers can manipulate PDFs programmatically with Adobe's PDF libraries, but they often describe those libraries as difficult to work with, Levy said. XML should be simpler, he said, because its structure is familiar and it is easy to work with from programming languages such as Java.