eZ Components - Document

Introduction

The document component offers transformations between different semantic markup languages, like:

ReStructured text
XHTML
Docbook
eZ Publish XML markup
Wiki markup languages, like: Creole, Dokuwiki and Confluence
Open Document Text as used by OpenOffice.org and other office suites

As shown in figure 1, each format supports conversion to and from docbook as a central intermediate format, and may implement additional shortcuts for conversion to and from other formats. Not every format can express the same semantics, so some information may be lost. What is lost is documented in a dedicated document.

Figure 1: Conversion architecture in document component

Each markup language has a central handler class. These classes follow the common conversion interface ezcDocument and all implement the methods getAsDocbook() and createFromDocbook().

Additionally, the document component can render documents in the following output formats. These formats cannot be read, only generated:

PDF

Markup languages

The following markup languages are currently handled by the document component.

ReStructured text

ReStructured Text (RST) is a simple text based markup language, intended to be easy for humans to read and write. Examples can be found in the documentation of RST.

The transformation of a simple RST document to docbook can be done just like this:

<?php
require 'tutorial_autoload.php';
$document = new ezcDocumentRst();
$document->loadFile( '../tutorial.txt' );
$docbook = $document->getAsDocbook();
echo $docbook->save();
?>

The loadFile() call loads the document and parses it into an internal abstract syntax tree. The getAsDocbook() call then transforms the internal structure into a docbook document. In the last line the resulting document is returned as a string by save(), so that you can echo or store it.

Error handling

By default each parsing or compiling error is transformed into an exception, so that you are notified about those errors. The error reporting settings can be modified in the same way as for all other document handlers:

<?php
$document = new ezcDocumentRst();
$document->options->errorReporting = E_PARSE | E_ERROR | E_WARNING;
$document->loadFile( '../tutorial.txt' );
$docbook = $document->getAsDocbook();
echo $docbook->save();

The errorReporting setting causes only warnings, errors and fatal errors to be transformed into exceptions, while notices are collected but otherwise ignored. This setting affects both the parsing of the source document and the compiling into the destination language.
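The effect of such a bitmask can be illustrated with plain PHP. The following is a minimal sketch, not the component's actual implementation: the helper shouldThrow() is a hypothetical name, and it assumes that a reported level raises an exception exactly when its bit is set in the configured mask.

```php
<?php
// Hypothetical helper: decides whether an error of the given level
// would be turned into an exception under the configured bitmask.
function shouldThrow( $level, $errorReporting )
{
    return ( $errorReporting & $level ) !== 0;
}

// The mask used in the example above: notices are not included.
$mask = E_PARSE | E_ERROR | E_WARNING;

var_dump( shouldThrow( E_WARNING, $mask ) ); // bool(true)
var_dump( shouldThrow( E_NOTICE, $mask ) );  // bool(false)
```

Adding E_NOTICE to the mask with a further | would make notices raise exceptions as well.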

Directives

RST directives are elements in the RST documents with parameters, optional named options and optional content. The document component implements a well known subset of the directives implemented in the docutils RST parser. You may register custom directive handlers, or overwrite existing directive handlers using your own implementation. A directive in RST markup with parameters, options and content could look like:

My document
===========

The custom directive:

.. my_directive:: parameters
    :option: value

    Some indented text...

For such a directive you should register a handler on the RST document, like:

<?php
$document = new ezcDocumentRst();
$document->registerDirective( 'my_directive', 'myCustomDirective' );
$document->loadFile( $from );
$docbook = $document->getAsDocbook();
$xml = $docbook->save();

The class myCustomDirective must extend the class ezcDocumentRstDirective, and implement the method toDocbook(). For rendering you get access to the full AST, the contents of the current directive and the base path where the document resides in the file system, which is necessary for accessing external files.

Directive example

A full example for a custom directive, where we want to embed real world addresses into our RST document and maintain the semantics in the resulting docbook, could look like:

Address example
===============

.. address:: John Doe
    :street: Some Lane 42

We could add more information, like the ZIP code, city and state, but skip this to keep the code short. The implemented directive then just needs to take this information and transform it into valid docbook XML using the DOM extension.

<?php
class myAddressDirective extends ezcDocumentRstDirective
{
    public function toDocbook( DOMDocument $document, DOMElement $root )
    {
        $address = $document->createElement( 'address' );
        $root->appendChild( $address );
        if ( !empty( $this->node->parameters ) )
        {
            $name = $document->createElement( 'personname', htmlspecialchars( $this->node->parameters ) );
            $address->appendChild( $name );
        }
        if ( isset( $this->node->options['street'] ) )
        {
            $street = $document->createElement( 'street', htmlspecialchars( $this->node->options['street'] ) );
            $address->appendChild( $street );
        }
    }
}
?>

The AST node which should be rendered is passed to the constructor of the custom directive visitor and is available in the class property $node. The complete DOMDocument and the current DOMNode are passed to the method. In this case we just create an address node with the optional child nodes street and personname, depending on the existence of the respective values.
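The same DOM calls extend naturally to further address parts. The following standalone sketch uses only PHP's DOM extension and none of the document component classes, so it can be tried in isolation. The function name buildAddress() and the option keys postcode and city are assumptions made for illustration, mapped onto the docbook elements of the same name.

```php
<?php
// Hypothetical helper mirroring the directive body above, but runnable
// without the document component. $options maps docbook element names
// to their text content.
function buildAddress( DOMDocument $document, DOMElement $root, $name, array $options )
{
    $address = $document->createElement( 'address' );
    $root->appendChild( $address );

    if ( $name !== '' )
    {
        $element = $document->createElement( 'personname' );
        // createTextNode() takes care of escaping special characters.
        $element->appendChild( $document->createTextNode( $name ) );
        $address->appendChild( $element );
    }

    // Only create child elements for the options actually given.
    foreach ( array( 'street', 'postcode', 'city' ) as $key )
    {
        if ( isset( $options[$key] ) )
        {
            $element = $document->createElement( $key );
            $element->appendChild( $document->createTextNode( $options[$key] ) );
            $address->appendChild( $element );
        }
    }

    return $address;
}

$document = new DOMDocument();
$root = $document->createElement( 'section' );
$document->appendChild( $root );

buildAddress( $document, $root, 'John Doe', array(
    'street' => 'Some Lane 42',
    'city'   => 'Sampletown',
) );

echo $document->saveXML( $root ), "\n";
```

Inside a real directive the body of this function would live in toDocbook(), reading the values from $this->node->parameters and $this->node->options.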

You can now render the RST document after you have registered your custom directive handler as shown above:

<?php
require 'tutorial_autoload.php';
// Load custom directive
require '00_01_address_directive.php';
$document = new ezcDocumentRst();
$document->registerDirective( 'address', 'myAddressDirective' );
$document->loadString( <<<EORST
Address example
===============

.. address:: John Doe
    :street: Some Lane 42
EORST
);
$docbook = $document->getAsDocbook();
echo $docbook->save();
?>

The output will then look like:

<?xml version="1.0"?>
<article xmlns="http://docbook.org/ns/docbook">
  <section id="address_example">
    <sectioninfo/>
    <title>Address example</title>
    <address>
      <personname> John Doe</personname>
      <street> Some Lane 42</street>
    </address>
  </section>
</article>

XHTML rendering

For RST a conversion shortcut has been implemented, so that you don't need to convert the RST to docbook and then the docbook to XHTML. This saves conversion time and helps to avoid information loss during multiple conversions:

<?php
$document = new ezcDocumentRst();
$document->loadFile( $from );
$xhtml = $document->getAsXhtml();
$xml = $xhtml->save();

The default XHTML compiler generates complete XHTML documents, including the header and the meta-data in the header. If you want to in-line the result, you may specify another XHTML compiler, which just creates an XHTML block level element that can be embedded in your source code:

<?php
$document = new ezcDocumentRst();
$document->options->xhtmlVisitor = 'ezcDocumentRstXhtmlBodyVisitor';
$document->loadFile( $from );
$xhtml = $document->getAsXhtml();
$xml = $xhtml->save();

You can of course also use the predefined and custom directives for XHTML rendering. The directives used during XHTML generation also need to implement the interface ezcDocumentRstXhtmlDirective.

Modification of XHTML rendering

You can modify the generated output of the XHTML visitor by creating a custom visitor for the RST AST. The easiest way is probably to extend one of the existing XHTML visitors and reuse it. For example you may want to fill the type attribute in bullet lists, as known from HTML, although this isn't valid XHTML:

class myDocumentRstXhtmlVisitor extends ezcDocumentRstXhtmlVisitor
{
    protected function visitBulletList( DOMNode $root, ezcDocumentRstNode $node )
    {
        $list = $this->document->createElement( 'ul' );
        $root->appendChild( $list );

        $listTypes = array(
            '*'            => 'circle',
            '+'            => 'disc',
            '-'            => 'square',
            "\xe2\x80\xa2" => 'disc',
            "\xe2\x80\xa3" => 'circle',
            "\xe2\x81\x83" => 'square',
        );
        // Not allowed in XHTML strict
        $list->setAttribute( 'type', $listTypes[$node->token->content] );

        // Decorate list contents
        foreach ( $node->nodes as $child )
        {
            $this->visitNode( $list, $child );
        }
    }
}

The structure, which is not enforced for visitors, but used in the docbook and XHTML visitors, is to call special methods for each node type in the AST to decorate the AST recursively. This method will be called for all bullet list nodes in the AST which contain the actual list items. As the first parameter the current position in the XHTML DOM tree is also provided to the method.

To create the XHTML we can now just create a new list node (<ul>) in the current DOMNode, set the new attribute, and recursively decorate all descendants using the general visitor dispatching method visitNode() for all children in the AST. For the AST children being also rendered as children in the XML tree, we pass the just created DOMNode (<ul>) as the new root node to the visitNode() method.

After defining such a class, you can use the custom visitor as shown above:

<?php
$document = new ezcDocumentRst();
$document->options->xhtmlVisitor = 'myDocumentRstXhtmlVisitor';
$document->loadFile( $from );
$xhtml = $document->getAsXhtml();
$xml = $xhtml->save();

Now the lists in the generated XHTML will also have the type attribute set.

Writing RST

Writing an RST document from an existing docbook document, or from an ezcDocumentDocbook object generated from some other source, is trivial:

<?php
require 'tutorial_autoload.php';
$docbook = new ezcDocumentDocbook();
$docbook->loadFile( 'docbook.xml' );
$rst = new ezcDocumentRst();
$rst->createFromDocbook( $docbook );
echo $rst->save();
?>

For the conversion internally the ezcDocumentDocbookToRstConverter class is used, which can also be called directly, like:

$converter = new ezcDocumentDocbookToRstConverter();
$rst = $converter->convert( $docbook );

Using this you can configure the converter to your wishes, or extend the converter to handle yet unhandled docbook elements. The converter is, as usual, configured using its options property, and the options are defined in the ezcDocumentDocbookToRstConverterOptions class. There you may configure the header underlines used, the bullet types or the line wrapping.

Extending RST writing

As said before, not all existing docbook elements might already be handled by the converter. But its handler based mechanism makes it easy to extend or overwrite existing behaviour.

Similar to the example above we can convert the <address> docbook element back to the address RST directive.

<?php
require 'tutorial_autoload.php';
$docbook = new ezcDocumentDocbook();
$docbook->loadFile( 'address.xml' );
class myAddressElementHandler extends ezcDocumentDocbookToRstBaseHandler
{
    public function handle( ezcDocumentElementVisitorConverter $converter, DOMElement $node, $root )
    {
        $root .= $this->renderDirective( 'address', $node->textContent, array() );
        return $root;
    }
}
$converter = new ezcDocumentDocbookToRstConverter();
$converter->setElementHandler( 'docbook', 'address', new myAddressElementHandler() );
$rst = $converter->convert( $docbook );
echo $rst->save();
?>

The handler classes are assigned to XML elements in some namespace, "docbook" in this case. The setElementHandler() call registers the handler for the element "address". The class itself has to extend the ezcDocumentElementVisitorHandler class, which in this case is already extended by ezcDocumentDocbookToRstBaseHandler. This base handler provides some convenience methods for RST creation, like renderDirective() used in this example.

The handler is called whenever the element it has been registered for occurs in the docbook XML tree. In this case it has to append the generated RST part for this element to the RST document, and may call the general conversion handler again for its child elements. This example converts the docbook XML shown above back to:

.. _address_example:

===============
Address example
===============

.. address::
       John Doe
       Some Lane 42

This ignores any special address sub elements to keep the example simple. For more examples of element handlers check the existing implementations.

XHTML

Converting XHTML or HTML to a document markup language is a non trivial task, because XHTML elements are often used for layout, ignoring the actual semantics of the element. Therefore the document component lets you stack a set of filters, each of which performs a specific conversion task. The default filter stack may work fine, but you may also want to implement custom filters depending on the contents of the filtered website, or to cover additional sources of meta data, like RDF, Microformats or similar.

The available filters are:

ezcDocumentXhtmlElementFilter
This filter just maintains the common semantics of XHTML elements by converting them to their docbook equivalents. It ignores common class names. This filter is the most basic and you probably want to always add this one to the filter stack.
ezcDocumentXhtmlXpathFilter
The XPath filter takes an XPath expression to locate the root of the document contents. It makes no sense to use this one together with the content locator filter. This is a more static, but also more precise way to tell the converter where to find the actual contents.
ezcDocumentXhtmlMetadataFilter
This filter extracts common meta data from the XHTML head, and converts it into docbook section info elements.
ezcDocumentXhtmlTablesFilter
HTML tables are especially often used for layout markup. This filter takes a threshold, and if the table text factor drops below this threshold the table is ignored. The same is true for stacked tables.
ezcDocumentXhtmlContentLocatorFilter
The content locator filter tries to find the actual article in the markup of a website, ignoring the surrounding layout markup. This seems to work well for example for common news sites.

By default only the element and meta data filters are used. The conversion of a common website, like the introduction article from ezcomponents.org, therefore results in a docbook document which still contains all the lists for the navigation, etc.

<?php
require 'tutorial_autoload.php';
$xhtml = new ezcDocumentXhtml();
$xhtml->loadFile( 'ez_components_introduction.html' );
$docbook = $xhtml->getAsDocbook();
echo $docbook->save();
?>

So let's additionally use the XPath filter to pass the location of the actual content to the conversion:

<?php
require 'tutorial_autoload.php';
$xhtml = new ezcDocumentXhtml();
$xhtml->setFilters( array(
    new ezcDocumentXhtmlElementFilter(),
    new ezcDocumentXhtmlMetadataFilter(),
    new ezcDocumentXhtmlXpathFilter( '//div[@class="document"]' ),
) );
$xhtml->loadFile( 'ez_components_introduction.html' );
$docbook = $xhtml->getAsDocbook();
echo $docbook->save();
?>

With this additional filter, the contents are correctly found and converted properly.
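What such an XPath expression selects can be checked with PHP's DOM extension alone. This is a standalone sketch rather than the filter's implementation, and the HTML snippet is made up for illustration.

```php
<?php
// Made-up page: navigation markup around the actual content.
$html = '<html><body>'
      . '<ul class="menu"><li>Home</li></ul>'
      . '<div class="document"><h1>Introduction</h1><p>Some content...</p></div>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML( $html );

// The same expression as passed to ezcDocumentXhtmlXpathFilter above.
$xpath = new DOMXPath( $dom );
$nodes = $xpath->query( '//div[@class="document"]' );

// Only the content container matches, the navigation list does not.
echo $nodes->length, "\n";
echo $nodes->item( 0 )->textContent, "\n";
```

Trying an expression this way before handing it to the filter makes it easier to see whether it really points at the content root of the page in question.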

Writing XHTML

Writing XHTML from docbook is very similar to the approach used for writing RST. It uses the same handler based mechanism, so you may want to check that chapter to learn how to extend it for unhandled docbook elements.

<?php
require 'tutorial_autoload.php';
$docbook = new ezcDocumentDocbook();
$docbook->loadFile( 'docbook.xml' );
$html = new ezcDocumentXhtml();
$html->createFromDocbook( $docbook );
echo $html->save();
?>

As you can see, this works the same way as any other conversion from Docbook to another format.

HTML styles

By default inline CSS is embedded in all generated HTML, to create a more appealing default experience. This may of course be deactivated and you may also reference custom style sheets to be included in the generated HTML.

<?php
require 'tutorial_autoload.php';
$docbook = new ezcDocumentDocbook();
$docbook->loadFile( 'docbook.xml' );
$converter = new ezcDocumentDocbookToHtmlConverter();
// Remove the inline CSS
$converter->options->styleSheet = null;
// Add custom CSS style sheets
$converter->options->styleSheets = array(
    '/styles/screen.css',
);
$html = $converter->convert( $docbook );
echo $html->save();
?>

For this we again use the converter directly to be able to configure it as we like.

eZ XML

eZ XML describes the markup format used internally by eZ Publish for storing markup in content objects. The format is roughly specified in the eZ Publish documentation.

Modules often register custom elements, which are not specified anywhere, so there might be several elements not handled by default.

Reading eZ XML

Reading eZ XML is basically the same as for all other formats:

<?php
require 'tutorial_autoload.php';
$document = new ezcDocumentEzXml();
$document->loadString( '<?xml version="1.0"?>
<section xmlns="http://ez.no/namespaces/ezpublish3">
<header>Paragraph</header>
<paragraph>Some content...</paragraph>
</section>' );
$docbook = $document->getAsDocbook();
echo $docbook->save();
?>

As always the document object is either constructed from an input string or file. To convert into docbook you may just use the method getAsDocbook().

Link handling

Inside eZ XML documents link URIs are replaced with IDs, which reference the links inside the eZ Publish database, to ensure that a changed link is updated globally. The replacing of such links is handled by a class extending ezcDocumentEzXmlLinkProvider. By default dummy URLs are added to the documents.

URLs are either referenced directly by their ID, a node ID, or an object ID. Those parameters are passed to the link provider, which should then return a URL for them.

<?php
require 'tutorial_autoload.php';
class myLinkProvider extends ezcDocumentEzXmlLinkProvider
{
    public function fetchUrlById( $id, $view, $show_path )
    {
        return 'http://host/path/' . $id;
    }
    public function fetchUrlByNodeId( $id, $view, $show_path ) {}
    public function fetchUrlByObjectId( $id, $view, $show_path ) {}
}
$document = new ezcDocumentEzXml();
$document->loadString( '<?xml version="1.0"?>
<section xmlns="http://ez.no/namespaces/ezpublish3">
<header>Paragraph</header>
<paragraph>Some content, with a <link url_id="1">link</link>.</paragraph>
</section>' );
// Set link provider
$converter = new ezcDocumentEzXmlToDocbookConverter();
$converter->options->linkProvider = new myLinkProvider();
$docbook = $converter->convert( $document );
echo $docbook->save();
?>

The link provider is only implemented as a trivial stub, but you can establish a database connection there and actually fetch the required data. In this case the generated docbook document looks like:

<?xml version="1.0"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<article xmlns="http://docbook.org/ns/docbook">
  <section>
    <title>Paragraph</title>
    <para>Some content, with a <ulink url="http://host/path/1">link</ulink>.</para>
  </section>
</article>

The link provider is again set as an option of the converter. As shown for the docbook conversions of the other handlers, you can also register element handlers for yet unhandled eZ XML elements on the converter.

Writing eZ XML

Writing eZ XML works nearly the same as reading. It again uses an XML based element handler, as shown in more detail for the Docbook to RST conversion. For the link conversion an object extending ezcDocumentEzXmlLinkConverter is used, which returns an array with the attributes of the link in the eZ XML document.

Wiki markup

Wiki markup has no central standard. The term describes a common subset of markup with lots of different extensions. Most wiki markup languages only support quite trivial markup, with severe limitations on the nesting of markup blocks. For example, hardly any wiki markup really supports tables containing lists, let alone tables containing other tables.

The document component implements a generic parser to support multiple wiki markup languages. For each different markup syntax a tokenizer has to be implemented, which converts the implemented markup into a unified token stream, which can then be handled by the generic parser.

The document component currently supports reading three wiki markup languages, and new ones can be added easily by implementing another tokenizer. Supported are:

Creole, developed by an initiative with the intention to create a unified wiki markup standard. This is the default wiki language, and currently the only one which can be written.
Creole currently only supports a very limited set of markup; all further markup additions are still under discussion.
Dokuwiki is a popular wiki system, for example used on wiki.php.net with a quite different syntax, and the most complete markup support, even including something like footnotes.
Confluence is a common Java based wiki with an entirely different and most uncommon syntax, which has mainly been implemented to prove the generic nature of the parser.

All markup languages are tested against all examples from the respective markup language documentation. Still, there might be cases where the parser of the original implementation behaves slightly differently from the implementation in the document component.

Reading wiki markup

Reading wiki texts basically works like for any other markup language:

<?php
require 'tutorial_autoload.php';
$document = new ezcDocumentWiki();
$document->loadString( '
= Example text =
Just some example paragraph with a heading, some **emphasis** markup and a
[[http://ezcomponents.org|link]].' );
$docbook = $document->getAsDocbook();
echo $docbook->save();
?>

As said, by default the Creole tokenizer is used. The same result can be produced with Confluence markup by switching the tokenizer:

<?php
require 'tutorial_autoload.php';
$document = new ezcDocumentWiki();
$document->options->tokenizer = new ezcDocumentWikiConfluenceTokenizer();
$document->loadString( '
h1. Example text
Just some example paragraph with a heading, some *emphasis* markup and a
[link|http://ezcomponents.org].' );
$docbook = $document->getAsDocbook();
echo $docbook->save();
?>

Writing wiki markup

Until now only writing of creole wiki markup is supported. Since creole does not support a lot of the markup available in docbook, not all documents might get converted properly. Because it does not even support explicit internal references, we cannot even simulate footnotes like in HTML.

If you want to add support for such conversions, it works exactly like the Docbook to RST conversion and can be extended the same way.

<?php
require 'tutorial_autoload.php';
$docbook = new ezcDocumentDocbook();
$docbook->loadFile( 'docbook.xml' );
$document = new ezcDocumentWiki();
$document->createFromDocbook( $docbook );
echo $document->save();
?>

PDF

PDF (Portable Document Format) has been developed to provide a document format which can be presented independently of software and system. Because of this it is often used as a pre-print document exchange format.

The document component can generate PDF documents from all other input formats and offers a language very similar to CSS to apply custom styling to the generated output. Additionally it supports adding custom parts, like footers and headers, to the PDF document.

Reading PDF

The document component for now does not support reading PDF documents.

Writing PDF

Writing PDF basically works like writing any other format supported by the document component, like the basic example shows:

<?php
require 'tutorial_autoload.php';
// Convert some input RST file to docbook
$document = new ezcDocumentRst();
$document->loadFile( './article/introduction.txt' );
$pdf = new ezcDocumentPdf();
$pdf->options->errorReporting = E_PARSE | E_ERROR | E_WARNING;
$pdf->createFromDocbook( $document->getAsDocbook() );
file_put_contents( __FILE__ . '.pdf', $pdf );
?>

First we load an RST file and create a Docbook document from it because, as described before, Docbook is the central conversion format.

Afterwards the Docbook document is loaded by the PDF class and saved. When the document is converted to a string, the PDF is rendered using the default options and the default driver. The result of this rendering call can be seen in 04_01_create_pdf.pdf.

Output writers

Since there are numerous different PDF renderers in the PHP world and the available ones might depend on the current environment, the document component supports different PDF drivers, which act as wrappers around different existing libraries.

For now two implementations exist, for pecl/haru and for TCPDF, but it is fairly easy to write another one for another PDF class.

Haru

libharu is an open source PDF generation library, written in C, and wrapped by the haru PHP extension, available from PECL. If PEAR is correctly set up on your machine, it should install as easily as:

pear install pecl/haru

The Haru driver is pretty fast, but currently has issues with some special characters. It is the default driver, but can be explicitly used by setting the driver option on the PDF class, like:

$pdf = new ezcDocumentPdf();
$pdf->options->driver = new ezcDocumentPdfHaruDriver();

TCPDF

TCPDF is a pure PHP based PDF generation library, available from tcpdf.org. To use the TCPDF driver you need to download and include its main class before rendering the PDF. It supports all aspects of PDF rendering required by the document component, but has some bad coding practices, like:

Throws lots of warnings and notices, which you might want to silence by temporarily changing the error reporting level
Reads and writes several global variables, which might or might not interfere with your application code
Uses eval() in several places, which results in non-cacheable OP-Codes.

The TCPDF driver can be used after including the TCPDF source code, using:

$pdf = new ezcDocumentPdf();
$pdf->options->driver = new ezcDocumentPdfTcpdfDriver();

Styling the PDF

The PDF output can be styled using a CSS like language, which assigns styles based on the Docbook XML structure. The default styling rules are defined in default.css.

The first and most relevant part is the set of general layout options, which can be defined for the common article root node in the Docbook XML file. You can set global font options there, like:

article {
    // Basic font style definitions
    font-size:   "12pt";
    font-family: "serif";
    font-weight: "normal";
    font-style:  "normal";
    line-height: "1.4";
    text-align:  "left";

    // Basic page layout definitions
    text-columns: "1";
    text-column-spacing: "10mm";

    // General text layout options
    orphans: "3";
    widows:  "3";
}

The meaning of the first set of options should be obvious from CSS. We require each value to be wrapped by quotes for easier parsing, though.

The second set of options defines options for multi-column layouts, which are not available on the web, but quite common in generated PDF documents. You can specify the number of text columns, as well as the distance between the text columns here.

The third set in this example defines lesser known text layout options like the handling of orphans and widows, which specify the handling of overlapping parts of paragraphs on page wrapping.

You can, of course, apply those styles to any elements in your document, using the common CSS addressing rules, like:

// Emphasis node anywhere in the document
emphasis { ... }

// Title element directly below a section element
section > title { ... }

// Title element anywhere below a section element
section title { ... }

// Title element with the ID "first_title"
title#first_title { ... }

// Title element with the class "foo"
title.foo { ... }

// emphasis node directly below a title with class "foo", anywhere in a
// section with the ID "first"
section#first title.foo > emphasis { ... }

The values and measures for the properties are very similar to the properties in CSS. For example the margin and padding properties accept one- to four-tuples of values, with the same respective meaning as in CSS.
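How such a tuple maps to the four sides can be sketched in plain PHP. The helper expandTuple() is a hypothetical name and not part of the document component; it simply follows the CSS shorthand rules referred to above.

```php
<?php
// Hypothetical helper: expands a one- to four-value tuple into the four
// sides, following the CSS shorthand rules.
function expandTuple( $value )
{
    $parts = preg_split( '/\s+/', trim( $value ) );

    switch ( count( $parts ) )
    {
        case 1: // all four sides
            return array( 'top' => $parts[0], 'right' => $parts[0], 'bottom' => $parts[0], 'left' => $parts[0] );
        case 2: // vertical, horizontal
            return array( 'top' => $parts[0], 'right' => $parts[1], 'bottom' => $parts[0], 'left' => $parts[1] );
        case 3: // top, horizontal, bottom
            return array( 'top' => $parts[0], 'right' => $parts[1], 'bottom' => $parts[2], 'left' => $parts[1] );
        case 4: // top, right, bottom, left
            return array( 'top' => $parts[0], 'right' => $parts[1], 'bottom' => $parts[2], 'left' => $parts[3] );
        default:
            throw new InvalidArgumentException( "Expected one to four values, got: $value" );
    }
}

print_r( expandTuple( '22mm 16mm' ) );
```

With two values, as in this call, the first applies to top and bottom and the second to left and right.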

Another central formatting element, which is special to the PDF generation, is the virtual element "page":

page {
    page-size: "A4";
    page-orientation: "portrait";
    padding: "22mm 16mm";
}

The page-size property accepts several known page size identifiers and the page-orientation defines the orientation of a page. You can also address any page directly by its ID, which will be 'page_1' for the first page, or its class, which will be "right", or "left", depending on the current page number.

A detailed description of all available PDF style options is available here.

Measures

The properties in the PDF component accept different measures, which are:

"mm", Millimeters, the default measure, if none is specified
"pt", Points, 72 points per inch
"px", Pixel, depends on the set resolution, which defaults to 72 pixels per inch
"in", Inch

The unit "Points" is most common for font sizes, while millimeters or inches will probably be more useful for page paddings. You are free to choose any of them and can even combine different units in one tuple, like:

para {
    // Top margin: 12 mm; Right margin: .1 inch; Bottom margin: 10 points,
    // Left margin: 1 pixel
    margin: "12 .1in 10pt 1px";
}
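The unit rules listed above can be sketched as a small conversion function in plain PHP. This is an illustration under stated assumptions, not the component's own code: toMillimeters() is a hypothetical name, and the $resolution parameter stands for the configured pixels per inch, assumed to default to 72.

```php
<?php
// Hypothetical helper: converts a single measure into millimeters.
// $resolution is the assumed number of pixels per inch.
function toMillimeters( $measure, $resolution = 72 )
{
    if ( !preg_match( '/^([0-9]*\.?[0-9]+)(mm|pt|px|in)?$/', trim( $measure ), $match ) )
    {
        throw new InvalidArgumentException( "Unknown measure: $measure" );
    }

    $value = (float) $match[1];
    // Millimeters are the default if no unit is given.
    $unit = isset( $match[2] ) ? $match[2] : 'mm';

    switch ( $unit )
    {
        case 'in':
            return $value * 25.4;
        case 'pt':
            return $value * 25.4 / 72;
        case 'px':
            return $value * 25.4 / $resolution;
        default:
            return $value;
    }
}

// The mixed tuple from the margin example above.
foreach ( explode( ' ', '12 .1in 10pt 1px' ) as $measure )
{
    printf( "%s = %.2f mm\n", $measure, toMillimeters( $measure ) );
}
```

Normalizing everything to one unit up front keeps later layout arithmetic free of unit handling, which is presumably why a single default measure exists.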

PDF parts

PDF parts are additional parts in a rendered document, like headers and footers. You can implement and register them yourself, and they are activated by different triggers, like:

on document creation
on page creation
when a document has been finished

The default implementation for headers and footers is triggered on page creation and renders the title of the document, its author and a page number in the header or the footer. To develop a custom PDF part you should extend from the ezcDocumentPdfPart class.

For the following document we are using a set of custom styles, as well as a header and a footer to customize the rendered PDF document. The additional custom CSS changes the default font and the page border:

article {
    font-family: "sans-serif";
    font-size: "10pt";
}

page {
    padding: "15mm 30mm";
}

The code using the custom CSS and headers and footers then looks like:

<?php
require 'tutorial_autoload.php';
// Convert some input RST file to docbook
$document = new ezcDocumentRst();
$document->loadFile( './article/introduction.txt' );
// Load the docbook document and create a PDF from it
$pdf = new ezcDocumentPdf();
$pdf->options->errorReporting = E_PARSE | E_ERROR | E_WARNING;
// Load a custom style sheet
$pdf->loadStyles( 'custom.css' );
// Add a customized footer
$pdf->registerPdfPart( new ezcDocumentPdfFooterPdfPart(
    new ezcDocumentPdfFooterOptions( array(
        'showDocumentTitle'  => false,
        'showDocumentAuthor' => false,
        'height'             => '10mm',
    ) )
) );
// Add a customized header
$pdf->registerPdfPart( new ezcDocumentPdfHeaderPdfPart(
    new ezcDocumentPdfFooterOptions( array(
        'showPageNumber' => false,
        'height'         => '10mm',
    ) )
) );
$pdf->createFromDocbook( $document->getAsDocbook() );
file_put_contents( __FILE__ . '.pdf', $pdf );
?>

The first part, the creation of a Docbook document from an RST document, is the same as in the first example.

Afterwards we load the custom.css mentioned above as an additional style. You can load as many styles as you want. If multiple styles are loaded, the later ones always (partly) redefine the earlier ones.
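The idea of later styles partly redefining earlier ones can be pictured with plain PHP arrays. This is only a simplified model under the assumption that styles behave like selector and property maps; the component's real style inference works on the document structure and is not shown here.

```php
<?php
// Styles modelled as selector => property => value maps. The first array
// mirrors a few of the defaults shown earlier, the second the custom.css
// of this example.
$default = array(
    'article' => array( 'font-size' => '12pt', 'font-family' => 'serif', 'line-height' => '1.4' ),
    'page'    => array( 'page-size' => 'A4', 'padding' => '22mm 16mm' ),
);
$custom = array(
    'article' => array( 'font-family' => 'sans-serif', 'font-size' => '10pt' ),
    'page'    => array( 'padding' => '15mm 30mm' ),
);

// Later definitions replace earlier ones property by property, while
// properties they do not mention keep their earlier value.
$effective = array_replace_recursive( $default, $custom );

print_r( $effective );
```

In this model the line height and page size survive from the defaults, while font and padding come from the custom style.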

After that, two custom PDF parts are registered, each using its option class to configure its appearance. The footer should only show the page number, while the header should display all parts (title and author) except the page number.

At the end of the example the document is created as usual, and the result can be seen in 04_02_create_pdf_styled.pdf. Since the source document does not include any author information, this information is also not rendered in the header.

Hyphenating

Proper hyphenation is crucial for nice text rendering, especially for justified paragraph formatting. Since hyphenation is highly language dependent, you can create and use your own custom hyphenator. The default one doesn't do any hyphenation, but just keeps every word as it is.

Custom hyphenators can be implemented by extending the abstract class ezcDocumentPdfHyphenator. They only need to implement one method, `splitWord()`, which should return possible splitting points of the given word, as documented in the ezcDocumentPdfHyphenator class.

The custom hyphenator can be configured in the ezcDocumentPdfOptions class, like this:

$pdf = new ezcDocumentPdf();
$pdf->options->hyphenator = new myHyphenator();

The hyphenator will then be used by all text renderers during the rendering process.

Open Document Text

The Open Document Text (ODT) format is natively provided by the OpenOffice.org office application suite and supported by other common word processing tools. The Document component supports importing, exporting and styling of ODT files.

For now only import and export of flat ODT (.fodt) files is possible. These can be processed by OpenOffice.org natively. To store FODT, simply choose the file type from the save dialog.

Reading ODT

The ODT document class reads FODT files and converts them into the internal Docbook representation of the Document component:

<?php
require 'tutorial_autoload.php';
$odt = new ezcDocumentOdt();
$odt->loadFile( 'simple.fodt' );
$docbook = $odt->getAsDocbook();
echo $docbook->save();
?>

You can generate any of the supported document formats from the Docbook representation.

FODT files may contain embedded media files, i.e. usually images, which will be extracted during the import process. You can specify the directory where these images will be stored through the `imageDir` option:

<?php
$odt->options->imageDir = '/path/to/your/images';

The default is your system's temporary directory.

Since Open Document contains only little semantic information compared to Docbook, the import mechanism performs heuristic detection of information like emphasized text. This mechanism is still quite rudimentary and will be made available as a public API once it has matured.

Writing ODT

FODT files can be written in the same way as any of the other formats supported by the Document component:

<?php
require 'tutorial_autoload.php';
$docbook = new ezcDocumentDocbook();
$docbook->loadFile( 'docbook.xml' );
$odt = new ezcDocumentOdt();
$odt->createFromDocbook( $docbook );
echo $odt->save();
?>

Styling ODT

FODT output can be styled using a CSS like language similar to Styling the PDF. Using simplified CSS you assign style rules to Docbook XML elements, which are generated into automatic styles in the resulting Open Document. The default styling rules (default.css) are the same as for PDF.

Applying custom styles can be done as follows:

<?php
require 'tutorial_autoload.php';
$docbook = new ezcDocumentDocbook();
$docbook->loadFile( 'docbook.xml' );
$converter = new ezcDocumentDocbookToOdtConverter();
$converter->options->styler->addStylesheetFile( 'custom.css' );
$odt = $converter->convert( $docbook );
echo $odt->save();
?>

A detailed description of the available style options is available here.