Zeta Components - high quality PHP components

Class ezcDocumentPdfDefaultTokenizer

Tokenizer implementation for common texts, using whitespace as the word separator.

Source for this file: /Document/src/document/pdf/tokenizer/default.php

ezcDocumentPdfTokenizer
   |
   --ezcDocumentPdfDefaultTokenizer


Inherited Constants

From ezcDocumentPdfTokenizer:
`ezcDocumentPdfTokenizer::FORCED`	Constant indicating a forced breaking point without rendering a space character.
`ezcDocumentPdfTokenizer::SPACE`	Constant indicating a breaking point, including a rendered space.
`ezcDocumentPdfTokenizer::WRAP`	Constant indicating a possible breaking point without rendering a space character.

Method Summary

public array	`tokenize( $string )` Split string into words

Inherited Methods

From ezcDocumentPdfTokenizer
public abstract array	`ezcDocumentPdfTokenizer::tokenize()` Split string into words

Methods

tokenize

array tokenize( string $string )

Split string into words

This function takes a string and splits it into words. Markers in the resulting word stream indicate possible splitting points:

self::SPACE: The renderer might render a space at this position.
self::WRAP: The renderer might wrap the line at this position, but will not render a space.

A possible splitting of an English sentence might look like:

array(
    'Hello',
    self::SPACE,
    'world!',
);

Words joined by a non-breaking space are not split into separate tokens, so no breaking point is emitted at that position.
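
The behaviour described above can be approximated with a short standalone sketch. It is not the library's implementation: the function name sketchTokenize() and the constant SKETCH_SPACE are hypothetical stand-ins, the constant's value is an assumption (the real value of ezcDocumentPdfTokenizer::SPACE is not given here), and the exact set of characters treated as whitespace is a guess. The sketch emits no WRAP or FORCED markers.

```php
<?php
// Hypothetical stand-in for ezcDocumentPdfTokenizer::SPACE; the value
// used by the real class may differ.
const SKETCH_SPACE = 0;

/**
 * Illustrative only: split a string on runs of ASCII whitespace and
 * place a SPACE marker between neighbouring words.
 *
 * The non-breaking space (U+00A0) is not part of the character class,
 * so words joined by it remain a single token.
 */
function sketchTokenize( string $string ): array
{
    // Drop empty pieces caused by leading or trailing whitespace.
    $words = preg_split( '/[ \t\r\n]+/', $string, -1, PREG_SPLIT_NO_EMPTY );

    $tokens = array();
    foreach ( $words as $index => $word )
    {
        if ( $index > 0 )
        {
            // A breaking point where the renderer may render a space.
            $tokens[] = SKETCH_SPACE;
        }
        $tokens[] = $word;
    }

    return $tokens;
}

$tokens = sketchTokenize( "Hello world! 10\u{00A0}km" );
// $tokens holds three words separated by two SKETCH_SPACE markers;
// "10\u{00A0}km" stays one word.
```

With the Document component loaded, the corresponding call would be tokenize() on an ezcDocumentPdfDefaultTokenizer instance, and the markers in the result would be the inherited class constants rather than the stand-in used here.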

Parameters:

Name	Type	Description
`$string`	string	The string to split into words

Redefinition of:

Method	Description
`ezcDocumentPdfTokenizer::tokenize()`	Split string into words

Documentation generated by phpDocumentor 1.4.3