Class ezcDocumentPdfTokenizer

Abstract base class for tokenizer implementations.

Tokenizers split a string of text, such as a sentence, into single words, interleaved with markers for the points where the renderer may insert a space or break the line.

Child Class                      Description
ezcDocumentPdfDefaultTokenizer   Tokenizer implementation for common texts, using whitespace as word separators.
ezcDocumentPdfLiteralTokenizer   Tokenizer implementation for literal blocks, preserving whitespace.


FORCED = 2 Constant indicating a forced breaking point without rendering a space character.
SPACE = 0 Constant indicating a breaking point, including a rendered space.
WRAP = 1 Constant indicating a possible breaking point without rendering a space character.
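
One way to read the three constants is as instructions to whatever consumes the token stream. The following is a minimal sketch, not the library's renderer: the `TOKEN_*` constants and the `flattenTokens()` helper are hypothetical stand-ins that only mirror the documented values, and the helper ignores line width entirely, so an optional WRAP point is never taken.

```php
<?php
// Stand-in constants mirroring the values documented above; in real code
// these would be ezcDocumentPdfTokenizer::SPACE, ::WRAP and ::FORCED.
const TOKEN_SPACE  = 0;
const TOKEN_WRAP   = 1;
const TOKEN_FORCED = 2;

/**
 * Hypothetical helper: flattens a token stream into an array of text lines.
 * Line width is not considered, so only FORCED starts a new line.
 */
function flattenTokens(array $tokens): array
{
    $lines = array('');
    foreach ($tokens as $token) {
        if ($token === TOKEN_SPACE) {
            // Breaking point that is rendered as a space.
            $lines[count($lines) - 1] .= ' ';
        } elseif ($token === TOKEN_WRAP) {
            // Optional breaking point: nothing is rendered for it.
            continue;
        } elseif ($token === TOKEN_FORCED) {
            // Forced break: start a new line without rendering a space.
            $lines[] = '';
        } else {
            // Anything else is a word and is appended to the current line.
            $lines[count($lines) - 1] .= $token;
        }
    }
    return $lines;
}
```

The strict comparison keeps a word such as the string '0' from being mistaken for the integer marker SPACE.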

Method Summary

public abstract array tokenize( $string )
Split string into words



array tokenize( string $string )

Split string into words

This function takes a string and splits it into words. There are different mechanisms which indicate possible splitting points in the resulting word stream:

  • self::SPACE: The renderer might render a space at this position.
  • self::WRAP: The renderer might wrap the line at this position, but will not render a space; if no wrap is needed, the marker can simply be ignored.
An English sentence might be split like this:

    array(
        'Hello',
        self::SPACE,
        'world!',
    );

Non-breaking spaces must not split the text into multiple words, so no breaking point is emitted for them.
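
The contract described above can be sketched as a small whitespace tokenizer. This is an illustration under stated assumptions, not the implementation of ezcDocumentPdfDefaultTokenizer: `SketchTokenizer` and `SketchWhitespaceTokenizer` are hypothetical names, the base class is a stand-in that only mirrors the documented constants so the example runs without the library, and the input is assumed to be UTF-8.

```php
<?php
/**
 * Hypothetical stand-in for the abstract base class, so the sketch runs
 * without the library. The constant values mirror the documented ones.
 */
abstract class SketchTokenizer
{
    const SPACE  = 0;
    const WRAP   = 1;
    const FORCED = 2;

    abstract public function tokenize($string);
}

/**
 * Hypothetical tokenizer: splits on ASCII whitespace and emits a SPACE
 * marker between every two words.
 */
class SketchWhitespaceTokenizer extends SketchTokenizer
{
    public function tokenize($string)
    {
        // Split on runs of space, tab, CR and LF only. A non-breaking
        // space (U+00A0, bytes C2 A0 in UTF-8) is not in this character
        // class, so it stays inside its word and produces no break.
        $words = preg_split('/[ \t\r\n]+/', $string, -1, PREG_SPLIT_NO_EMPTY);

        $tokens = array();
        foreach ($words as $index => $word) {
            if ($index > 0) {
                $tokens[] = self::SPACE;
            }
            $tokens[] = $word;
        }
        return $tokens;
    }
}
```

An explicit character class is used instead of `\s` so that the bytes of a non-breaking space are never treated as a separator, regardless of locale or PCRE build.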

Name Type Description
$string string
Redefined in descendants as:
Method Description
ezcDocumentPdfDefaultTokenizer::tokenize() Split string into words 
ezcDocumentPdfLiteralTokenizer::tokenize() Split string into words. 
