Zeta Components - high quality PHP components

Zeta Components Manual :: Docs For Class ezcDocumentPdfTokenizer

Class ezcDocumentPdfTokenizer

Abstract base class for tokenizer implementations.

Tokenizers split a series of words (a sentence) into single words, which the renderer can then output separated by spaces.

Source for this file: /Document/src/document/pdf/tokenizer.php

Descendants

Child Class                      Description
ezcDocumentPdfDefaultTokenizer   Tokenizer implementation for common texts, using whitespace as word separators.
ezcDocumentPdfLiteralTokenizer   Tokenizer implementation for literal blocks, preserving whitespace.

Constants

FORCED = 2   Constant indicating a forced breaking point without rendering a space character.
SPACE  = 0   Constant indicating a breaking point, including a rendered space.
WRAP   = 1   Constant indicating a possible breaking point without rendering a space character.
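
As a rough illustration of how a consumer might interpret these markers, the following sketch joins a token stream back into text. The constants TOKEN_SPACE, TOKEN_WRAP and TOKEN_FORCED and the function joinTokens() are hypothetical stand-ins that only mirror the values listed above; they are not part of the component. Emitting a newline for a forced break is an assumption made for the example, since the actual renderer works on PDF lines rather than strings.

```php
<?php
// Hypothetical stand-ins mirroring the documented values; real code would use
// ezcDocumentPdfTokenizer::SPACE, ::WRAP and ::FORCED instead.
const TOKEN_SPACE  = 0;
const TOKEN_WRAP   = 1;
const TOKEN_FORCED = 2;

/**
 * Join a token stream into a plain string (illustrative only).
 */
function joinTokens(array $tokens)
{
    $out = '';
    foreach ($tokens as $token) {
        if ($token === TOKEN_SPACE) {
            // Breaking point that is rendered as a space.
            $out .= ' ';
        } elseif ($token === TOKEN_WRAP) {
            // Possible breaking point; nothing is rendered when no wrap occurs.
            continue;
        } elseif ($token === TOKEN_FORCED) {
            // Forced break without a space (assumed here to be a newline).
            $out .= "\n";
        } else {
            // Anything else is a word. The strict comparisons above keep
            // the string '0' distinct from the integer marker 0.
            $out .= $token;
        }
    }
    return $out;
}
```

The strict comparison matters because SPACE has the value 0, which a loose comparison could confuse with other token values.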

Method Summary

public abstract array tokenize( $string )
Split string into words

Methods

tokenize

array tokenize( string $string )

Split string into words

This function takes a string and splits it into words. There are different mechanisms which indicate possible splitting points in the resulting word stream:

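  • self::SPACE: The renderer might render a space at this position.
  • self::WRAP: The renderer might wrap the line at this position but will not render a space; if it does not wrap, the marker is simply omitted.
  • self::FORCED: The renderer is forced to break at this position and will not render a space.

A possible splitting of an English sentence might look like: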

    array(
        'Hello',
        self::SPACE,
        'world!',
    );

Non-breaking spaces must not split the text into multiple words, so no break is applied at them.
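
The contract described above can be sketched with a small self-contained example. SketchTokenizer and SketchWhitespaceTokenizer are hypothetical stand-ins written for this illustration; they only mimic the documented constants and the tokenize() signature and are not the implementation of ezcDocumentPdfDefaultTokenizer. The sketch assumes UTF-8 input and splits only on ASCII whitespace, so a non-breaking space (U+00A0) stays inside its word.

```php
<?php
// Hypothetical stand-in for the abstract base class documented here.
abstract class SketchTokenizer
{
    const SPACE  = 0;
    const WRAP   = 1;
    const FORCED = 2;

    abstract public function tokenize($string);
}

// Illustrative whitespace tokenizer, not the component's own implementation.
class SketchWhitespaceTokenizer extends SketchTokenizer
{
    public function tokenize($string)
    {
        // Split on runs of ASCII whitespace only. The UTF-8 bytes of a
        // non-breaking space (0xC2 0xA0) do not match, so no break is
        // applied there. Leading and trailing whitespace is dropped.
        $words = preg_split('/[ \t\r\n]+/', $string, -1, PREG_SPLIT_NO_EMPTY);

        $tokens = array();
        foreach ($words as $index => $word) {
            if ($index > 0) {
                // Mark the breaking point between two words.
                $tokens[] = self::SPACE;
            }
            $tokens[] = $word;
        }
        return $tokens;
    }
}
```

Under these assumptions, tokenizing 'Hello world!' yields the three-element array shown in the example above, with the integer value of SPACE between the two words.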

Parameters:

Name      Type     Description
$string   string

Redefined in descendants as:

Method                                       Description
ezcDocumentPdfDefaultTokenizer::tokenize()   Split string into words
ezcDocumentPdfLiteralTokenizer::tokenize()   Split string into words
Documentation generated by phpDocumentor 1.4.3