Zeta Components Manual :: Docs For Class ezcDocumentPdfTokenizer
Class ezcDocumentPdfTokenizer
Abstract base class for tokenizer implementations.
Tokenizers split a series of words (sentences) into single words, which the renderer can then render separated by spaces.
Source for this file: /Document/src/document/pdf/tokenizer.php
Version: //autogen//
Descendants
Child Class | Description |
---|---|
ezcDocumentPdfDefaultTokenizer | Tokenizer implementation for common texts, using whitespace as word separators. |
ezcDocumentPdfLiteralTokenizer | Tokenizer implementation for literal blocks, preserving whitespace. |
Constants
Name | Value | Description |
---|---|---|
FORCED | 2 | Constant indicating a forced breaking point without rendering a space character. |
SPACE | 0 | Constant indicating a breaking point, including a rendered space. |
WRAP | 1 | Constant indicating a possible breaking point without rendering a space character. |
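As a rough illustration of how these three constants might be consumed, the following sketch flattens a token stream into plain text. It is not the component's renderer: `SketchBreak` and `sketchRenderTokens()` are hypothetical names that only mirror the documented constant values, and rendering FORCED as a newline is an assumption made for the example.

```php
<?php
// Hypothetical stand-in that mirrors the constant values documented for
// ezcDocumentPdfTokenizer; it is not part of the Document component.
final class SketchBreak
{
    const SPACE  = 0;
    const WRAP   = 1;
    const FORCED = 2;
}

/**
 * Hypothetical helper: flatten a token stream into a plain string.
 *
 * @param array $tokens Words (strings) mixed with break markers (integers).
 * @return string
 */
function sketchRenderTokens(array $tokens)
{
    $out = '';
    foreach ($tokens as $token) {
        // Strict comparison keeps words and markers apart; with a loose
        // comparison, older PHP versions treat 'Hello' == 0 as true.
        if ($token === SketchBreak::SPACE) {
            $out .= ' ';
        } elseif ($token === SketchBreak::WRAP) {
            // Possible break point; nothing is rendered here.
            continue;
        } elseif ($token === SketchBreak::FORCED) {
            // Assumption for this sketch: a forced break becomes a newline.
            $out .= "\n";
        } else {
            $out .= $token;
        }
    }
    return $out;
}
```

A real renderer would decide per WRAP or SPACE marker whether the line actually breaks there, based on the available width; this sketch never wraps.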
Method Summary
public abstract array | tokenize( $string ) | Split string into words |
Methods
tokenize
array tokenize( string $string )
Split string into words
This function takes a string and splits it into words. Different markers indicate possible splitting points in the resulting word stream:
- self::SPACE: The renderer might render a space.
- self::WRAP: The renderer might wrap the line at this position but will not render a space; this marker might as well be omitted.
An example of a resulting word stream:
    array(
        'Hello',
        'world!',
    );
Non-breaking spaces should not be split into multiple words, so no break is applied at them.
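The contract described above can be sketched with a small concrete tokenizer. This is a minimal illustration, not the code of ezcDocumentPdfDefaultTokenizer: `SketchTokenizer` is a hypothetical stand-in for the abstract base class so that the example runs on its own, and `SketchWhitespaceTokenizer` assumes UTF-8 input and treats only space, tab, and line-break characters as separators.

```php
<?php
// Hypothetical stand-in for the abstract base class, so that the sketch is
// self-contained. Constant values follow the Constants table.
abstract class SketchTokenizer
{
    const SPACE  = 0;
    const WRAP   = 1;
    const FORCED = 2;

    abstract public function tokenize($string);
}

// Hypothetical implementation: split on runs of ASCII whitespace only.
class SketchWhitespaceTokenizer extends SketchTokenizer
{
    public function tokenize($string)
    {
        // Capture the separators so each whitespace run can be replaced by
        // a single SPACE marker. The character class is byte-based, so a
        // UTF-8 non-breaking space (0xC2 0xA0) never matches.
        $parts = preg_split(
            '/([ \t\r\n]+)/',
            $string,
            -1,
            PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
        );

        $tokens = array();
        foreach ($parts as $part) {
            $tokens[] = (trim($part, " \t\r\n") === '') ? self::SPACE : $part;
        }
        return $tokens;
    }
}

$tokenizer = new SketchWhitespaceTokenizer();
$tokens    = $tokenizer->tokenize('Hello world!');
```

Here `$tokens` holds the two words with a `SPACE` marker between them. A string such as "10\xC2\xA0km" stays a single word, matching the rule that non-breaking spaces do not produce a break.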
Parameters:
Name | Type | Description |
---|---|---|
$string | string | |
Redefined in descendants as:
Method | Description |
---|---|
ezcDocumentPdfDefaultTokenizer::tokenize() | Split string into words |
ezcDocumentPdfLiteralTokenizer::tokenize() | Split string into words |