Zeta Components - high quality PHP components

Zeta Components Manual :: Docs For Class ezcDocumentWikiTokenizer


Class ezcDocumentWikiTokenizer

Tokenizer for wiki documents

The tokenizer used for all wiki documents should prepare a token array that the wiki parser can consume without requiring any wiki-language-specific handling in the parser itself. To achieve this, tokenizing is performed in two steps:

1) Extract tokens from text
2) Filter tokens

Token extraction
----------------

For token extraction, the regular expressions in the $tokens property are used. The $tokens array has to be built as follows, and can be created in the constructor:

array(
    array(
        'class' => Class name of token,
        'match' => Regular expression to match,
    ),
    ...
)

The array is evaluated in the given order until one of the regular expressions matches. Each regular expression should contain at least one named subpattern called "value", (?P<value> ... ), whose contents are assigned as content to the token created from the given class name. The matched contents are then removed from the beginning of the string. Optionally, a second named subpattern called "match" may be used inside the regular expression. If it is present, only the contents of this subpattern are removed from the beginning of the string. This enables you to perform a trivial lookahead inside the tokenizer.
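To illustrate how the "value" and "match" named subpatterns interact, here is a minimal standalone sketch using PHP's preg_match(). The token class names, the two regular expressions, and the matchToken() helper are hypothetical examples, not definitions taken from any of the shipped tokenizers.

```php
<?php
// Hypothetical token definitions in the documented 'class'/'match' format.
// The class names are placeholders, not real ezcDocument token classes.
$tokens = array(
    // Bold text: no "match" group, so the whole match "**...**" is consumed.
    array(
        'class' => 'ExampleBoldToken',
        'match' => '/\A\*\*(?P<value>[^*]+)\*\*/',
    ),
    // Title marker: the pattern requires trailing whitespace, but only the
    // "match" group (the "=" run) is consumed, a trivial lookahead.
    array(
        'class' => 'ExampleTitleToken',
        'match' => '/\A(?P<match>(?P<value>=+))\s/',
    ),
);

// Apply one definition to the start of $input. Returns the token class, the
// "value" capture and the remaining input, or null if the pattern fails.
function matchToken(array $definition, string $input): ?array
{
    if (!preg_match($definition['match'], $input, $matches)) {
        return null;
    }
    // Consume the "match" group if present, otherwise the full match.
    $consumed = isset($matches['match']) ? $matches['match'] : $matches[0];
    return array(
        'class' => $definition['class'],
        'value' => isset($matches['value']) ? $matches['value'] : null,
        'rest'  => (string) substr($input, strlen($consumed)),
    );
}

$bold = matchToken($tokens[0], '**bold** text');
// $bold['value'] is 'bold', $bold['rest'] is ' text'
$title = matchToken($tokens[1], '== Heading');
// $title['value'] is '==', $title['rest'] is ' Heading' (the space is kept)
```

In the title case the whitespace after the "=" run is required for a match but stays in the input, so a later rule can still see it.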

If no expression matches, an exception will be thrown.
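The extraction loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual ezcDocumentWikiTokenizer::tokenizeString() implementation: the ExampleToken class, the extractTokens() function, the token type names, the regular expressions, and the use of RuntimeException in place of ezcDocumentTokenizerException are all hypothetical.

```php
<?php
// Hypothetical minimal token struct (not ezcDocumentWikiToken).
class ExampleToken
{
    public $type;
    public $content;

    public function __construct(string $type, ?string $content)
    {
        $this->type = $type;
        $this->content = $content;
    }
}

// Sketch of the extraction step: try each definition in order; on the first
// match, create a token from the "value" group and strip the "match" group
// (or the whole match) from the start of the input.
function extractTokens(array $definitions, string $input): array
{
    $tokens = array();
    while ($input !== '') {
        foreach ($definitions as $definition) {
            if (!preg_match($definition['match'], $input, $matches)) {
                continue;
            }
            $tokens[] = new ExampleToken(
                $definition['class'],
                isset($matches['value']) ? $matches['value'] : null
            );
            $consumed = isset($matches['match']) ? $matches['match'] : $matches[0];
            if ($consumed === '') {
                // An empty match would never shrink the input.
                throw new LogicException("Pattern for {$definition['class']} consumed nothing");
            }
            $input = (string) substr($input, strlen($consumed));
            continue 2; // restart the while loop with the first definition
        }
        // No definition matched the remaining input.
        throw new RuntimeException('Could not tokenize: ' . substr($input, 0, 20));
    }
    return $tokens;
}

// Hypothetical definitions; order matters, the generic text rule comes last.
$definitions = array(
    array('class' => 'title',      'match' => '/\A(?P<match>(?P<value>=+))[ \t]/'),
    array('class' => 'bold',       'match' => '/\A\*\*(?P<value>[^*]+)\*\*/'),
    array('class' => 'whitespace', 'match' => '/\A(?P<value>[ \t]+)/'),
    array('class' => 'newline',    'match' => '/\A(?P<value>\n)/'),
    array('class' => 'text',       'match' => '/\A(?P<value>[^*=\s]+)/'),
);

$result = extractTokens($definitions, "== Title\nsome **bold** text");
```

The guard against empty consumption is a choice made for this sketch: without it, a pattern that can match the empty string would loop forever.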

Token filtering
---------------

After all tokens are extracted from the text, they may still lack values required by the parser, such as the level of title tokens. Those values should be extracted and assigned during the filtering stage. For this, implement the filterTokens() method, which may iterate over the token stream and assign the required values.

If the wiki markup language supports plugins, you may also want to "parse" the plugin contents here to extract their type, parameters, and text.
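A filtering pass that fills in title levels might look like the following sketch. It assumes, hypothetically, a Creole-like markup where the number of "=" characters determines the level; the ExampleTitleToken and ExampleTextToken classes and the assignTitleLevels() function are stand-ins, not the ezcDocument token classes or the abstract filterTokens() method itself.

```php
<?php
// Hypothetical token classes; "level" is left empty by the extraction step.
class ExampleTitleToken
{
    public $content;
    public $level = null;

    public function __construct(string $content)
    {
        $this->content = $content;
    }
}

class ExampleTextToken
{
    public $content;

    public function __construct(string $content)
    {
        $this->content = $content;
    }
}

// Sketch of a filter pass: derive each title's level from the number of "="
// characters in its markup. Tokens are objects, so they are updated in place.
function assignTitleLevels(array $tokens): array
{
    foreach ($tokens as $token) {
        if ($token instanceof ExampleTitleToken) {
            $token->level = strlen(trim($token->content));
        }
    }
    return $tokens;
}

$filtered = assignTitleLevels(array(
    new ExampleTitleToken('==='),
    new ExampleTextToken('Introduction'),
));
// $filtered[0]->level is 3
```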

Source for this file: /Document/src/document/wiki/tokenizer.php

Version:

//autogen//

Descendants

Child Class	Description
ezcDocumentWikiConfluenceTokenizer	Tokenizer for Confluence wiki documents.
ezcDocumentWikiCreoleTokenizer	Tokenizer for Creole wiki documents.
ezcDocumentWikiDokuwikiTokenizer	Tokenizer for Dokuwiki wiki documents.

Member Variables

protected array $tokens = array()

List of tokens, each with a regular expression matching the given token.

The tokens are matched in the given order.

Method Summary

public abstract void	`__construct( )` Construct tokenizer
protected void	`convertTabs( $token )` Convert tabs to spaces
protected abstract array	`filterTokens( $tokens )` Filter tokens
public array	`tokenizeFile( $file )` Tokenize the given file
public array	`tokenizeString( $string )` Tokenize the given string

Methods

__construct

void __construct( )

Construct tokenizer

Create the token array with regular expressions matching the respective tokens.

Redefined in descendants as:

Method	Description
`ezcDocumentWikiConfluenceTokenizer::__construct()`	Construct tokenizer
`ezcDocumentWikiCreoleTokenizer::__construct()`	Construct tokenizer
`ezcDocumentWikiDokuwikiTokenizer::__construct()`	Construct tokenizer

convertTabs

void convertTabs( ezcDocumentWikiToken $token )

Convert tabs to spaces

Convert all tabs to spaces, as defined in: http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#whitespace
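As a rough sketch of the rule referenced above (the reStructuredText specification places tab stops at every 8th column), tab expansion can be written as follows. The expandTabs() function and its $startColumn parameter are hypothetical; the real convertTabs() operates on an ezcDocumentWikiToken, and its exact handling of the token position is not shown here.

```php
<?php
// Expand tabs to spaces with tab stops every $tabWidth columns (8 per the
// reST whitespace rule). $startColumn is the 0-based column at which $text
// begins, a hypothetical stand-in for the token's position in its line.
// Columns are counted in bytes, not multibyte characters.
function expandTabs(string $text, int $startColumn = 0, int $tabWidth = 8): string
{
    $result = '';
    $column = $startColumn;
    $length = strlen($text);
    for ($i = 0; $i < $length; ++$i) {
        $char = $text[$i];
        if ($char === "\t") {
            // Pad up to the next multiple of $tabWidth.
            $spaces = $tabWidth - ($column % $tabWidth);
            $result .= str_repeat(' ', $spaces);
            $column += $spaces;
        } elseif ($char === "\n") {
            // A newline resets the column count.
            $result .= $char;
            $column = 0;
        } else {
            $result .= $char;
            ++$column;
        }
    }
    return $result;
}

// "a" occupies column 0, so the tab expands to 7 spaces (up to column 8).
$expanded = expandTabs("a\tb");
```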

Parameters:

Name	Type	Description
`$token`	ezcDocumentWikiToken

filterTokens

array filterTokens( $tokens )

Filter tokens

Method to filter tokens after the input string has been tokenized. The filter should extract additional information from tokens that is not yet available, such as the depth of a title depending on the title markup.

Parameters:

Name	Type	Description
`$tokens`	array

Redefined in descendants as:

Method	Description
`ezcDocumentWikiConfluenceTokenizer::filterTokens()`	Filter tokens
`ezcDocumentWikiCreoleTokenizer::filterTokens()`	Filter tokens
`ezcDocumentWikiDokuwikiTokenizer::filterTokens()`	Filter tokens

tokenizeFile

array tokenizeFile( string $file )

Tokenize the given file

The method tries to tokenize the passed file and returns an array of ezcDocumentWikiToken structs on success, or throws an ezcDocumentTokenizerException if something could not be matched by any token.

Parameters:

Name	Type	Description
`$file`	string

tokenizeString

array tokenizeString( string $string )

Tokenize the given string

The method tries to tokenize the passed string and returns an array of ezcDocumentWikiToken structs on success, or throws an ezcDocumentTokenizerException if something could not be matched by any token.

Parameters:

Name	Type	Description
`$string`	string

Documentation generated by phpDocumentor 1.4.3