Utf8
extends AbstractCommon
in package
AbstractCommon implementation of the analyzer functionality.
Table of Contents
Properties
- $_encoding : string
- Input string encoding
- $_input : string
- Input string
- $_bytePosition : int
- Current byte position in a UTF-8 stream
- $_filters : array<string|int, mixed>
- The set of Token filters applied to the Token stream.
- $_position : int
- Current character position in a UTF-8 stream
Methods
- __construct() : mixed
- Object constructor
- addFilter() : void
- Add a Token filter to the analyzer
- nextToken() : Token|null
- Tokenization stream API: get the next token. Returns null at the end of the stream.
- normalize() : Token
- Apply filters to the token. Can return null when the token was removed.
- reset() : void
- Reset token stream
- setInput() : void
- Tokenization stream API: set the input
- tokenize() : array<string|int, mixed>
- Tokenize text into terms. Returns an array of \ZendSearch\Lucene\Analysis\Token objects.
Properties
$_encoding
Input string encoding
protected string $_encoding = ''
$_input
Input string
protected string $_input = null
$_bytePosition
Current byte position in a UTF-8 stream
private int $_bytePosition
$_filters
The set of Token filters applied to the Token stream.
private array<string|int, mixed> $_filters = array()
Array of \ZendSearch\Lucene\Analysis\TokenFilter\TokenFilterInterface objects.
$_position
Current character position in a UTF-8 stream
private int $_position
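The two position properties differ because PHP string functions and PCRE offsets count bytes, while a character position counts code points. The following is a minimal standalone sketch of that difference, not code from this class; the sample string and variable names are illustrative assumptions.

```php
<?php
// Sample input: "naïve café". The escapes make the multi-byte characters explicit.
$input = "na\u{00EF}ve caf\u{00E9}";

// PREG_OFFSET_CAPTURE reports the match offset in bytes, even with the /u modifier.
preg_match('/caf\p{L}/u', $input, $match, PREG_OFFSET_CAPTURE);
$byteOffset = $match[0][1];

// Counting the code points before the match converts the byte offset to a character offset.
$charOffset = preg_match_all('/./us', substr($input, 0, $byteOffset));

// Total length in bytes versus total length in characters.
$byteCount = strlen($input);
$charCount = preg_match_all('/./us', $input);

printf("byte offset %d, char offset %d, %d bytes, %d chars\n",
    $byteOffset, $charOffset, $byteCount, $charCount);
```

Every multi-byte character before a token widens the gap between the two offsets, which is one reason an analyzer might track both counters side by side.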
Methods
__construct()
Object constructor
public __construct() : mixed
addFilter()
Add a Token filter to the analyzer
public addFilter(TokenFilterInterface $filter) : void
Parameters
- $filter : TokenFilterInterface
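Filters registered through addFilter() are applied to each token, and a filter may drop a token entirely. The sketch below illustrates that chain with a hypothetical stand-in class (SketchAnalyzer) that passes plain strings through callables instead of Token objects and TokenFilterInterface implementations; it is an assumption about the general pattern, not the library's code.

```php
<?php
// Hypothetical stand-in for illustration only; not a ZendSearch class.
final class SketchAnalyzer
{
    /** @var callable[] Filters in the order they were added. */
    private $filters = [];

    public function addFilter(callable $filter): void
    {
        $this->filters[] = $filter;
    }

    /** Runs the term through every filter; null means a filter removed it. */
    public function normalize(string $term): ?string
    {
        foreach ($this->filters as $filter) {
            $term = $filter($term);
            if ($term === null) {
                return null; // stop at the first filter that drops the term
            }
        }
        return $term;
    }
}

$analyzer = new SketchAnalyzer();
$analyzer->addFilter('strtolower');
// Illustrative short-word filter: drop terms shorter than three bytes.
$analyzer->addFilter(function (string $term): ?string {
    return strlen($term) < 3 ? null : $term;
});

$kept = $analyzer->normalize('Lucene');
$dropped = $analyzer->normalize('Of');

var_dump($kept, $dropped);
```

Filters run in insertion order, so the lowercasing step sees the term before the length check does.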
nextToken()
Tokenization stream API: get the next token. Returns null at the end of the stream.
public nextToken() : Token|null
Return values
Token|null
normalize()
Apply filters to the token. Can return null when the token was removed.
public normalize(Token $token) : Token
Parameters
- $token : Token
Return values
Token
reset()
Reset token stream
public reset() : void
setInput()
Tokenization stream API: set the input
public setInput(string $data[, mixed $encoding = '' ]) : void
Parameters
- $data : string
- $encoding : mixed = ''
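The stream API is typically driven by setting the input once and then calling nextToken() until it returns null. The sketch below mimics that contract with a hypothetical class (SketchWordStream) that returns plain strings and assumes a letters-only pattern; the real analyzer returns Token objects and may match differently.

```php
<?php
// Hypothetical stand-in for illustration only; not a ZendSearch class.
final class SketchWordStream
{
    private $input = '';
    private $bytePosition = 0;

    public function setInput(string $data): void
    {
        $this->input = $data;
        $this->reset(); // assumption: new input restarts the stream
    }

    public function reset(): void
    {
        $this->bytePosition = 0;
    }

    /** Returns the next run of letters, or null at the end of the stream. */
    public function nextToken(): ?string
    {
        $found = preg_match('/\p{L}+/u', $this->input, $match,
            PREG_OFFSET_CAPTURE, $this->bytePosition);
        if (!$found) {
            return null;
        }
        // Advance past the match; offset and length are both in bytes.
        $this->bytePosition = $match[0][1] + strlen($match[0][0]);
        return $match[0][0];
    }
}

$stream = new SketchWordStream();
$stream->setInput('Hello, UTF-8 world');

$terms = [];
while (($term = $stream->nextToken()) !== null) {
    $terms[] = $term;
}

print_r($terms);
```

Calling reset() rewinds the sketch to the start of the same input, so the same tokens can be read again without calling setInput() a second time.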
tokenize()
Tokenize text into terms. Returns an array of \ZendSearch\Lucene\Analysis\Token objects.
public tokenize(string $data[, mixed $encoding = '' ]) : array<string|int, mixed>
Tokens are returned in UTF-8 (the internal Zend_Search_Lucene encoding).
Parameters
- $data : string
- $encoding : mixed = ''