StopWords
in package
implements
TokenFilterInterface
Token filter that removes stop words. The words must be provided as an array used as a set, for example: $stopwords = array('the' => 1, 'an' => 1);
We recommend providing all words in lowercase and chaining this filter after the lowercase filter.
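As a rough illustration of why the ordering matters, the following self-contained sketch mimics the two steps with plain strings. The function name filterTerms is hypothetical and not part of this package; it only assumes the stop set is keyed by word, as in the example above.

```php
<?php
// Stop set keyed by word (assumption: lookups are done on the array keys).
$stopwords = array('the' => 1, 'an' => 1);

/**
 * Hypothetical helper: lowercases each term, then drops stop words.
 *
 * @param string[]             $terms
 * @param array<string, mixed> $stopSet
 * @return string[]
 */
function filterTerms(array $terms, array $stopSet): array
{
    $kept = array();
    foreach ($terms as $term) {
        $term = strtolower($term);      // lowercase step runs first
        if (!isset($stopSet[$term])) {  // stop-word step sees lowercase text
            $kept[] = $term;
        }
    }
    return $kept;
}

$result = filterTerms(array('The', 'Quick', 'fox', 'AN'), $stopwords);
```

If the stop-word check ran before lowercasing, 'The' and 'AN' would not match the lowercase entries and would pass through.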
Table of Contents
Interfaces
- TokenFilterInterface
- A token filter converts (normalizes) a Token or removes it from a token stream.
Properties
- $_stopSet : array<string|int, mixed>
- Stop Words
Methods
- __construct() : mixed
- Constructs a new instance of this filter.
- loadFromFile() : void
- Fills the stop-word set from a text file. Each line contains one stop word; lines with '#' in the first column are ignored as comments.
- normalize() : Token
- Normalizes the Token, or removes it from the stream by returning null.
Properties
$_stopSet
Stop Words
private
array<string|int, mixed>
$_stopSet
Methods
__construct()
Constructs a new instance of this filter.
public
__construct([array<string|int, mixed> $stopwords = array() ]) : mixed
Parameters
- $stopwords : array<string|int, mixed> = array()
- array (set) of words that will be filtered out
loadFromFile()
Fills the stop-word set from a text file. Each line contains one stop word; lines with '#' in the first column are ignored as comments.
public
loadFromFile([string $filepath = null ]) : void
You can call this method one or more times. New stop words are always added to the current set.
Parameters
- $filepath : string = null
- full path of the text file with stop words
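The file format described above can be sketched as a standalone function. This is not the package's implementation: the name addStopWordsFromFile is hypothetical, and skipping blank lines and trimming whitespace are assumptions the description does not state.

```php
<?php
/**
 * Hypothetical helper: adds the words found in $filepath to $stopSet.
 *
 * @param string               $filepath
 * @param array<string, mixed> $stopSet  existing set, keyed by word
 * @return array<string, mixed>
 */
function addStopWordsFromFile(string $filepath, array $stopSet): array
{
    $lines = file($filepath, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read stop-word file: $filepath");
    }
    foreach ($lines as $line) {
        $word = trim($line);
        // Assumption: blank lines are skipped. Lines with '#' in the
        // first column are treated as comments.
        if ($word === '' || $line[0] === '#') {
            continue;
        }
        $stopSet[$word] = 1; // added to the set, existing entries are kept
    }
    return $stopSet;
}

// Usage: write a small temporary file and merge it into an existing set.
$file = tempnam(sys_get_temp_dir(), 'stop');
file_put_contents($file, "# articles\nthe\nan\n");
$set = addStopWordsFromFile($file, array('of' => 1));
unlink($file);
```

Because the existing set is passed in and returned, calling the function again with another file keeps accumulating words, which mirrors the additive behavior described for loadFromFile().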
normalize()
Normalizes the Token, or removes it from the stream by returning null.
public
normalize(Token $srcToken) : Token
Parameters
- $srcToken : Token
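A caller has to handle the null case. The sketch below shows one way that contract might be consumed. SketchToken, SketchStopWords and getTermText() are hypothetical stand-ins written for this example, not the package's Token or StopWords classes.

```php
<?php
// Hypothetical minimal token: only carries its term text.
final class SketchToken
{
    private $text;

    public function __construct(string $text)
    {
        $this->text = $text;
    }

    public function getTermText(): string
    {
        return $this->text;
    }
}

// Hypothetical filter following the documented contract:
// return the token to keep it, or null to remove it.
final class SketchStopWords
{
    private $stopSet;

    public function __construct(array $stopwords = array())
    {
        $this->stopSet = $stopwords;
    }

    public function normalize(SketchToken $srcToken): ?SketchToken
    {
        return isset($this->stopSet[$srcToken->getTermText()]) ? null : $srcToken;
    }
}

$filter = new SketchStopWords(array('the' => 1));
$stream = array(new SketchToken('the'), new SketchToken('fox'));

$kept = array();
foreach ($stream as $token) {
    $out = $filter->normalize($token);
    if ($out !== null) {            // null means the token was removed
        $kept[] = $out->getTermText();
    }
}
```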