HumHub Documentation (unofficial)

StopWords
in package
implements TokenFilterInterface

Token filter that removes stop words. The stop words must be provided as an array used as a set (the words are the keys), for example: $stopwords = array('the' => 1, 'an' => '1');

We recommend providing all words in lowercase and chaining this filter after the lowercase filter, so that capitalized forms such as "The" still match.
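The ordering matters because the stop-word check is an exact match against the keys of the set. The sketch below uses plain PHP strings in place of Token objects. lowercaseTerms() and removeStopWords() are hypothetical stand-ins for the lowercase filter and this filter. They are not library functions.

```php
<?php
// Hypothetical sketch: plain strings stand in for Token objects, and the
// two functions below stand in for the lowercase filter and StopWords.

/** Lowercase every term (stand-in for the lowercase filter). */
function lowercaseTerms(array $terms): array
{
    return array_map('strtolower', $terms);
}

/** Drop every term that is a key of $stopSet (stand-in for StopWords). */
function removeStopWords(array $terms, array $stopSet): array
{
    return array_values(array_filter(
        $terms,
        function (string $term) use ($stopSet): bool {
            // Exact, case-sensitive key lookup.
            return !array_key_exists($term, $stopSet);
        }
    ));
}

$stopwords = array('the' => 1, 'an' => '1');   // lowercase set, as recommended
$terms = array('The', 'Quick', 'Fox');

// Wrong order: 'The' does not match the lowercase key 'the', so it survives.
$wrongOrder = lowercaseTerms(removeStopWords($terms, $stopwords));
// $wrongOrder is array('the', 'quick', 'fox')

// Recommended order: lowercase first, then remove stop words.
$rightOrder = removeStopWords(lowercaseTerms($terms), $stopwords);
// $rightOrder is array('quick', 'fox')
```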

Tags
category

Zend

subpackage

Analysis

Table of Contents

Interfaces

TokenFilterInterface
A token filter converts (normalizes) a Token or removes it from a token stream.

Properties

$_stopSet  : array<string|int, mixed>
Set of stop words

Methods

__construct()  : mixed
Constructs a new instance of this filter.
loadFromFile()  : void
Adds stop words from a text file to the set. Each line holds one stop word; lines with '#' in the first column are treated as comments and ignored.
normalize()  : Token
Normalizes the token or removes it; returns null when the token is removed.

Properties

$_stopSet

Set of stop words

private array<string|int, mixed> $_stopSet

Methods

__construct()

Constructs a new instance of this filter.

public __construct([array<string|int, mixed> $stopwords = array() ]) : mixed
Parameters
$stopwords : array<string|int, mixed> = array()

array (set) of words to be filtered out
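If your stop words start out as a plain list, PHP's array_fill_keys() is one way to build the set form. This is a sketch: toStopWordSet() is a hypothetical helper, and it assumes the constructor expects the set form shown in the class description.

```php
<?php
// Hypothetical helper, not part of the documented API: converts a plain
// list of words into the set form (words as keys) shown in the class
// description, lowercasing them as the description recommends.
function toStopWordSet(array $words): array
{
    $lowered = array_map('strtolower', $words);
    return array_fill_keys($lowered, 1);
}

$stopSet = toStopWordSet(array('The', 'an', 'OF'));
// $stopSet is array('the' => 1, 'an' => 1, 'of' => 1)

// Membership is then a key lookup rather than a linear search.
$isStopWord = array_key_exists('of', $stopSet);   // true
$isKept     = !array_key_exists('fox', $stopSet); // true
```

The resulting $stopSet could then be passed as the $stopwords argument.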

loadFromFile()

Adds stop words from a text file to the set. Each line holds one stop word; lines with '#' in the first column are treated as comments and ignored.

public loadFromFile([string $filepath = null ]) : void

You can call this method one or more times; new stop words are always added to the current set.

Parameters
$filepath : string = null

full path to the text file containing the stop words

Tags
throws
InvalidArgumentException
throws
RuntimeException
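The file format described above can be illustrated with a small stand-alone parser. This is a hypothetical re-implementation of the documented rules, not the library's code. loadStopWordsFromFile() is an illustrative name. Which exception is raised in which case is an assumption, because this reference lists only the exception types. The sketch raises InvalidArgumentException for a missing path and RuntimeException for an unreadable file.

```php
<?php
// Hypothetical re-implementation of the documented loadFromFile() rules,
// not the library's code. The exception-to-condition mapping is assumed.
function loadStopWordsFromFile(?string $filepath, array $stopSet = array()): array
{
    if ($filepath === null || !is_file($filepath)) {
        throw new InvalidArgumentException('You have to provide a valid file path');
    }
    $lines = @file($filepath, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        throw new RuntimeException('Cannot read file ' . $filepath);
    }
    foreach ($lines as $line) {
        // '#' in the first column marks a comment line.
        if ($line !== '' && $line[0] === '#') {
            continue;
        }
        // Assumption: surrounding whitespace (including "\r") is trimmed.
        $word = trim($line);
        if ($word !== '') {
            $stopSet[$word] = 1;   // add to the existing set, never reset it
        }
    }
    return $stopSet;
}

// Usage: write a small stop-word file and merge it into an existing set.
$path = tempnam(sys_get_temp_dir(), 'stop');
file_put_contents($path, "# English stop words\nthe\nan\n\nof\n");
$set = loadStopWordsFromFile($path, array('and' => 1));
// $set is array('and' => 1, 'the' => 1, 'an' => 1, 'of' => 1)
unlink($path);
```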

normalize()

Normalizes the token or removes it; returns null when the token is removed.

public normalize(Token $srcToken) : Token
Parameters
$srcToken : Token
Return values
Token

the token, or null if it was removed as a stop word
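The normalize() contract (return the token to keep it, null to drop it) can be sketched with minimal stand-in classes. SketchToken and SketchStopWordsFilter are hypothetical and only mimic the documented behavior. The library's own Token class and analyzer pipeline are not used here.

```php
<?php
// Hypothetical stand-ins: SketchToken mimics a token carrying its term
// text; SketchStopWordsFilter mimics the documented normalize() contract
// (return the token, or null to remove it). Neither is the library class.
final class SketchToken
{
    private $termText;

    public function __construct(string $termText)
    {
        $this->termText = $termText;
    }

    public function getTermText(): string
    {
        return $this->termText;
    }
}

final class SketchStopWordsFilter
{
    /** @var array<string, mixed> stop words stored as keys (a set) */
    private $stopSet;

    public function __construct(array $stopwords = array())
    {
        $this->stopSet = $stopwords;
    }

    /** Return the token unchanged, or null if its text is a stop word. */
    public function normalize(SketchToken $srcToken): ?SketchToken
    {
        return array_key_exists($srcToken->getTermText(), $this->stopSet)
            ? null
            : $srcToken;
    }
}

// Usage: run a token stream through the filter, dropping null results.
$filter = new SketchStopWordsFilter(array('the' => 1, 'an' => '1'));
$kept = array();
foreach (array('the', 'quick', 'fox') as $text) {
    $token = $filter->normalize(new SketchToken($text));
    if ($token !== null) {
        $kept[] = $token->getTermText();
    }
}
// $kept is array('quick', 'fox')
```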
