HumHub Documentation (unofficial)

Index
implements SearchIndexInterface

Tags
category

Zend

Table of Contents

Interfaces

SearchIndexInterface

Constants

FORMAT_2_1  = 1
FORMAT_2_3  = 2
FORMAT_PRE_2_1  = 0
GENERATION_RETRIEVE_COUNT  = 10
Generation retrieving counter
GENERATION_RETRIEVE_PAUSE  = 50
Pause between generation retrieving attempts in milliseconds

Properties

$_closeDirOnExit  : bool
File system adapter closing option
$_directory  : DirectoryInterface
File system adapter.
$_docCount  : int
Number of documents in this index.
$_formatVersion  : int
Index format version
$_generation  : int
Current segment generation
$_hasChanges  : bool
Flag for index changes
$_segmentInfos  : array<string|int, mixed>|SegmentInfo
Array of Zend_Search_Lucene_Index_SegmentInfo objects for current version of index.
$_termsStream  : TermStreamsPriorityQueue
Terms stream priority queue object
$_writer  : Writer
Writer for this index, not instantiated unless required.

Methods

__construct()  : mixed
Opens the index.
__destruct()  : mixed
Object destructor
addDocument()  : void
Adds a document to this index.
closeTermsStream()  : void
Close terms stream
commit()  : void
Commit changes resulting from delete() or undeleteAll() operations.
count()  : int
Returns the total number of documents in this index (including deleted documents).
currentTerm()  : Term|null
Returns term in current position
delete()  : void
Deletes a document from the index.
docFreq()  : int
Returns the number of documents in this index containing the $term.
find()  : array<string|int, mixed>|QueryHit
Performs a query against the index and returns an array of Zend_Search_Lucene_Search_QueryHit objects.
getActualGeneration()  : int
Get current generation number
getDirectory()  : DirectoryInterface
Returns the Zend_Search_Lucene_Storage_Directory instance for this index.
getDocument()  : Document
Returns a Zend_Search_Lucene_Document object for the document number $id in this index.
getFieldNames()  : array<string|int, mixed>
Returns a list of all unique field names that exist in this index.
getFormatVersion()  : int
Get index format version
getGeneration()  : int
Get generation number associated with this index instance
getMaxBufferedDocs()  : int
Retrieve index maxBufferedDocs option
getMaxMergeDocs()  : int
Retrieve index maxMergeDocs option
getMergeFactor()  : int
Retrieve index mergeFactor option
getSegmentFileName()  : string
Get segments file name
getSimilarity()  : AbstractSimilarity
Retrieve similarity used by index reader
hasDeletions()  : bool
Returns true if any documents have been deleted from this index.
hasTerm()  : bool
Returns true if the index contains documents with the specified term.
isDeleted()  : bool
Checks whether a document is deleted
maxDoc()  : int
Returns one greater than the largest possible document number.
nextTerm()  : Term|null
Scans terms dictionary and returns next term
norm()  : float
Returns a normalization factor for "field, document" pair.
numDocs()  : int
Returns the total number of non-deleted documents in this index.
optimize()  : void
Optimize index.
resetTermsStream()  : void
Reset terms stream.
setFormatVersion()  : void
Set index format version.
setMaxBufferedDocs()  : void
Set index maxBufferedDocs option
setMaxMergeDocs()  : void
Set index maxMergeDocs option
setMergeFactor()  : void
Set index mergeFactor option
skipTo()  : void
Skip terms stream up to the specified term prefix.
termDocs()  : array<string|int, mixed>
Returns IDs of all documents containing term.
termDocsFilter()  : DocsFilter
Returns documents filter for all documents containing term.
termFreqs()  : int
Returns an array of all term freqs.
termPositions()  : array<string|int, mixed>
Returns an array of all term positions in the documents.
terms()  : array<string|int, mixed>
Returns an array of all terms in this index.
undeleteAll()  : void
Undeletes all documents currently marked as deleted in this index.
_getIndexWriter()  : Writer
Returns an instance of Zend_Search_Lucene_Index_Writer for the index
_readPre21SegmentsFile()  : void
Read segments file for pre-2.1 Lucene index format
_readSegmentsFile()  : void
Read segments file
_updateDocCount()  : void
Update document counter

Constants

FORMAT_2_1

public mixed FORMAT_2_1 = 1

FORMAT_2_3

public mixed FORMAT_2_3 = 2

FORMAT_PRE_2_1

public mixed FORMAT_PRE_2_1 = 0

GENERATION_RETRIEVE_COUNT

Generation retrieving counter

public mixed GENERATION_RETRIEVE_COUNT = 10

GENERATION_RETRIEVE_PAUSE

Pause between generation retrieving attempts in milliseconds

public mixed GENERATION_RETRIEVE_PAUSE = 50

Properties

$_closeDirOnExit

File system adapter closing option

private bool $_closeDirOnExit = true

$_docCount

Number of documents in this index.

private int $_docCount = 0

$_formatVersion

Index format version

private int $_formatVersion

$_generation

Current segment generation

private int $_generation

$_hasChanges

Flag for index changes

private bool $_hasChanges = false

$_segmentInfos

Array of Zend_Search_Lucene_Index_SegmentInfo objects for current version of index.

private array<string|int, mixed>|SegmentInfo $_segmentInfos = array()

$_writer

Writer for this index, not instantiated unless required.

private Writer $_writer = null

Methods

__construct()

Opens the index.

public __construct([Filesystem|string $directory = null ][, mixed $create = false ]) : mixed

IndexReader constructor needs Directory as a parameter. It should be a string with a path to the index folder or a Directory object.

Parameters
$directory : Filesystem|string = null
$create : mixed = false
Tags
throws
InvalidArgumentException
throws
RuntimeException

__destruct()

Object destructor

public __destruct() : mixed

addDocument()

Adds a document to this index.

public addDocument(Document $document) : void
Parameters
$document : Document

closeTermsStream()

Close terms stream

public closeTermsStream() : void

Should be called to release resources if the stream is not read to the end.

commit()

Commit changes resulting from delete() or undeleteAll() operations.

public commit() : void
Tags
todo

undeleteAll processing.

count()

Returns the total number of documents in this index (including deleted documents).

public count() : int
Return values
int

currentTerm()

Returns term in current position

public currentTerm() : Term|null
Return values
Term|null

docFreq()

Returns the number of documents in this index containing the $term.

public docFreq(Term $term) : int
Parameters
$term : Term
Return values
int

getActualGeneration()

Get current generation number

public static getActualGeneration(DirectoryInterface $directory) : int

Returns the generation number. 0 means pre-2.1 index format; -1 means there are no segments files.

Parameters
$directory : DirectoryInterface
Tags
throws
RuntimeException
Return values
int
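
The two special return values described above are easy to mix up, so a caller may want to branch on them explicitly. The following is a minimal sketch; `describeGeneration()` is a hypothetical helper and not part of the library, and it only assumes the return convention documented for getActualGeneration().

```php
<?php
// Hypothetical helper (not part of the library): maps the value documented
// for getActualGeneration() to a human-readable status string.
function describeGeneration(int $generation): string
{
    if ($generation === -1) {
        return 'no segments file found';
    }
    if ($generation === 0) {
        return 'pre-2.1 index format';
    }
    return 'generation ' . $generation;
}

echo describeGeneration(-1), PHP_EOL;
echo describeGeneration(0), PHP_EOL;
echo describeGeneration(7), PHP_EOL;
```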

getDocument()

Returns a Zend_Search_Lucene_Document object for the document number $id in this index.

public getDocument(int|QueryHit $id) : Document
Parameters
$id : int|QueryHit
Tags
throws
OutOfRangeException

Thrown if $id is out of range.

Return values
Document

getFieldNames()

Returns a list of all unique field names that exist in this index.

public getFieldNames([bool $indexed = false ]) : array<string|int, mixed>
Parameters
$indexed : bool = false
Return values
array<string|int, mixed>

getFormatVersion()

Get index format version

public getFormatVersion() : int
Return values
int

getGeneration()

Get generation number associated with this index instance

public getGeneration() : int

For a given generation number, the same document number or query string is guaranteed to return the same result from the index, so the generation number may be used as part of a key for search result caching.

Return values
int
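
The caching idea mentioned above can be sketched as follows. `GenerationKeyedCache` and the `$search` callable are hypothetical names introduced for illustration; in real use the generation would come from getGeneration() and the callable would wrap find(). The sketch only relies on the documented guarantee that results are stable within one generation.

```php
<?php
// Hypothetical cache (not part of the library): results are keyed by the
// pair (generation number, query string).
final class GenerationKeyedCache
{
    /** @var array<string, array> cached results by key */
    private array $entries = [];

    public function get(int $generation, string $query, callable $search): array
    {
        $key = $generation . ':' . sha1($query);
        if (!array_key_exists($key, $this->entries)) {
            // Cache miss: run the search once and remember the result.
            $this->entries[$key] = $search($query);
        }
        return $this->entries[$key];
    }
}

// Stand-in for a real search call; counts how often it is invoked.
$calls = 0;
$search = function (string $query) use (&$calls): array {
    $calls++;
    return ['hit for ' . $query];
};

$cache = new GenerationKeyedCache();
$cache->get(3, 'title:php', $search); // runs the search
$cache->get(3, 'title:php', $search); // same generation: served from cache
$cache->get(4, 'title:php', $search); // new generation: runs the search again
echo $calls, PHP_EOL;
```

Because the generation is part of the key, entries from older generations are never returned once the index changes; a real cache would also evict them.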

getMaxBufferedDocs()

Retrieve index maxBufferedDocs option

public getMaxBufferedDocs() : int

maxBufferedDocs is the minimum number of documents required before the buffered in-memory documents are written into a new segment.

Default value is 10

Return values
int

getMaxMergeDocs()

Retrieve index maxMergeDocs option

public getMaxMergeDocs() : int

maxMergeDocs is the largest number of documents ever merged by addDocument(). Small values (e.g., less than 10,000) are best for interactive indexing, as this limits the length of pauses while indexing to a few seconds. Larger values are best for batched indexing and speedier searches.

Default value is PHP_INT_MAX

Return values
int

getMergeFactor()

Retrieve index mergeFactor option

public getMergeFactor() : int

mergeFactor determines how often segment indices are merged by addDocument(). With smaller values, less RAM is used while indexing, and searches on unoptimized indices are faster, but indexing speed is slower. With larger values, more RAM is used during indexing, and while searches on unoptimized indices are slower, indexing is faster. Thus larger values (> 10) are best for batch index creation, and smaller values (< 10) for indices that are interactively maintained.

Default value is 10

Return values
int

getSegmentFileName()

Get segments file name

public static getSegmentFileName(int $generation) : string
Parameters
$generation : int
Return values
string
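
As an illustration of how a generation number could map to a file name, here is a sketch that assumes the Lucene 2.1+ naming convention: `segments_` followed by the generation in base 36, and plain `segments` for generation 0. `sketchSegmentFileName()` is a hypothetical stand-in; this page does not confirm that getSegmentFileName() behaves exactly this way.

```php
<?php
// Hypothetical stand-in for getSegmentFileName(), assuming the Lucene 2.1+
// convention "segments_<generation in base 36>" and "segments" for pre-2.1.
function sketchSegmentFileName(int $generation): string
{
    if ($generation === 0) {
        return 'segments';
    }
    return 'segments_' . base_convert((string) $generation, 10, 36);
}

echo sketchSegmentFileName(0), PHP_EOL;
echo sketchSegmentFileName(10), PHP_EOL;
echo sketchSegmentFileName(36), PHP_EOL;
```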

hasDeletions()

Returns true if any documents have been deleted from this index.

public hasDeletions() : bool
Return values
bool

hasTerm()

Returns true if the index contains documents with the specified term.

public hasTerm(Term $term) : bool

Is used for query optimization.

Parameters
$term : Term
Return values
bool

isDeleted()

Checks whether a document is deleted

public isDeleted(int $id) : bool
Parameters
$id : int
Tags
throws
OutOfRangeException

Thrown if $id is out of range.

Return values
bool

maxDoc()

Returns one greater than the largest possible document number.

public maxDoc() : int

This may be used to, e.g., determine how big to allocate a structure which will have an element for every document number in an index.

Return values
int
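
The allocation use case described above might look like the sketch below. `$maxDoc` and `$deletedIds` are hypothetical stand-ins for values that would come from maxDoc() and isDeleted(); the sketch does not call the library.

```php
<?php
// Hypothetical stand-ins for values obtained from maxDoc() and isDeleted().
$maxDoc = 5;
$deletedIds = [1, 3];

// One slot per possible document number: 0 .. $maxDoc - 1.
$isLive = new SplFixedArray($maxDoc);
for ($id = 0; $id < $maxDoc; $id++) {
    $isLive[$id] = !in_array($id, $deletedIds, true);
}

// Number of slots that refer to non-deleted documents.
$liveCount = count(array_filter($isLive->toArray()));
echo $liveCount, PHP_EOL;
```

Sizing the structure by maxDoc() rather than by the number of non-deleted documents keeps every valid document number addressable, including numbers of deleted documents.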

nextTerm()

Scans terms dictionary and returns next term

public nextTerm() : Term|null
Return values
Term|null

norm()

Returns a normalization factor for "field, document" pair.

public norm(int $id, string $fieldName) : float
Parameters
$id : int
$fieldName : string
Return values
float

numDocs()

Returns the total number of non-deleted documents in this index.

public numDocs() : int
Return values
int

optimize()

Optimize index.

public optimize() : void

Merges all segments into one

resetTermsStream()

Reset terms stream.

public resetTermsStream() : void

setFormatVersion()

Set index format version.

public setFormatVersion(int $formatVersion) : void

The index is converted to this format at the next update.

Parameters
$formatVersion : int
Tags
throws
InvalidArgumentException

setMaxBufferedDocs()

Set index maxBufferedDocs option

public setMaxBufferedDocs(int $maxBufferedDocs) : void

maxBufferedDocs is the minimum number of documents required before the buffered in-memory documents are written into a new segment.

Default value is 10

Parameters
$maxBufferedDocs : int

setMaxMergeDocs()

Set index maxMergeDocs option

public setMaxMergeDocs(int $maxMergeDocs) : void

maxMergeDocs is the largest number of documents ever merged by addDocument(). Small values (e.g., less than 10,000) are best for interactive indexing, as this limits the length of pauses while indexing to a few seconds. Larger values are best for batched indexing and speedier searches.

Default value is PHP_INT_MAX

Parameters
$maxMergeDocs : int

setMergeFactor()

Set index mergeFactor option

public setMergeFactor(mixed $mergeFactor) : void

mergeFactor determines how often segment indices are merged by addDocument(). With smaller values, less RAM is used while indexing, and searches on unoptimized indices are faster, but indexing speed is slower. With larger values, more RAM is used during indexing, and while searches on unoptimized indices are slower, indexing is faster. Thus larger values (> 10) are best for batch index creation, and smaller values (< 10) for indices that are interactively maintained.

Default value is 10

Parameters
$mergeFactor : mixed
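
The trade-offs described for mergeFactor, maxMergeDocs and maxBufferedDocs can be summarised as two option profiles. In this sketch, `suggestIndexOptions()` and every concrete number are illustrative assumptions that merely respect the documented thresholds (mergeFactor above or below 10, maxMergeDocs below 10,000 for interactive use); they are not recommendations from the library. The returned values would be passed to the corresponding setters.

```php
<?php
// Hypothetical helper (not part of the library). The numbers are illustrative
// only; they just follow the thresholds mentioned in the option descriptions.
function suggestIndexOptions(string $mode): array
{
    switch ($mode) {
        case 'interactive':
            // Smaller values: shorter pauses while indexing.
            return ['mergeFactor' => 5, 'maxMergeDocs' => 5000, 'maxBufferedDocs' => 10];
        case 'batch':
            // Larger values: faster bulk indexing at the cost of more RAM.
            return ['mergeFactor' => 50, 'maxMergeDocs' => PHP_INT_MAX, 'maxBufferedDocs' => 100];
        default:
            throw new InvalidArgumentException('Unknown mode: ' . $mode);
    }
}

print_r(suggestIndexOptions('interactive'));
print_r(suggestIndexOptions('batch'));
```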

skipTo()

Skip terms stream up to the specified term prefix.

public skipTo(Term $prefix) : void

Prefix contains fully specified field info and portion of searched term

Parameters
$prefix : Term

termDocs()

Returns IDs of all documents containing term.

public termDocs(Term $term[, DocsFilter|null $docsFilter = null ]) : array<string|int, mixed>
Parameters
$term : Term
$docsFilter : DocsFilter|null = null
Return values
array<string|int, mixed>

termDocsFilter()

Returns documents filter for all documents containing term.

public termDocsFilter(Term $term[, DocsFilter|null $docsFilter = null ]) : DocsFilter

Performs the same operation as termDocs(), but returns the result as a Zend_Search_Lucene_Index_DocsFilter object.

Parameters
$term : Term
$docsFilter : DocsFilter|null = null
Return values
DocsFilter

termFreqs()

Returns an array of all term freqs.

public termFreqs(Term $term[, DocsFilter|null $docsFilter = null ]) : int

Result array structure: array(docId => freq, ...). The declared return type is int, but the value returned is this array.

Parameters
$term : Term
$docsFilter : DocsFilter|null = null
Return values
int
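
To make the documented result shape concrete, the sketch below works on a hand-written sample array rather than a real termFreqs() call; `$freqs` and its values are hypothetical.

```php
<?php
// Hypothetical sample in the documented shape: array(docId => freq, ...).
$freqs = [0 => 2, 4 => 1, 7 => 5];

$docCount = count($freqs);             // number of documents containing the term
$totalOccurrences = array_sum($freqs); // occurrences across all documents

arsort($freqs);                        // sort by frequency, descending, keeping docIds
$topDocId = array_key_first($freqs);   // document with the highest frequency

echo $docCount, ' ', $totalOccurrences, ' ', $topDocId, PHP_EOL;
```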

termPositions()

Returns an array of all term positions in the documents.

public termPositions(Term $term[, DocsFilter|null $docsFilter = null ]) : array<string|int, mixed>

Result array structure: array(docId => array(pos1, pos2, ...), ...)

Parameters
$term : Term
$docsFilter : DocsFilter|null = null
Return values
array<string|int, mixed>
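
One thing the documented position structure allows is checking whether two terms appear next to each other. The sketch below uses hand-written sample arrays in the array(docId => array(pos1, pos2, ...), ...) shape; `phraseDocIds()` is a hypothetical helper and is not how the library itself evaluates phrase queries.

```php
<?php
// Hypothetical helper (not part of the library): returns the IDs of documents
// in which some position of the first term is immediately followed by a
// position of the second term.
function phraseDocIds(array $first, array $second): array
{
    $matches = [];
    foreach ($first as $docId => $positions) {
        if (!isset($second[$docId])) {
            continue; // the second term does not occur in this document
        }
        $next = array_flip($second[$docId]); // position => index, for fast lookup
        foreach ($positions as $pos) {
            if (isset($next[$pos + 1])) {
                $matches[] = $docId;
                break;
            }
        }
    }
    return $matches;
}

// Sample data in the documented shape.
$first  = [0 => [1, 8], 2 => [4]];
$second = [0 => [9], 2 => [7], 3 => [1]];

print_r(phraseDocIds($first, $second));
```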

terms()

Returns an array of all terms in this index.

public terms() : array<string|int, mixed>
Return values
array<string|int, mixed>

undeleteAll()

Undeletes all documents currently marked as deleted in this index.

public undeleteAll() : void
Tags
todo

Implementation

_getIndexWriter()

Returns an instance of Zend_Search_Lucene_Index_Writer for the index

private _getIndexWriter() : Writer
Return values
Writer

_updateDocCount()

Update document counter

private _updateDocCount() : void
