HumHub Documentation (unofficial)

SegmentInfo
in package
implements TermsStreamInterface

Tags
category

Zend

subpackage

Index

Table of Contents

Interfaces

TermsStreamInterface

Constants

FULL_SCAN_VS_FETCH_BOUNDARY  = 5
"Full scan vs fetch" boundary.
SM_FULL_INFO  = 1
SM_MERGE_INFO  = 2
SM_TERMS_ONLY  = 0
Scan modes

Properties

$_deleted  : mixed
List of deleted documents.
$_deletedDirty  : bool
$this->_deleted update flag
$_delGen  : int
Delete file generation number
$_directory  : DirectoryInterface
File system adapter.
$_docCount  : int
Number of docs in a segment
$_docMap  : array<string|int, mixed>|null
Map of the document IDs. Used to get the new docID after removing deleted documents.
$_fields  : array<string|int, mixed>
Segment fields. Array of Zend_Search_Lucene_Index_FieldInfo objects for this segment
$_fieldsDicPositions  : array<string|int, mixed>
Field positions in a dictionary.
$_frqFile  : FileInterface
Frequencies File object for stream like terms reading
$_frqFileOffset  : int
Actual offset of the .frq file data
$_hasSingleNormFile  : bool
Segment has single norms file
$_indexInterval  : int
Segment index interval
$_isCompound  : bool
Use compound segment file (*.cfs) to collect all other segment files (excluding .del files)
$_lastTerm  : Term
Last Term in a terms stream
$_lastTermInfo  : TermInfo
Last TermInfo in a terms stream
$_lastTermPositions  : array<string|int, mixed>|null
An array of all term positions in the documents.
$_name  : string
Segment name
$_norms  : array<string|int, mixed>
Normalization factors.
$_prxFile  : FileInterface
Positions File object for stream like terms reading
$_prxFileOffset  : int
Actual offset of the .prx file in the compound file
$_segFiles  : array<string|int, mixed>
Associative array where the key is the file name and the value is the data offset in a compound segment file (.cfs).
$_segFileSizes  : array<string|int, mixed>
Associative array where the key is the file name and the value is the file size within the compound segment file (.cfs).
$_sharedDocStoreOptions  : mixed
$_skipInterval  : int
Segment skip interval
$_termCount  : int
Actual number of terms in term stream
$_termDictionary  : array<string|int, mixed>
Term Dictionary Index
$_termDictionaryInfos  : array<string|int, mixed>
Term Dictionary Index TermInfos
$_termInfoCache  : array<string|int, mixed>
TermInfo cache
$_termNum  : int
Overall number of terms in term stream
$_termsScanMode  : int
Terms scan mode
$_tisFile  : FileInterface
Term Dictionary File object for stream like terms reading
$_tisFileOffset  : int
Actual offset of the .tis file data
$_usesSharedDocStore  : bool
True if segment uses shared doc store

Methods

__construct()  : mixed
Zend_Search_Lucene_Index_SegmentInfo constructor
closeTermsStream()  : void
Close terms stream
compoundFileLength()  : int
Get compound file length
count()  : int
Returns the total number of documents in this segment (including deleted documents).
currentTerm()  : Term|null
Returns term in current position
currentTermPositions()  : array<string|int, mixed>
Returns an array of all term positions in the documents.
delete()  : void
Deletes a document from the index segment.
getDelGen()  : int
Returns actual deletions file generation number.
getField()  : FieldInfo
Returns field info for specified field
getFieldInfos()  : array<string|int, mixed>
Returns array of FieldInfo objects.
getFieldNum()  : int
Returns field index or -1 if field is not found
getFields()  : array<string|int, mixed>
Returns array of fields.
getName()  : string
Return segment name
getTermInfo()  : TermInfo
Scans terms dictionary and returns term info
hasDeletions()  : bool
Returns true if any documents have been deleted from this index segment.
hasSingleNormFile()  : bool
Returns true if segment has single norms file.
isCompound()  : bool
Returns true if segment is stored using compound segment file.
isDeleted()  : bool
Checks whether a document is deleted
nextTerm()  : Term|null
Scans terms dictionary and returns next term
norm()  : float
Returns the normalization factor for the specified document and field
normVector()  : string
Returns norm vector, encoded in a byte string
numDocs()  : int
Returns the total number of non-deleted documents in this segment.
openCompoundFile()  : FileInterface
Opens an index file stored within the compound index file
resetTermsStream()  : int
Reset terms stream
skipTo()  : void
Skips the terms stream up to the specified term prefix.
termDocs()  : array<string|int, mixed>
Returns IDs of all the documents containing term.
termFreqs()  : TermInfo
Returns term freqs array.
termPositions()  : TermInfo
Returns term positions array.
_cleanUpTermInfoCache()  : void
_deletedCount()  : int
Returns number of deleted documents.
_detectLatestDelGen()  : int
Detect latest delete generation
_getFieldPosition()  : int
Get field position in a fields dictionary
_load21DelFile()  : mixed
Load 2.1+ format deletions file
_loadDelFile()  : mixed
Load deletions file
_loadDictionaryIndex()  : void
Load terms dictionary index
_loadNorm()  : void
Load normalization factors from an index file
_loadPre21DelFile()  : mixed
Load pre-2.1 deletions file

Constants

FULL_SCAN_VS_FETCH_BOUNDARY

"Full scan vs fetch" boundary.

public mixed FULL_SCAN_VS_FETCH_BOUNDARY = 5

If filter selectivity is less than this value, then full scan is performed (since term entries fetching has some additional overhead).

SM_TERMS_ONLY

Scan modes

public mixed SM_TERMS_ONLY = 0

Properties

$_deleted

List of deleted documents.

private mixed $_deleted = null

bitset if bitset extension is loaded or array otherwise.

$_deletedDirty

$this->_deleted update flag

private bool $_deletedDirty = false

$_delGen

Delete file generation number

private int $_delGen

-2 means autodetect the latest delete generation
-1 means 'there is no delete file'
0 means a pre-2.1 format delete file
X specifies the generation of the delete file in use
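The special values above can be summarised in a small lookup. This is a minimal sketch, not code from the library: `describeDelGen()` is a hypothetical helper, and rejecting values below -2 is an assumption made for illustration.

```php
<?php
// Hypothetical helper (not part of the library): maps a delete-generation
// number to the meaning documented for $_delGen.
function describeDelGen(int $delGen): string
{
    if ($delGen === -2) {
        return 'autodetect latest delete generation';
    }
    if ($delGen === -1) {
        return 'no delete file';
    }
    if ($delGen === 0) {
        return 'pre-2.1 format delete file';
    }
    if ($delGen > 0) {
        return 'delete file generation ' . $delGen;
    }
    // Assumption: anything below -2 is treated as invalid here.
    throw new InvalidArgumentException('Unexpected delete generation: ' . $delGen);
}

echo describeDelGen(-1), PHP_EOL;
```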

$_docCount

Number of docs in a segment

private int $_docCount

$_docMap

Map of the document IDs. Used to get the new docID after removing deleted documents.

private array<string|int, mixed>|null $_docMap = null

This is not very memory-efficient, but it is much faster than other methods.
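To illustrate what such a map looks like, the sketch below renumbers the surviving documents of a segment. It is not the library's implementation: `buildDocMap()` is a hypothetical helper, and the deleted list is assumed to be a plain array keyed by docID (the bitset case is not covered).

```php
<?php
// Hypothetical helper, not the library's code: builds old docID => new docID
// for a segment, skipping deleted documents.
// Assumption: $deleted is an array keyed by deleted docID.
function buildDocMap(int $docCount, array $deleted): array
{
    $docMap = [];
    $newId = 0;
    for ($oldId = 0; $oldId < $docCount; $oldId++) {
        if (isset($deleted[$oldId])) {
            continue; // deleted documents get no new ID
        }
        $docMap[$oldId] = $newId++;
    }
    return $docMap;
}

// Documents 1 and 3 are deleted; 0, 2 and 4 are renumbered 0, 1 and 2.
$docMap = buildDocMap(5, [1 => true, 3 => true]);
```

Holding one entry per surviving document is what costs memory, while each lookup stays a single array access.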

$_fields

Segment fields. Array of Zend_Search_Lucene_Index_FieldInfo objects for this segment

private array<string|int, mixed> $_fields

$_fieldsDicPositions

Field positions in a dictionary.

private array<string|int, mixed> $_fieldsDicPositions

(The term dictionary contains fields ordered by name.)

$_frqFileOffset

Actual offset of the .frq file data

private int $_frqFileOffset

$_hasSingleNormFile

Segment has single norms file

private bool $_hasSingleNormFile

If true, then one .nrm file is used for all fields. Otherwise, .fN files are used.

$_indexInterval

Segment index interval

private int $_indexInterval

$_isCompound

Use compound segment file (*.cfs) to collect all other segment files (excluding .del files)

private bool $_isCompound

$_lastTermPositions

An array of all term positions in the documents.

private array<string|int, mixed>|null $_lastTermPositions

Array structure: array( docId => array( pos1, pos2, ...), ...)

Is set to null if term positions loading has to be skipped
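As a short illustration of the array structure, the sketch below uses made-up sample data in the documented shape and derives the per-document occurrence counts and the list of matching documents from it. The variable names are illustrative only.

```php
<?php
// Made-up sample data in the documented shape:
// array(docId => array(pos1, pos2, ...), ...)
$positions = [
    0 => [4, 17],
    3 => [2],
    7 => [1, 9, 30],
];

// Number of occurrences per document. array_map() keeps the docId keys
// when it is given a single array.
$freqs = array_map('count', $positions);

// IDs of the documents that contain the term.
$docIds = array_keys($positions);
```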

$_norms

Normalization factors.

private array<string|int, mixed> $_norms = array()

An array of fieldName => normVector. normVector is a binary string: each byte corresponds to an indexed document in the segment and encodes a normalization factor (a float value, encoded by \ZendSearch\Lucene\Search\Similarity\AbstractSimilarity::encodeNorm()).
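Because a norm vector holds one byte per document, the encoded norm of a document can be read by indexing the string with the docID. The sketch below shows only that lookup: `normByte()` is a hypothetical helper, the vector contents are made up, and decoding the byte back to a float is left to the library's similarity class.

```php
<?php
// Hypothetical helper: returns the raw (still encoded) norm byte for a document.
function normByte(string $normVector, int $docId): int
{
    if ($docId < 0 || $docId >= strlen($normVector)) {
        throw new OutOfRangeException('No norm byte for document ' . $docId);
    }
    return ord($normVector[$docId]);
}

// Made-up norm vector for a segment with three documents.
$norms = ['title' => "\x7C\x00\x80"];
$byte = normByte($norms['title'], 2);
```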

$_prxFileOffset

Actual offset of the .prx file in the compound file

private int $_prxFileOffset

$_segFiles

Associative array where the key is the file name and the value is the data offset in a compound segment file (.cfs).

private array<string|int, mixed> $_segFiles

$_segFileSizes

Associative array where the key is the file name and the value is the file size within the compound segment file (.cfs).

private array<string|int, mixed> $_segFileSizes
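The two maps together are enough to locate an embedded file inside the compound file. The sketch below works on an in-memory string for simplicity, whereas the library reads through file objects. `readEmbeddedFile()`, the file names, and the payload are all hypothetical.

```php
<?php
// Hypothetical helper: slices one embedded file out of compound-file bytes,
// using maps shaped like $_segFiles (name => offset) and
// $_segFileSizes (name => size).
function readEmbeddedFile(string $compoundData, array $offsets, array $sizes, string $fileName): string
{
    if (!isset($offsets[$fileName], $sizes[$fileName])) {
        throw new InvalidArgumentException('Unknown file: ' . $fileName);
    }
    return substr($compoundData, $offsets[$fileName], $sizes[$fileName]);
}

// Made-up payload: a 6-byte stand-in header followed by two embedded files.
$data    = 'HEADERaaaabbbbbb';
$offsets = ['_1.tis' => 6, '_1.frq' => 10];
$sizes   = ['_1.tis' => 4, '_1.frq' => 6];

$tis = readEmbeddedFile($data, $offsets, $sizes, '_1.tis');
$frq = readEmbeddedFile($data, $offsets, $sizes, '_1.frq');
```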

$_sharedDocStoreOptions

private mixed $_sharedDocStoreOptions

$_skipInterval

Segment skip interval

private int $_skipInterval

$_termCount

Actual number of terms in term stream

private int $_termCount = 0

$_termDictionary

Term Dictionary Index

private array<string|int, mixed> $_termDictionary

Array of arrays (Zend_Search_Lucene_Index_Term objects are represented as arrays because of performance considerations):
[0] -> $termValue
[1] -> $termFieldNum

The corresponding Zend_Search_Lucene_Index_TermInfo object is stored in $_termDictionaryInfos.

$_termDictionaryInfos

Term Dictionary Index TermInfos

private array<string|int, mixed> $_termDictionaryInfos

Array of arrays (Zend_Search_Lucene_Index_TermInfo objects are represented as arrays because of performance considerations):
[0] -> $docFreq
[1] -> $freqPointer
[2] -> $proxPointer
[3] -> $skipOffset
[4] -> $indexPointer
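The positional layout of a term entry and its TermInfo entry can be made easier to read by unpacking both into named slots. This is only a sketch: `describeDictionaryEntry()` is a hypothetical helper and the sample values are made up.

```php
<?php
// Hypothetical helper: gives names to the positional slots documented for
// $_termDictionary and $_termDictionaryInfos entries.
function describeDictionaryEntry(array $term, array $termInfo): array
{
    [$termValue, $termFieldNum] = $term;
    [$docFreq, $freqPointer, $proxPointer, $skipOffset, $indexPointer] = $termInfo;

    return [
        'termValue'    => $termValue,
        'termFieldNum' => $termFieldNum,
        'docFreq'      => $docFreq,
        'freqPointer'  => $freqPointer,
        'proxPointer'  => $proxPointer,
        'skipOffset'   => $skipOffset,
        'indexPointer' => $indexPointer,
    ];
}

// Made-up entry: term "lucene" in field number 2.
$entry = describeDictionaryEntry(['lucene', 2], [12, 340, 512, 0, 96]);
```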

$_termInfoCache

TermInfo cache

private array<string|int, mixed> $_termInfoCache = array()

Size is 1024. Numbers are used instead of class constants because of performance considerations

$_termNum

Overall number of terms in term stream

private int $_termNum = 0

$_termsScanMode

Terms scan mode

private int $_termsScanMode

Values:

self::SM_TERMS_ONLY - terms are scanned, no additional info is retrieved
self::SM_FULL_INFO - terms are scanned, frequency and position info is retrieved
self::SM_MERGE_INFO - terms are scanned, frequency and position info is retrieved, document numbers are compacted (shifted if segment has deleted documents)
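The differences between the three modes can be written down as a small table of traits. The sketch below mirrors the documented constant values in local constants, and `scanModeTraits()` is a hypothetical helper, not a library function.

```php
<?php
// Local mirrors of the documented constant values (the real ones live on the class).
const SM_TERMS_ONLY = 0;
const SM_FULL_INFO  = 1;
const SM_MERGE_INFO = 2;

// Hypothetical helper summarising what each scan mode implies,
// following the descriptions above.
function scanModeTraits(int $mode): array
{
    switch ($mode) {
        case SM_TERMS_ONLY:
            return ['loadsFreqAndPositions' => false, 'compactsDocNumbers' => false];
        case SM_FULL_INFO:
            return ['loadsFreqAndPositions' => true, 'compactsDocNumbers' => false];
        case SM_MERGE_INFO:
            return ['loadsFreqAndPositions' => true, 'compactsDocNumbers' => true];
        default:
            throw new InvalidArgumentException('Unknown scan mode: ' . $mode);
    }
}

$traits = scanModeTraits(SM_MERGE_INFO);
```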

$_tisFileOffset

Actual offset of the .tis file data

private int $_tisFileOffset

$_usesSharedDocStore

True if segment uses shared doc store

private bool $_usesSharedDocStore

Methods

__construct()

Zend_Search_Lucene_Index_SegmentInfo constructor

public __construct(DirectoryInterface $directory, string $name, int $docCount[, int $delGen = 0 ][, array<string|int, mixed>|null $docStoreOptions = null ][, bool $hasSingleNormFile = false ][, bool $isCompound = null ]) : mixed
Parameters
$directory : DirectoryInterface
$name : string
$docCount : int
$delGen : int = 0
$docStoreOptions : array<string|int, mixed>|null = null
$hasSingleNormFile : bool = false
$isCompound : bool = null
Tags
throws
RuntimeException

closeTermsStream()

Close terms stream

public closeTermsStream() : void

Should be used for resource cleanup if the stream is not read to the end.
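The early-exit pattern this note refers to might look like the sketch below. `ArrayTermsStream` is an in-memory stand-in that only borrows the documented method names; it is not the library class, terms are plain strings here, and positioning the stream on the first term after a reset is an assumption.

```php
<?php
// In-memory stand-in that borrows the documented method names.
// It is NOT the library class; terms are plain strings here.
final class ArrayTermsStream
{
    private array $terms;
    private int $position = -1;

    public function __construct(array $terms)
    {
        sort($terms, SORT_STRING); // term streams are read in sorted order
        $this->terms = $terms;
    }

    // Assumption: a reset leaves the stream positioned on the first term.
    public function resetTermsStream(): void
    {
        $this->position = -1;
        $this->nextTerm();
    }

    public function nextTerm(): ?string
    {
        $this->position++;
        return $this->currentTerm();
    }

    public function currentTerm(): ?string
    {
        return $this->terms[$this->position] ?? null;
    }

    public function closeTermsStream(): void
    {
        // Move past the end so that currentTerm() returns null from now on.
        $this->position = count($this->terms);
    }
}

$stream = new ArrayTermsStream(['delta', 'alpha', 'charlie', 'bravo']);
$stream->resetTermsStream();

$seen = [];
while (($term = $stream->currentTerm()) !== null) {
    if ($term === 'charlie') {
        break; // stop before the stream is exhausted
    }
    $seen[] = $term;
    $stream->nextTerm();
}

// The loop ended early, so the stream is closed explicitly.
$stream->closeTermsStream();
```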

compoundFileLength()

Get compound file length

public compoundFileLength(string $extension) : int
Parameters
$extension : string
Tags
throws
InvalidFileFormatException
Return values
int

count()

Returns the total number of documents in this segment (including deleted documents).

public count() : int
Return values
int

currentTerm()

Returns term in current position

public currentTerm() : Term|null
Return values
Term|null

currentTermPositions()

Returns an array of all term positions in the documents.

public currentTermPositions() : array<string|int, mixed>

Return array structure: array( docId => array( pos1, pos2, ...), ...)

Return values
array<string|int, mixed>

delete()

Deletes a document from the index segment.

public delete(mixed $id) : void

$id is an internal document id

Parameters
$id : mixed

getDelGen()

Returns actual deletions file generation number.

public getDelGen() : int
Return values
int

getField()

Returns field info for specified field

public getField(int $fieldNum) : FieldInfo
Parameters
$fieldNum : int
Return values
FieldInfo

getFieldInfos()

Returns array of FieldInfo objects.

public getFieldInfos() : array<string|int, mixed>
Return values
array<string|int, mixed>

getFieldNum()

Returns field index or -1 if field is not found

public getFieldNum(string $fieldName) : int
Parameters
$fieldName : string
Return values
int
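Callers need to check for the -1 sentinel before using the result as an index. The sketch below shows that check with a hypothetical stand-in, `fieldNum()`, which searches a plain array of field number => field name rather than FieldInfo objects.

```php
<?php
// Hypothetical stand-in: $fieldNames maps field number => field name
// (integer keys are assumed).
function fieldNum(array $fieldNames, string $fieldName): int
{
    $num = array_search($fieldName, $fieldNames, true);
    return $num === false ? -1 : $num;
}

$fields = [0 => 'title', 1 => 'body'];

$num = fieldNum($fields, 'author');
if ($num === -1) {
    // Check the sentinel before using the result as an index.
    echo 'Field not found', PHP_EOL;
}
```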

getFields()

Returns array of fields.

public getFields([bool $indexed = false ]) : array<string|int, mixed>

If the $indexed parameter is true, only indexed fields are returned.

Parameters
$indexed : bool = false
Return values
array<string|int, mixed>

getName()

Return segment name

public getName() : string
Return values
string

hasDeletions()

Returns true if any documents have been deleted from this index segment.

public hasDeletions() : bool
Return values
bool

hasSingleNormFile()

Returns true if segment has single norms file.

public hasSingleNormFile() : bool
Return values
bool

isCompound()

Returns true if segment is stored using compound segment file.

public isCompound() : bool
Return values
bool

isDeleted()

Checks whether a document is deleted

public isDeleted(int $id) : bool
Parameters
$id : int
Return values
bool

nextTerm()

Scans terms dictionary and returns next term

public nextTerm() : Term|null
Return values
Term|null

norm()

Returns the normalization factor for the specified document and field

public norm(int $id, string $fieldName) : float
Parameters
$id : int
$fieldName : string
Return values
float

normVector()

Returns norm vector, encoded in a byte string

public normVector(string $fieldName) : string
Parameters
$fieldName : string
Return values
string

numDocs()

Returns the total number of non-deleted documents in this segment.

public numDocs() : int
Return values
int
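The relationship between count(), numDocs(), hasDeletions() and isDeleted() can be shown with a toy model. `SegmentCounts` below is not the library class: it keeps the deleted list as a plain array keyed by docID (an assumption) and models only the counting.

```php
<?php
// Toy model, not the library class: only the counting relationship is shown.
final class SegmentCounts
{
    private int $docCount;
    private array $deleted; // assumption: keyed by deleted docID

    public function __construct(int $docCount, array $deleted = [])
    {
        $this->docCount = $docCount;
        $this->deleted = $deleted;
    }

    // Total number of documents, including deleted ones.
    public function count(): int
    {
        return $this->docCount;
    }

    public function hasDeletions(): bool
    {
        return count($this->deleted) > 0;
    }

    public function isDeleted(int $id): bool
    {
        return isset($this->deleted[$id]);
    }

    // Number of documents that have not been deleted.
    public function numDocs(): int
    {
        return $this->docCount - count($this->deleted);
    }
}

$segment = new SegmentCounts(10, [2 => true, 5 => true]);
```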

skipTo()

Skips the terms stream up to the specified term prefix.

public skipTo(Term $prefix) : void

The prefix contains fully specified field info and a portion of the searched term.

Parameters
$prefix : Term

termDocs()

Returns IDs of all the documents containing term.

public termDocs(Term $term[, int $shift = 0 ][, DocsFilter|null $docsFilter = null ]) : array<string|int, mixed>
Parameters
$term : Term
$shift : int = 0
$docsFilter : DocsFilter|null = null
Tags
throws
InvalidArgumentException
Return values
array<string|int, mixed>

termFreqs()

Returns term freqs array.

public termFreqs(Term $term[, int $shift = 0 ][, DocsFilter|null $docsFilter = null ]) : TermInfo

Result array structure: array(docId => freq, ...)

Parameters
$term : Term
$shift : int = 0
$docsFilter : DocsFilter|null = null
Return values
TermInfo

termPositions()

Returns term positions array.

public termPositions(Term $term[, int $shift = 0 ][, DocsFilter|null $docsFilter = null ]) : TermInfo

Result array structure: array(docId => array(pos1, pos2, ...), ...)

Parameters
$term : Term
$shift : int = 0
$docsFilter : DocsFilter|null = null
Return values
TermInfo

_cleanUpTermInfoCache()

private _cleanUpTermInfoCache() : void

_deletedCount()

Returns number of deleted documents.

private _deletedCount() : int
Return values
int

_detectLatestDelGen()

Detect latest delete generation

private _detectLatestDelGen() : int

This is actually called from the writeChanges() method, or from the constructor when it is invoked by the index writer. In both cases the index write lock has already been obtained, so locking does not need to be handled here.

Return values
int

_getFieldPosition()

Get field position in a fields dictionary

private _getFieldPosition(int $fieldNum) : int
Parameters
$fieldNum : int
Return values
int

_load21DelFile()

Load 2.1+ format deletions file

private _load21DelFile() : mixed

Returns bitset or an array depending on bitset extension availability

_loadDelFile()

Load deletions file

private _loadDelFile() : mixed

Returns bitset or an array depending on bitset extension availability

_loadPre21DelFile()

Load pre-2.1 deletions file

private _loadPre21DelFile() : mixed

Returns bitset or an array depending on bitset extension availability

Tags
throws
RuntimeException

        