SegmentInfo
in package
implements
TermsStreamInterface
Table of Contents
Interfaces
Constants
- FULL_SCAN_VS_FETCH_BOUNDARY = 5
- "Full scan vs fetch" boundary.
- SM_FULL_INFO = 1
- SM_MERGE_INFO = 2
- SM_TERMS_ONLY = 0
- Scan modes
Properties
- $_deleted : mixed
- List of deleted documents.
- $_deletedDirty : bool
- $this->_deleted update flag
- $_delGen : int
- Delete file generation number
- $_directory : DirectoryInterface
- File system adapter.
- $_docCount : int
- Number of docs in a segment
- $_docMap : array<string|int, mixed>|null
- Map of the document IDs. Used to get the new docID after removing deleted documents.
- $_fields : array<string|int, mixed>
- Segment fields. Array of Zend_Search_Lucene_Index_FieldInfo objects for this segment
- $_fieldsDicPositions : array<string|int, mixed>
- Field positions in a dictionary.
- $_frqFile : FileInterface
- Frequencies File object for stream like terms reading
- $_frqFileOffset : int
- Actual offset of the .frq file data
- $_hasSingleNormFile : bool
- Segment has single norms file
- $_indexInterval : int
- Segment index interval
- $_isCompound : bool
- Use compound segment file (*.cfs) to collect all other segment files (excluding .del files)
- $_lastTerm : Term
- Last Term in a terms stream
- $_lastTermInfo : TermInfo
- Last TermInfo in a terms stream
- $_lastTermPositions : array<string|int, mixed>|null
- An array of all term positions in the documents.
- $_name : string
- Segment name
- $_norms : array<string|int, mixed>
- Normalization factors.
- $_prxFile : FileInterface
- Positions File object for stream like terms reading
- $_prxFileOffset : int
- Actual offset of the .prx file in the compound file
- $_segFiles : array<string|int, mixed>
- Associative array where the key is the file name and the value is the data offset in a compound segment file (.cfs).
- $_segFileSizes : array<string|int, mixed>
- Associative array where the key is the file name and the value is the file size in a compound segment file (.cfs).
- $_sharedDocStoreOptions : mixed
- $_skipInterval : int
- Segment skip interval
- $_termCount : int
- Actual number of terms in term stream
- $_termDictionary : array<string|int, mixed>
- Term Dictionary Index
- $_termDictionaryInfos : array<string|int, mixed>
- Term Dictionary Index TermInfos
- $_termInfoCache : array<string|int, mixed>
- TermInfo cache
- $_termNum : int
- Overall number of terms in term stream
- $_termsScanMode : int
- Terms scan mode
- $_tisFile : FileInterface
- Term Dictionary File object for stream like terms reading
- $_tisFileOffset : int
- Actual offset of the .tis file data
- $_usesSharedDocStore : bool
- True if segment uses shared doc store
Methods
- __construct() : mixed
- Zend_Search_Lucene_Index_SegmentInfo constructor
- closeTermsStream() : void
- Close terms stream
- compoundFileLength() : int
- Get compound file length
- count() : int
- Returns the total number of documents in this segment (including deleted documents).
- currentTerm() : Term|null
- Returns term in current position
- currentTermPositions() : array<string|int, mixed>
- Returns an array of all term positions in the documents.
- delete() : void
- Deletes a document from the index segment.
- getDelGen() : int
- Returns actual deletions file generation number.
- getField() : FieldInfo
- Returns field info for specified field
- getFieldInfos() : array<string|int, mixed>
- Returns array of FieldInfo objects.
- getFieldNum() : int
- Returns field index or -1 if field is not found
- getFields() : array<string|int, mixed>
- Returns array of fields.
- getName() : string
- Returns the segment name
- getTermInfo() : TermInfo
- Scans terms dictionary and returns term info
- hasDeletions() : bool
- Returns true if any documents have been deleted from this index segment.
- hasSingleNormFile() : bool
- Returns true if segment has single norms file.
- isCompound() : bool
- Returns true if segment is stored using compound segment file.
- isDeleted() : bool
- Checks whether a document is deleted
- nextTerm() : Term|null
- Scans terms dictionary and returns next term
- norm() : float
- Returns normalization factor for the specified document
- normVector() : string
- Returns norm vector, encoded in a byte string
- numDocs() : int
- Returns the total number of non-deleted documents in this segment.
- openCompoundFile() : FileInterface
- Opens index file stored within compound index file
- resetTermsStream() : int
- Reset terms stream
- skipTo() : void
- Skip terms stream up to specified term prefix.
- termDocs() : array<string|int, mixed>
- Returns IDs of all the documents containing term.
- termFreqs() : TermInfo
- Returns term freqs array.
- termPositions() : TermInfo
- Returns term positions array.
- _cleanUpTermInfoCache() : void
- _deletedCount() : int
- Returns number of deleted documents.
- _detectLatestDelGen() : int
- Detect latest delete generation
- _getFieldPosition() : int
- Get field position in a fields dictionary
- _load21DelFile() : mixed
- Load 2.1+ format deletions file
- _loadDelFile() : mixed
- Load deletions file
- _loadDictionaryIndex() : void
- Load terms dictionary index
- _loadNorm() : void
- Load normalization factors from an index file
- _loadPre21DelFile() : mixed
- Load pre-2.1 deletions file
Constants
FULL_SCAN_VS_FETCH_BOUNDARY
"Full scan vs fetch" boundary.
public
mixed
FULL_SCAN_VS_FETCH_BOUNDARY
= 5
If filter selectivity is less than this value, a full scan is performed (since fetching term entries has some additional overhead).
SM_FULL_INFO
public
mixed
SM_FULL_INFO
= 1
SM_MERGE_INFO
public
mixed
SM_MERGE_INFO
= 2
SM_TERMS_ONLY
Scan modes
public
mixed
SM_TERMS_ONLY
= 0
Properties
$_deleted
List of deleted documents.
private
mixed
$_deleted
= null
bitset if bitset extension is loaded or array otherwise.
$_deletedDirty
$this->_deleted update flag
private
bool
$_deletedDirty
= false
$_delGen
Delete file generation number
private
int
$_delGen
-2 means autodetect latest delete generation
-1 means 'there is no delete file'
0 means pre-2.1 format delete file
X specifies used delete file
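The special values above can be read as a small lookup. The following is only a sketch: `describeDelGen()` is a hypothetical helper that does not exist in the class, and it assumes that every positive value is a concrete generation number.

```php
<?php
// Hypothetical helper, not part of SegmentInfo: translates a $_delGen value
// into the meaning listed in the property description.
function describeDelGen(int $delGen): string
{
    if ($delGen === -2) {
        return 'autodetect latest delete generation';
    }
    if ($delGen === -1) {
        return 'no delete file';
    }
    if ($delGen === 0) {
        return 'pre-2.1 format delete file';
    }
    if ($delGen > 0) {
        // Assumption: any positive value names a specific delete file generation.
        return "delete file generation $delGen";
    }
    throw new InvalidArgumentException("Unexpected delete generation: $delGen");
}
```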
$_directory
File system adapter.
private
DirectoryInterface
$_directory
$_docCount
Number of docs in a segment
private
int
$_docCount
$_docMap
Map of the document IDs. Used to get the new docID after removing deleted documents.
private
array<string|int, mixed>|null
$_docMap
= null
It is not very efficient in terms of memory usage, but it is much faster than other methods.
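To illustrate the idea of such a map, here is a minimal sketch of how old document IDs could be renumbered once deleted documents are dropped. `buildDocMap()` is a hypothetical helper, and the deleted list is assumed to be a plain array keyed by document ID (the bitset variant is not covered).

```php
<?php
// Hypothetical helper: builds oldDocId => newDocId for the surviving documents.
// Assumption: $deleted is an array whose keys are the deleted document IDs.
function buildDocMap(int $docCount, array $deleted): array
{
    $docMap = [];
    $newId  = 0;
    for ($oldId = 0; $oldId < $docCount; $oldId++) {
        if (isset($deleted[$oldId])) {
            continue; // deleted documents get no new ID
        }
        $docMap[$oldId] = $newId++;
    }
    return $docMap;
}

// Documents 1 and 3 of a 5-document segment are deleted.
$docMap = buildDocMap(5, [1 => 1, 3 => 1]);
```

The lookup afterwards is a single array access per document, which matches the trade-off described above: more memory, less work per lookup.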
$_fields
Segment fields. Array of Zend_Search_Lucene_Index_FieldInfo objects for this segment
private
array<string|int, mixed>
$_fields
$_fieldsDicPositions
Field positions in a dictionary.
private
array<string|int, mixed>
$_fieldsDicPositions
(Term dictionary contains fields ordered by names)
$_frqFile
Frequencies File object for stream like terms reading
private
FileInterface
$_frqFile
= null
$_frqFileOffset
Actual offset of the .frq file data
private
int
$_frqFileOffset
$_hasSingleNormFile
Segment has single norms file
private
bool
$_hasSingleNormFile
If true, then one .nrm file is used for all fields. Otherwise .fN files are used.
$_indexInterval
Segment index interval
private
int
$_indexInterval
$_isCompound
Use compound segment file (*.cfs) to collect all other segment files (excluding .del files)
private
bool
$_isCompound
$_lastTerm
Last Term in a terms stream
private
Term
$_lastTerm
= null
$_lastTermInfo
Last TermInfo in a terms stream
private
TermInfo
$_lastTermInfo
= null
$_lastTermPositions
An array of all term positions in the documents.
private
array<string|int, mixed>|null
$_lastTermPositions
Array structure: array( docId => array( pos1, pos2, ...), ...)
Is set to null if term positions loading has to be skipped
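The documented array shape can be walked like any nested PHP array. The sketch below uses made-up sample data and a hypothetical helper, `freqsFromPositions()`, which counts positions per document. Returning an empty array for the null case is a choice made for this example only.

```php
<?php
// Sample data in the documented shape: docId => list of positions.
$positions = [
    3 => [0, 7, 12],
    8 => [4],
];

// Hypothetical helper: the number of positions recorded for a document
// equals the number of times the term occurs in it.
function freqsFromPositions(?array $positions): array
{
    if ($positions === null) {
        return []; // positions loading was skipped
    }
    // array_map() with a single array keeps the docId keys.
    return array_map('count', $positions);
}

$freqs = freqsFromPositions($positions);
```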
$_name
Segment name
private
string
$_name
$_norms
Normalization factors.
private
array<string|int, mixed>
$_norms
= array()
An array fieldName => normVector. normVector is a binary string. Each byte corresponds to an indexed document in a segment and encodes a normalization factor (float value, encoded by \ZendSearch\Lucene\Search\Similarity\AbstractSimilarity::encodeNorm()).
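Since a norm vector holds one byte per document, the encoded value for a document can be read by indexing into the string. The sketch below uses a made-up vector and a hypothetical helper, `encodedNormByte()`. It stops at the raw byte and does not attempt to decode it into a float.

```php
<?php
// Made-up norm vector for a 3-document segment: one byte per document.
$normVector = "\x7C\x78\x00";

// Hypothetical helper: returns the raw encoded norm byte (0..255) for a document.
function encodedNormByte(string $normVector, int $docId): int
{
    if ($docId < 0 || $docId >= strlen($normVector)) {
        throw new OutOfRangeException("No norm stored for document $docId");
    }
    return ord($normVector[$docId]);
}

$byte = encodedNormByte($normVector, 0);
```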
$_prxFile
Positions File object for stream like terms reading
private
FileInterface
$_prxFile
= null
$_prxFileOffset
Actual offset of the .prx file in the compound file
private
int
$_prxFileOffset
$_segFiles
Associative array where the key is the file name and the value is the data offset in a compound segment file (.cfs).
private
array<string|int, mixed>
$_segFiles
$_segFileSizes
Associative array where the key is the file name and the value is the file size in a compound segment file (.cfs).
private
array<string|int, mixed>
$_segFileSizes
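Taken together, an offset map and a size map are enough to locate one file inside a compound file. The following sketch shows the idea on an in-memory string. The file names, offsets, sizes and the `readSubFile()` helper are all hypothetical, and the real .cfs layout (including its header) is not modelled.

```php
<?php
// Made-up compound contents: three sub-files stored back to back.
$compound = 'FNMDATA' . 'TISBYTES' . 'FRQ';

// Same shape as the two documented maps: file name => offset / size.
$segFiles     = ['_1.fnm' => 0, '_1.tis' => 7, '_1.frq' => 15];
$segFileSizes = ['_1.fnm' => 7, '_1.tis' => 8, '_1.frq' => 3];

// Hypothetical helper: cuts one sub-file out of the compound data.
function readSubFile(string $compound, array $offsets, array $sizes, string $fileName): string
{
    if (!isset($offsets[$fileName], $sizes[$fileName])) {
        throw new InvalidArgumentException("File $fileName is not in the compound file");
    }
    return substr($compound, $offsets[$fileName], $sizes[$fileName]);
}

$tis = readSubFile($compound, $segFiles, $segFileSizes, '_1.tis');
```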
$_sharedDocStoreOptions
private
mixed
$_sharedDocStoreOptions
$_skipInterval
Segment skip interval
private
int
$_skipInterval
$_termCount
Actual number of terms in term stream
private
int
$_termCount
= 0
$_termDictionary
Term Dictionary Index
private
array<string|int, mixed>
$_termDictionary
Array of arrays (Zend_Search_Lucene_Index_Term objects are represented as arrays because of performance considerations):
[0] -> $termValue
[1] -> $termFieldNum
The corresponding Zend_Search_Lucene_Index_TermInfo object is stored in $_termDictionaryInfos.
$_termDictionaryInfos
Term Dictionary Index TermInfos
private
array<string|int, mixed>
$_termDictionaryInfos
Array of arrays (Zend_Search_Lucene_Index_TermInfo objects are represented as arrays because of performance considerations):
[0] -> $docFreq
[1] -> $freqPointer
[2] -> $proxPointer
[3] -> $skipOffset
[4] -> $indexPointer
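The two dictionary arrays can be pictured as parallel lists. The sketch below assumes that entry i of the term list corresponds to entry i of the TermInfo list. The constants, sample values and the `describeIndexEntry()` helper are hypothetical and only name the documented slots.

```php
<?php
// Hypothetical constants naming the documented array slots.
const TERM_VALUE       = 0;
const TERM_FIELD_NUM   = 1;
const TI_DOC_FREQ      = 0;
const TI_FREQ_POINTER  = 1;
const TI_PROX_POINTER  = 2;
const TI_SKIP_OFFSET   = 3;
const TI_INDEX_POINTER = 4;

// Made-up sample entries in the documented shape.
$termDictionary = [
    ['apple', 0],
    ['banana', 0],
];
$termDictionaryInfos = [
    [12, 0, 0, 0, 24],
    [3, 96, 210, 0, 1024],
];

// Hypothetical helper: reads a term and its info from the same index.
function describeIndexEntry(array $terms, array $infos, int $i): string
{
    return sprintf(
        '%s (field %d): docFreq=%d, indexPointer=%d',
        $terms[$i][TERM_VALUE],
        $terms[$i][TERM_FIELD_NUM],
        $infos[$i][TI_DOC_FREQ],
        $infos[$i][TI_INDEX_POINTER]
    );
}
```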
$_termInfoCache
TermInfo cache
private
array<string|int, mixed>
$_termInfoCache
= array()
Size is 1024. Numbers are used instead of class constants because of performance considerations
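A size-bounded cache of this kind could look roughly like the sketch below. The documentation only states the size. The eviction policy here (drop the oldest entries and keep the most recent 768 once the limit is reached) and the `BoundedCache` class itself are assumptions made for illustration, not a description of the actual clean-up routine.

```php
<?php
// Hypothetical bounded cache; not the class's real implementation.
final class BoundedCache
{
    /** @var array<string|int, mixed> entries in insertion order, oldest first */
    private $items = [];
    /** @var int */
    private $maxSize;
    /** @var int */
    private $keep;

    public function __construct(int $maxSize = 1024, int $keep = 768)
    {
        $this->maxSize = $maxSize;
        $this->keep    = $keep;
    }

    public function set(string $key, $value): void
    {
        unset($this->items[$key]);   // re-inserting moves the key to the end
        $this->items[$key] = $value;
        if (count($this->items) >= $this->maxSize) {
            // Assumed policy: keep only the most recently stored entries.
            $this->items = array_slice($this->items, -$this->keep, null, true);
        }
    }

    public function get(string $key)
    {
        return $this->items[$key] ?? null;
    }

    public function size(): int
    {
        return count($this->items);
    }
}
```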
$_termNum
Overall number of terms in term stream
private
int
$_termNum
= 0
$_termsScanMode
Terms scan mode
private
int
$_termsScanMode
Values:
self::SM_TERMS_ONLY - terms are scanned, no additional info is retrieved
self::SM_FULL_INFO - terms are scanned, frequency and position info is retrieved
self::SM_MERGE_INFO - terms are scanned, frequency and position info is retrieved, document numbers are compacted (shifted if segment has deleted documents)
$_tisFile
Term Dictionary File object for stream like terms reading
private
FileInterface
$_tisFile
= null
$_tisFileOffset
Actual offset of the .tis file data
private
int
$_tisFileOffset
$_usesSharedDocStore
True if segment uses shared doc store
private
bool
$_usesSharedDocStore
Methods
__construct()
Zend_Search_Lucene_Index_SegmentInfo constructor
public
__construct(DirectoryInterface $directory, string $name, int $docCount[, int $delGen = 0 ][, array<string|int, mixed>|null $docStoreOptions = null ][, bool $hasSingleNormFile = false ][, bool $isCompound = null ]) : mixed
Parameters
- $directory : DirectoryInterface
- $name : string
- $docCount : int
- $delGen : int = 0
- $docStoreOptions : array<string|int, mixed>|null = null
- $hasSingleNormFile : bool = false
- $isCompound : bool = null
closeTermsStream()
Close terms stream
public
closeTermsStream() : void
Should be used for resources clean up if stream is not read up to the end
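The clean-up rule above matters most when a scan stops early. The sketch below shows one way to guarantee the close call with try/finally. `ArrayTermsStream` is a hypothetical in-memory stand-in that borrows the stream method names, uses plain strings instead of Term objects, and simplifies the return types. `firstTermWithPrefix()` is likewise illustrative.

```php
<?php
// Hypothetical in-memory stand-in for a terms stream; terms are plain strings.
final class ArrayTermsStream
{
    private $terms;
    private $pos = 0;
    public $closed = false;

    public function __construct(array $terms)
    {
        $this->terms = array_values($terms);
    }

    public function resetTermsStream(): void
    {
        $this->pos    = 0;
        $this->closed = false;
    }

    public function currentTerm(): ?string
    {
        return $this->terms[$this->pos] ?? null;
    }

    public function nextTerm(): ?string
    {
        $this->pos++;
        return $this->currentTerm();
    }

    public function closeTermsStream(): void
    {
        $this->closed = true;
    }
}

// Returns the first term starting with $prefix, or null if there is none.
function firstTermWithPrefix(ArrayTermsStream $stream, string $prefix): ?string
{
    $stream->resetTermsStream();
    try {
        for ($term = $stream->currentTerm(); $term !== null; $term = $stream->nextTerm()) {
            if (strncmp($term, $prefix, strlen($prefix)) === 0) {
                return $term; // early exit: the stream is not read to the end
            }
        }
        return null;
    } finally {
        $stream->closeTermsStream(); // runs on both the early and the normal exit
    }
}
```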
compoundFileLength()
Get compound file length
public
compoundFileLength(string $extension) : int
Parameters
- $extension : string
Return values
int
count()
Returns the total number of documents in this segment (including deleted documents).
public
count() : int
Return values
int
currentTerm()
Returns term in current position
public
currentTerm() : Term|null
Return values
Term|null
currentTermPositions()
Returns an array of all term positions in the documents.
public
currentTermPositions() : array<string|int, mixed>
Return array structure: array( docId => array( pos1, pos2, ...), ...)
Return values
array<string|int, mixed>
delete()
Deletes a document from the index segment.
public
delete(mixed $id) : void
$id is an internal document id
Parameters
- $id : mixed
getDelGen()
Returns actual deletions file generation number.
public
getDelGen() : int
Return values
int
getField()
Returns field info for specified field
public
getField(int $fieldNum) : FieldInfo
Parameters
- $fieldNum : int
Return values
FieldInfo
getFieldInfos()
Returns array of FieldInfo objects.
public
getFieldInfos() : array<string|int, mixed>
Return values
array<string|int, mixed>
getFieldNum()
Returns field index or -1 if field is not found
public
getFieldNum(string $fieldName) : int
Parameters
- $fieldName : string
Return values
int
getFields()
Returns array of fields.
public
getFields([bool $indexed = false ]) : array<string|int, mixed>
if $indexed parameter is true, then returns only indexed fields.
Parameters
- $indexed : bool = false
Return values
array<string|int, mixed>
getName()
Returns the segment name
public
getName() : string
Return values
string
getTermInfo()
Scans terms dictionary and returns term info
public
getTermInfo(Term $term) : TermInfo
Parameters
- $term : Term
Return values
TermInfo
hasDeletions()
Returns true if any documents have been deleted from this index segment.
public
hasDeletions() : bool
Return values
bool
hasSingleNormFile()
Returns true if segment has single norms file.
public
hasSingleNormFile() : bool
Return values
bool
isCompound()
Returns true if segment is stored using compound segment file.
public
isCompound() : bool
Return values
bool
isDeleted()
Checks whether a document is deleted
public
isDeleted(int $id) : bool
Parameters
- $id : int
Return values
bool
nextTerm()
Scans terms dictionary and returns next term
public
nextTerm() : Term|null
Return values
Term|null
norm()
Returns normalization factor for the specified document
public
norm(int $id, string $fieldName) : float
Parameters
- $id : int
- $fieldName : string
Return values
float
normVector()
Returns norm vector, encoded in a byte string
public
normVector(string $fieldName) : string
Parameters
- $fieldName : string
Return values
string
numDocs()
Returns the total number of non-deleted documents in this segment.
public
numDocs() : int
Return values
int
openCompoundFile()
Opens index file stored within compound index file
public
openCompoundFile(string $extension[, bool $shareHandler = true ]) : FileInterface
Parameters
- $extension : string
- $shareHandler : bool = true
Return values
FileInterface
resetTermsStream()
Reset terms stream
public
resetTermsStream() : int
$startId - id for the first document
$compact - remove deleted documents
Returns start document id for the next segment
Return values
int
skipTo()
Skip terms stream up to specified term prefix.
public
skipTo(Term $prefix) : void
Prefix contains fully specified field info and portion of searched term
Parameters
- $prefix : Term
termDocs()
Returns IDs of all the documents containing term.
public
termDocs(Term $term[, int $shift = 0 ][, DocsFilter|null $docsFilter = null ]) : array<string|int, mixed>
Parameters
- $term : Term
- $shift : int = 0
- $docsFilter : DocsFilter|null = null
Return values
array<string|int, mixed>
termFreqs()
Returns term freqs array.
public
termFreqs(Term $term[, int $shift = 0 ][, DocsFilter|null $docsFilter = null ]) : TermInfo
Result array structure: array(docId => freq, ...)
Parameters
- $term : Term
- $shift : int = 0
- $docsFilter : DocsFilter|null = null
Return values
TermInfo
termPositions()
Returns term positions array.
public
termPositions(Term $term[, int $shift = 0 ][, DocsFilter|null $docsFilter = null ]) : TermInfo
Result array structure: array(docId => array(pos1, pos2, ...), ...)
Parameters
- $term : Term
- $shift : int = 0
- $docsFilter : DocsFilter|null = null
Return values
TermInfo
_cleanUpTermInfoCache()
private
_cleanUpTermInfoCache() : void
_deletedCount()
Returns number of deleted documents.
private
_deletedCount() : int
Return values
int
_detectLatestDelGen()
Detect latest delete generation
private
_detectLatestDelGen() : int
Is actually used from the writeChanges() method, or from the constructor if it's invoked from the Index writer. In both cases the index write lock is already obtained, so we shouldn't care about it.
Return values
int
_getFieldPosition()
Get field position in a fields dictionary
private
_getFieldPosition(int $fieldNum) : int
Parameters
- $fieldNum : int
Return values
int
_load21DelFile()
Load 2.1+ format deletions file
private
_load21DelFile() : mixed
Returns bitset or an array depending on bitset extension availability
_loadDelFile()
Load deletions file
private
_loadDelFile() : mixed
Returns bitset or an array depending on bitset extension availability
_loadDictionaryIndex()
Load terms dictionary index
private
_loadDictionaryIndex() : void
_loadNorm()
Load normalization factors from an index file
private
_loadNorm(int $fieldNum) : void
Parameters
- $fieldNum : int
_loadPre21DelFile()
Load pre-2.1 deletions file
private
_loadPre21DelFile() : mixed
Returns bitset or an array depending on bitset extension availability