Diff_SequenceMatcher
Sequence matcher for Diff
PHP version 5
Copyright (c) 2009 Chris Boulton chris.boulton@interspire.com
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- Neither the name of the Chris Boulton nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Table of Contents
Properties
- $a : array<string|int, mixed>
- $b : array<string|int, mixed>
- $b2j : array<string|int, mixed>
- $defaultOptions : mixed
- $fullBCount : mixed
- $junkCallback : string|array<string|int, mixed>
- $junkDict : array<string|int, mixed>
- $matchingBlocks : mixed
- $opCodes : mixed
- $options : mixed
Methods
- __construct() : mixed
- The constructor. Sets the two sequences for the sequence matcher, performs a basic cleanup, and calculates the junk elements.
- findLongestMatch() : array<string|int, mixed>
- Find the longest matching block in the two sequences, within the lower and upper constraints given for each sequence ($alo to $ahi for the first sequence, $blo to $bhi for the second).
- getGroupedOpcodes() : array<string|int, mixed>
- Return a series of nested arrays containing different groups of generated opcodes for the differences between the strings with up to $context lines of surrounding content.
- getMatchingBlocks() : array<string|int, mixed>
- Return a nested set of arrays for all of the matching sub-sequences in the strings $a and $b.
- getOpCodes() : array<string|int, mixed>
- Return a list of all of the opcodes for the differences between the two strings.
- linesAreDifferent() : bool
- Check if the two lines at the given indexes are different or not.
- Ratio() : float
- Return a measure of the similarity between the two sequences.
- setOptions() : mixed
- Set new options.
- setSeq1() : mixed
- Set the first sequence ($a) and reset any internal caches, so that the calculation methods recalculate their results on the next call.
- setSeq2() : mixed
- Set the second sequence ($b) and reset any internal caches, so that the calculation methods recalculate their results on the next call.
- setSequences() : mixed
- Set the first and second sequences to use with the sequence matcher.
- arrayGetDefault() : mixed
- Helper function that returns the value for a key in an array if it exists, or a default value if it doesn't.
- calculateRatio() : float
- Helper function for calculating the ratio to measure similarity for the strings.
- chainB() : mixed
- Generate the internal arrays containing the list of junk and non-junk characters for the second ($b) sequence.
- isBJunk() : bool
- Check whether a particular character is in the junk dictionary.
- quickRatio() : float
- Quickly return an upper bound ratio for the similarity of the strings.
- ratioReduce() : int
- Helper function to calculate the number of matches for Ratio().
- realquickRatio() : float
- Return an upper bound ratio really quickly for the similarity of the strings.
- tupleSort() : int
- Compare two nested arrays for sorting. Helper function (usort() callback) for getMatchingBlocks().
Properties
$a
private array<string|int, mixed> $a = null
The first sequence to compare against.
$b
private array<string|int, mixed> $b = null
The second sequence.
$b2j
private array<string|int, mixed> $b2j = array()
Maps each non-junk element of the second sequence to the array of indices at which it appears.
$defaultOptions
private mixed $defaultOptions = array('ignoreNewLines' => false, 'ignoreWhitespace' => false, 'ignoreCase' => false)
$fullBCount
private mixed $fullBCount = null
$junkCallback
private string|array<string|int, mixed> $junkCallback = null
Either a string or an array containing a callback function to determine if a line is "junk" or not.
$junkDict
private array<string|int, mixed> $junkDict = array()
Array of the characters from the second sequence that are considered junk, keyed by character.
$matchingBlocks
private mixed $matchingBlocks = null
$opCodes
private mixed $opCodes = null
$options
private mixed $options = array()
Methods
__construct()
The constructor. Sets the two sequences for the sequence matcher, performs a basic cleanup, and calculates the junk elements.
public
__construct(string|array<string|int, mixed> $a, string|array<string|int, mixed> $b[, string|array<string|int, mixed> $junkCallback = null ][, array<string|int, mixed> $options = [] ]) : mixed
Parameters
- $a : string|array<string|int, mixed>
  A string or array containing the lines to compare against.
- $b : string|array<string|int, mixed>
  A string or array containing the lines to compare.
- $junkCallback : string|array<string|int, mixed> = null
  Either an array or string that references a callback function (if there is one) to determine 'junk' characters.
- $options : array<string|int, mixed> = []
findLongestMatch()
Find the longest matching block in the two sequences, within the lower and upper constraints given for each sequence ($alo to $ahi for the first sequence, $blo to $bhi for the second).
public
findLongestMatch(int $alo, int $ahi, int $blo, int $bhi) : array<string|int, mixed>
Essentially, of all the maximal matching blocks, return the one that starts earliest in $a, and of all those maximal matching blocks that start earliest in $a, return the one that starts earliest in $b.
If the junk callback is defined, first find the longest matching block as above, but with the restriction that no junk element appears in the block. Then extend that block as far as possible by matching only junk elements on both sides, in both $a and $b.
Parameters
- $alo : int
  The lower constraint for the first sequence.
- $ahi : int
  The upper constraint for the first sequence.
- $blo : int
  The lower constraint for the second sequence.
- $bhi : int
  The upper constraint for the second sequence.
Return values
array<string|int, mixed>: Array containing the starting position of the longest match in $a, its starting position in $b, and its length.
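For illustration, a minimal Python sketch of the search described above, without the junk-extension step. It indexes the second sequence in a `b2j` dictionary (element to ascending list of positions, mirroring the $b2j property) and tracks the length of the match ending at each position. The function name and the `(start_a, start_b, size)` tuple are assumptions for this sketch, loosely modeled on Python's `difflib`; this is not the PHP implementation.

```python
# Sketch only: the name and tuple layout are assumptions, not the PHP API.
def find_longest_match(a, b, alo, ahi, blo, bhi):
    """Return (i, j, size) for the longest block with a[i:i+size] == b[j:j+size],
    restricted to a[alo:ahi] and b[blo:bhi]. Junk handling is omitted."""
    # Map each element of b to the ascending list of indices where it occurs.
    b2j = {}
    for j, elt in enumerate(b):
        b2j.setdefault(elt, []).append(j)

    besti, bestj, bestsize = alo, blo, 0
    # j2len[j] = length of the match ending at a[i - 1] and b[j].
    j2len = {}
    for i in range(alo, ahi):
        new_j2len = {}
        for j in b2j.get(a[i], []):
            if j < blo:
                continue
            if j >= bhi:
                break  # indices are ascending, so nothing further is in range
            k = new_j2len[j] = j2len.get(j - 1, 0) + 1
            # Strict ">" keeps the earliest block in a, then the earliest in b.
            if k > bestsize:
                besti, bestj, bestsize = i - k + 1, j - k + 1, k
        j2len = new_j2len
    return besti, bestj, bestsize


print(find_longest_match(" abcd", "abcd abcd", 0, 5, 0, 9))  # (0, 4, 5)
```

If no element matches, the sketch returns `(alo, blo, 0)`, a zero-length block at the start of both ranges.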
getGroupedOpcodes()
Return a series of nested arrays containing different groups of generated opcodes for the differences between the strings with up to $context lines of surrounding content.
public
getGroupedOpcodes([int $context = 3 ]) : array<string|int, mixed>
Large blocks of equal lines are trimmed so that at most $context lines remain on either side of each change, and the remaining changes are then arranged into groups. This means that the sequence matcher and diffs do not need to include the full content of the different files but can still provide context as to where the changes are.
Parameters
- $context : int = 3
  The number of lines of context to provide around the groups.
Return values
array<string|int, mixed>: Nested array of all of the grouped opcodes.
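A Python sketch of the grouping step, modeled on `get_grouped_opcodes()` in Python's `difflib`; the PHP method may differ in details. It assumes opcodes are `(tag, i1, i2, j1, j2)` tuples matching the five elements listed under getOpCodes(), trims a leading or trailing equal run to `context` lines, and starts a new group wherever an equal run is longer than `2 * context`. The function name is hypothetical.

```python
# Sketch only: `group_opcodes` and the tuple layout are assumptions.
def group_opcodes(codes, context=3):
    """Split (tag, i1, i2, j1, j2) opcodes into groups with `context` lines around changes."""
    codes = list(codes) or [("equal", 0, 1, 0, 1)]
    # Trim a leading equal run down to its last `context` lines.
    tag, i1, i2, j1, j2 = codes[0]
    if tag == "equal":
        codes[0] = (tag, max(i1, i2 - context), i2, max(j1, j2 - context), j2)
    # Trim a trailing equal run down to its first `context` lines.
    tag, i1, i2, j1, j2 = codes[-1]
    if tag == "equal":
        codes[-1] = (tag, i1, min(i2, i1 + context), j1, min(j2, j1 + context))

    groups, group = [], []
    for tag, i1, i2, j1, j2 in codes:
        # A long unchanged run closes the current group and opens a new one.
        if tag == "equal" and i2 - i1 > 2 * context:
            group.append((tag, i1, min(i2, i1 + context), j1, min(j2, j1 + context)))
            groups.append(group)
            group = []
            i1, j1 = max(i1, i2 - context), max(j1, j2 - context)
        group.append((tag, i1, i2, j1, j2))
    # Drop a final group that contains no changes.
    if group and not (len(group) == 1 and group[0][0] == "equal"):
        groups.append(group)
    return groups
```

For two files that are identical, the sketch returns an empty list, since there is nothing to show.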
getMatchingBlocks()
Return a nested set of arrays for all of the matching sub-sequences in the strings $a and $b.
public
getMatchingBlocks() : array<string|int, mixed>
Each block is a triple: the starting index of the block in $a, the starting index of the block in $b, and the number of lines the block spans.
Return values
array<string|int, mixed>: Nested array of the matching blocks, as described above.
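A self-contained Python sketch of how such blocks can be computed: find the longest match in an unmatched region, repeat on the regions to its left and right, sort the blocks, merge blocks that are adjacent in both sequences, and append a final `(len(a), len(b), 0)` sentinel. The sentinel and the merging step follow Python's `difflib` and are assumptions here. Junk handling is omitted, and the quadratic `longest_match` helper is a simplification; both function names are hypothetical.

```python
# Sketch only: function names, tuple layout and the sentinel are assumptions.
def longest_match(a, b, alo, ahi, blo, bhi):
    """Longest block shared by a[alo:ahi] and b[blo:bhi], earliest in a, then in b."""
    best = (alo, blo, 0)
    prev = {}  # prev[j] = length of the match ending at a[i - 1] and b[j]
    for i in range(alo, ahi):
        cur = {}
        for j in range(blo, bhi):
            if a[i] == b[j]:
                k = cur[j] = prev.get(j - 1, 0) + 1
                if k > best[2]:
                    best = (i - k + 1, j - k + 1, k)
        prev = cur
    return best


def matching_blocks(a, b):
    """Return sorted (i, j, size) blocks with a[i:i+size] == b[j:j+size], plus a sentinel."""
    queue = [(0, len(a), 0, len(b))]
    blocks = []
    while queue:
        alo, ahi, blo, bhi = queue.pop()
        i, j, k = longest_match(a, b, alo, ahi, blo, bhi)
        if k:
            blocks.append((i, j, k))
            # Search the unmatched regions to the left and right of this block.
            if alo < i and blo < j:
                queue.append((alo, i, blo, j))
            if i + k < ahi and j + k < bhi:
                queue.append((i + k, ahi, j + k, bhi))
    # Lexicographic tuple order plays the role of usort() with tupleSort().
    blocks.sort()

    # Merge blocks that are adjacent in both sequences.
    merged = []
    i1 = j1 = k1 = 0
    for i2, j2, k2 in blocks:
        if i1 + k1 == i2 and j1 + k1 == j2:
            k1 += k2
        else:
            if k1:
                merged.append((i1, j1, k1))
            i1, j1, k1 = i2, j2, k2
    if k1:
        merged.append((i1, j1, k1))
    merged.append((len(a), len(b), 0))  # sentinel marking the end of both sequences
    return merged


print(matching_blocks("abxcd", "abcd"))  # [(0, 0, 2), (3, 2, 2), (5, 4, 0)]
```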
getOpCodes()
Return a list of all of the opcodes for the differences between the two strings.
public
getOpCodes() : array<string|int, mixed>
The nested array returned contains one array per opcode, with these elements:
- 0: The type of tag (as described below) for the opcode.
- 1: The beginning line in the first sequence ($i1).
- 2: The end line in the first sequence ($i2).
- 3: The beginning line in the second sequence ($j1).
- 4: The end line in the second sequence ($j2).
The different types of tags are:
- replace: The range $i1 to $i2 in $a should be replaced by the range $j1 to $j2 in $b.
- delete: The range $i1 to $i2 in $a should be deleted.
- insert: The range $j1 to $j2 in $b should be inserted at $i1 in $a.
- equal: The two sequences are equal over the specified ranges.
Return values
array<string|int, mixed>: Array of the opcodes describing the differences between the strings.
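The opcodes can be derived from the matching blocks by walking them in order and labelling the gap before each block. The following Python sketch follows the approach used by `get_opcodes()` in Python's `difflib`. The function name and the tuple representation of blocks and opcodes are assumptions, and the input blocks are assumed to end with a zero-size sentinel.

```python
# Sketch only: `opcodes_from_blocks` and the tuple layout are assumptions.
def opcodes_from_blocks(blocks):
    """Turn sorted (i, j, size) matching blocks, ending with a (len(a), len(b), 0)
    sentinel, into (tag, i1, i2, j1, j2) opcodes."""
    i = j = 0
    codes = []
    for ai, bj, size in blocks:
        # Classify the gap between the previous block and this one.
        if i < ai and j < bj:
            tag = "replace"
        elif i < ai:
            tag = "delete"
        elif j < bj:
            tag = "insert"
        else:
            tag = None
        if tag:
            codes.append((tag, i, ai, j, bj))
        i, j = ai + size, bj + size
        if size:
            codes.append(("equal", ai, i, bj, j))
    return codes


# Matching blocks for a = "qabxcd", b = "abycdf" (worked out by hand).
for op in opcodes_from_blocks([(1, 0, 2), (4, 3, 2), (6, 6, 0)]):
    print(op)
```

Here the leading "q" becomes a delete, "x" to "y" a replace, and the trailing "f" an insert at the end of $a.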
linesAreDifferent()
Check if the two lines at the given indexes are different or not.
public
linesAreDifferent(int $aIndex, int $bIndex) : bool
Parameters
- $aIndex : int
  Index of the line to compare in $a.
- $bIndex : int
  Index of the line to compare in $b.
Return values
bool: True if the lines are different, false if not.
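A hypothetical Python sketch of such a comparison. It assumes, without confirmation from this page, that the `ignoreWhitespace` and `ignoreCase` keys of $defaultOptions are applied here by removing spaces and tabs and by lower-casing; `ignoreNewLines` is not modeled. The function and its `options` dict are illustrative only.

```python
# Hypothetical sketch: option handling is an assumption, not documented behavior.
DEFAULT_OPTIONS = {"ignoreNewLines": False, "ignoreWhitespace": False, "ignoreCase": False}


def lines_are_different(a, b, a_index, b_index, options=None):
    """Return True if a[a_index] and b[b_index] differ after applying the options."""
    opts = {**DEFAULT_OPTIONS, **(options or {})}
    line_a, line_b = a[a_index], b[b_index]
    if opts["ignoreWhitespace"]:
        # Assumption: only spaces and tabs are treated as whitespace.
        line_a = line_a.replace(" ", "").replace("\t", "")
        line_b = line_b.replace(" ", "").replace("\t", "")
    if opts["ignoreCase"]:
        line_a, line_b = line_a.lower(), line_b.lower()
    return line_a != line_b
```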
Ratio()
Return a measure of the similarity between the two sequences.
public
Ratio() : float
The result is a float between 0 and 1.
Of all the ratio calculation functions, this is the most expensive to call if getMatchingBlocks() or getOpCodes() has not been called yet. The other calculation methods (quickRatio() and realquickRatio()) are quicker, but they only return an upper bound and may therefore be less accurate.
The ratio is calculated as (2 * number of matches) / total number of elements in both sequences.
Return values
float: The calculated ratio.
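As a worked example of the formula above, a Python sketch that computes the ratio from matching blocks given as `(i, j, size)` triples (an assumed representation). For a = "abxcd" and b = "abcd", the matching blocks cover 4 elements out of 5 + 4 = 9, so the ratio is 2 * 4 / 9 ≈ 0.889. Returning 1.0 when both sequences are empty is an assumption borrowed from Python's `difflib`, and the function name is hypothetical.

```python
# Sketch only: `ratio` and the block layout are assumptions.
def ratio(blocks, len_a, len_b):
    """Similarity in [0, 1]: 2 * matched elements / total elements in both sequences."""
    matches = sum(size for _, _, size in blocks)
    total = len_a + len_b
    return 2.0 * matches / total if total else 1.0


# Blocks for a = "abxcd", b = "abcd", including the zero-size sentinel.
print(round(ratio([(0, 0, 2), (3, 2, 2), (5, 4, 0)], 5, 4), 3))  # 0.889
```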
setOptions()
Set new options.
public
setOptions(array<string|int, mixed> $options) : mixed
Parameters
- $options : array<string|int, mixed>
setSeq1()
Set the first sequence ($a) and reset any internal caches, so that the calculation methods recalculate their results on the next call.
public
setSeq1(string|array<string|int, mixed> $a) : mixed
Parameters
- $a : string|array<string|int, mixed>
  The sequence to set as the first sequence.
setSeq2()
Set the second sequence ($b) and reset any internal caches, so that the calculation methods recalculate their results on the next call.
public
setSeq2(string|array<string|int, mixed> $b) : mixed
Parameters
- $b : string|array<string|int, mixed>
  The sequence to set as the second sequence.
setSequences()
Set the first and second sequences to use with the sequence matcher.
public
setSequences(string|array<string|int, mixed> $a, string|array<string|int, mixed> $b) : mixed
Parameters
- $a : string|array<string|int, mixed>
  A string or array containing the lines to compare against.
- $b : string|array<string|int, mixed>
  A string or array containing the lines to compare.
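The cache-reset behavior described for setSeq1() and setSeq2() can be sketched in Python as follows. The class, its method and attribute names, and the use of the standard `difflib.SequenceMatcher` as a stand-in for the expensive calculation are all assumptions made for illustration; only the idea of clearing cached results when a sequence changes comes from this page.

```python
import difflib


class CachingMatcher:
    """Hypothetical sketch: cache results and clear them when a sequence changes."""

    def __init__(self, a, b):
        self.set_sequences(a, b)

    def set_sequences(self, a, b):
        self.set_seq1(a)
        self.set_seq2(b)

    def set_seq1(self, a):
        self.a = list(a)
        self.matching_blocks = None  # force recalculation on the next request
        self.op_codes = None

    def set_seq2(self, b):
        self.b = list(b)
        self.matching_blocks = None
        self.op_codes = None

    def get_matching_blocks(self):
        if self.matching_blocks is None:
            # Stand-in for the real (expensive) computation, done once per change.
            sm = difflib.SequenceMatcher(None, self.a, self.b)
            self.matching_blocks = [tuple(m) for m in sm.get_matching_blocks()]
        return self.matching_blocks
```

Caching the blocks is what makes a second call cheap; resetting them on every setter call keeps the cache from returning results for stale sequences.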
arrayGetDefault()
Helper function that returns the value for a key in an array if it exists, or a default value if it doesn't.
private
arrayGetDefault(array<string|int, mixed> $array, string $key, mixed $default) : mixed
This is cleaner than writing a series of if (isset(...)) { ... } else { ... } blocks.
Parameters
- $array : array<string|int, mixed>
  The array to search.
- $key : string
  The key to check for.
- $default : mixed
  The value to return if the key doesn't exist.
Return values
mixed: The value from the array if the key exists, otherwise the default.
calculateRatio()
Helper function for calculating the ratio to measure similarity for the strings.
private
calculateRatio(int $matches[, int $length = 0 ]) : float
The ratio is defined as 2 * (number of matches / total length).
Parameters
- $matches : int
  The number of matches in the two strings.
- $length : int = 0
  The combined length of the two strings.
Return values
float: The calculated ratio.
chainB()
Generate the internal arrays containing the list of junk and non-junk characters for the second ($b) sequence.
private
chainB() : mixed
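A Python sketch of what chainB() is described as producing: a map from each non-junk element of the second sequence to its positions (the $b2j property) and a dictionary of junk elements keyed by element (the $junkDict property). The `is_junk` callable stands in for $junkCallback. The names and return shape are assumptions, and any further filtering the PHP method may apply (Python's `difflib`, for example, also treats very frequent elements specially) is not shown. A one-line counterpart of isBJunk() is included.

```python
# Sketch only: names and return shape are assumptions.
def chain_b(b, is_junk=None):
    """Return (b2j, junk_dict) for sequence b."""
    b2j = {}
    for i, elt in enumerate(b):
        b2j.setdefault(elt, []).append(i)
    junk_dict = {}
    if is_junk is not None:
        # Move junk elements out of b2j and record them by key.
        for elt in list(b2j):
            if is_junk(elt):
                junk_dict[elt] = True
                del b2j[elt]
    return b2j, junk_dict


def is_b_junk(junk_dict, elt):
    """Counterpart of isBJunk(): a membership test on the junk dictionary's keys."""
    return elt in junk_dict
```

Because junk elements never enter `b2j`, a longest-match search driven by `b2j` cannot start a block on a junk line.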
isBJunk()
Check whether a particular character is in the junk dictionary.
private
isBJunk(mixed $b) : bool
Parameters
- $b : mixed
Return values
bool: True if the character is considered junk, false if not.
quickRatio()
Quickly return an upper bound ratio for the similarity of the strings.
private
quickRatio() : float
This is quicker to compute than Ratio().
Return values
float: The calculated ratio.
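One way to obtain such an upper bound, sketched in Python after `quick_ratio()` in Python's `difflib`, is to count the elements the two sequences share as multisets, ignoring order. Every matching block is made of shared elements, so this count can only overestimate the true number of matches. Whether the PHP method counts exactly this way is an assumption, and the function name is hypothetical.

```python
from collections import Counter


# Sketch only: `quick_ratio` mirrors difflib's idea, not necessarily the PHP code.
def quick_ratio(a, b):
    """Upper bound on the ratio: shared elements counted as multisets, order ignored."""
    matches = sum((Counter(a) & Counter(b)).values())  # & keeps the minimum count
    total = len(a) + len(b)
    return 2.0 * matches / total if total else 1.0
```

For example, "abc" and "cba" share all three elements, so this bound is 1.0 even though the sequences are in different orders.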
ratioReduce()
Helper function to calculate the number of matches for Ratio().
private
ratioReduce(int $sum, array<string|int, mixed> $triple) : int
Parameters
- $sum : int
  The running total for the number of matches.
- $triple : array<string|int, mixed>
  Array containing the matching block triple to add to the running total.
Return values
int: The new running total for the number of matches.
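The signature suggests that ratioReduce() is used as an array_reduce() callback over the matching blocks. The equivalent pattern in Python uses `functools.reduce`. The `(i, j, size)` triple layout is an assumption based on the getMatchingBlocks() description, and `ratio_reduce` is a hypothetical name.

```python
from functools import reduce


# Sketch only: the triple layout (i, j, size) is an assumption.
def ratio_reduce(total, triple):
    """Add the size (third element) of one matching-block triple to the running total."""
    return total + triple[2]


blocks = [(0, 0, 2), (3, 2, 2), (5, 4, 0)]  # blocks for "abxcd" vs "abcd"
matches = reduce(ratio_reduce, blocks, 0)
print(matches)  # 4
```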
realquickRatio()
Return an upper bound ratio really quickly for the similarity of the strings.
private
realquickRatio() : float
This is quicker to compute than Ratio() and quickRatio().
Return values
float: The calculated ratio.
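A sketch of an even cheaper bound, modeled on `real_quick_ratio()` in Python's `difflib`: at most min(len(a), len(b)) elements can match, so 2 * min / (len(a) + len(b)) bounds the ratio from above using only the two lengths. That the PHP method computes exactly this is an assumption, and the function name is hypothetical.

```python
# Sketch only: mirrors difflib's real_quick_ratio idea, not necessarily the PHP code.
def real_quick_ratio(a, b):
    """Upper bound on the ratio using only the two lengths."""
    la, lb = len(a), len(b)
    total = la + lb
    return 2.0 * min(la, lb) / total if total else 1.0
```

Since it never looks at the elements, this bound is 1.0 for any two sequences of equal length.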
tupleSort()
Compare two nested arrays for sorting. Helper function (usort() callback) for getMatchingBlocks().
private
tupleSort(array<string|int, mixed> $a, array<string|int, mixed> $b) : int
Parameters
- $a : array<string|int, mixed>
  First array to compare.
- $b : array<string|int, mixed>
  Second array to compare.
Return values
int: -1, 0 or 1, as expected by the usort function.
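A Python sketch of a comparator with the same contract (-1, 0 or 1, as usort() expects) that compares two nested arrays element by element. `functools.cmp_to_key` adapts it for `sorted`. In Python, plain tuple comparison would do the same job, so this is only for illustration. The element-by-element order and the shorter-array-first tie-break are assumptions about tupleSort(), and `tuple_sort` is a hypothetical name.

```python
from functools import cmp_to_key


# Sketch only: the comparison order is an assumption about tupleSort().
def tuple_sort(a, b):
    """Return -1, 0 or 1 by comparing a and b element by element."""
    for x, y in zip(a, b):
        if x < y:
            return -1
        if x > y:
            return 1
    # All shared positions are equal: the shorter array sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))


blocks = [[3, 2, 2], [0, 0, 2], [3, 1, 1]]
print(sorted(blocks, key=cmp_to_key(tuple_sort)))  # [[0, 0, 2], [3, 1, 1], [3, 2, 2]]
```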