HumHub Documentation (unofficial)

Diff_SequenceMatcher
in package

Sequence matcher for Diff

PHP version 5

Copyright (c) 2009 Chris Boulton chris.boulton@interspire.com

All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  • Neither the name of the Chris Boulton nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Tags
author

Chris Boulton chris.boulton@interspire.com

copyright

(c) 2009 Chris Boulton

license

New BSD License http://www.opensource.org/licenses/bsd-license.php

version
1.1
link
http://github.com/chrisboulton/php-diff

Table of Contents

Properties

$a  : array<string|int, mixed>
$b  : array<string|int, mixed>
$b2j  : array<string|int, mixed>
$defaultOptions  : mixed
$fullBCount  : mixed
$junkCallback  : string|array<string|int, mixed>
$junkDict  : array<string|int, mixed>
$matchingBlocks  : mixed
$opCodes  : mixed
$options  : mixed

Methods

__construct()  : mixed
The constructor. Sets the two passed sequences on the sequence matcher, performs a basic cleanup and calculates junk elements.
findLongestMatch()  : array<string|int, mixed>
Find the longest matching block in the two sequences, as defined by the lower and upper constraints for each sequence. (for the first sequence, $alo - $ahi and for the second sequence, $blo - $bhi)
getGroupedOpcodes()  : array<string|int, mixed>
Return a series of nested arrays containing different groups of generated opcodes for the differences between the strings with up to $context lines of surrounding content.
getMatchingBlocks()  : array<string|int, mixed>
Return a nested set of arrays for all of the matching sub-sequences in the strings $a and $b.
getOpCodes()  : array<string|int, mixed>
Return a list of all of the opcodes for the differences between the two strings.
linesAreDifferent()  : bool
Check if the two lines at the given indexes are different or not.
Ratio()  : float
Return a measure of the similarity between the two sequences.
setOptions()  : mixed
Set new options
setSeq1()  : mixed
Set the first sequence ($a) and reset any internal caches to indicate that when calling the calculation methods, we need to recalculate them.
setSeq2()  : mixed
Set the second sequence ($b) and reset any internal caches to indicate that when calling the calculation methods, we need to recalculate them.
setSequences()  : mixed
Set the first and second sequences to use with the sequence matcher.
arrayGetDefault()  : mixed
Helper function that returns the value for a key in an array if it exists, or a default value if it doesn't.
calculateRatio()  : float
Helper function for calculating the ratio to measure similarity for the strings.
chainB()  : mixed
Generate the internal arrays containing the list of junk and non-junk characters for the second ($b) sequence.
isBJunk()  : bool
Checks if a particular character is in the junk dictionary for the list of junk characters.
quickRatio()  : float
Quickly return an upper bound ratio for the similarity of the strings.
ratioReduce()  : int
Helper function to calculate the number of matches for Ratio().
realquickRatio()  : float
Return an upper bound ratio really quickly for the similarity of the strings.
tupleSort()  : int
Sort an array by the nested arrays it contains. Helper function for getMatchingBlocks.

Properties

$a

private array<string|int, mixed> $a = \null

The first sequence to compare against.

$b2j

private array<string|int, mixed> $b2j = array()

Array of indices that do not contain junk elements.

$defaultOptions

private mixed $defaultOptions = array('ignoreNewLines' => \false, 'ignoreWhitespace' => \false, 'ignoreCase' => \false)

$junkCallback

private string|array<string|int, mixed> $junkCallback = \null

Either a string or an array containing a callback function to determine if a line is "junk" or not.

$junkDict

private array<string|int, mixed> $junkDict = array()

Array of characters that are considered junk from the second sequence. Characters are the array key.

Methods

__construct()

The constructor. Sets the two passed sequences on the sequence matcher, performs a basic cleanup and calculates junk elements.

public __construct(string|array<string|int, mixed> $a, string|array<string|int, mixed> $b[, string|array<string|int, mixed> $junkCallback = null ][, array<string|int, mixed> $options = [] ]) : mixed
Parameters
$a : string|array<string|int, mixed>

A string or array containing the lines to compare against.

$b : string|array<string|int, mixed>

A string or array containing the lines to compare.

$junkCallback : string|array<string|int, mixed> = null

Either an array or string that references a callback function (if there is one) to determine 'junk' characters.

$options : array<string|int, mixed> = []

findLongestMatch()

Find the longest matching block in the two sequences, as defined by the lower and upper constraints for each sequence. (for the first sequence, $alo - $ahi and for the second sequence, $blo - $bhi)

public findLongestMatch(int $alo, int $ahi, int $blo, int $bhi) : array<string|int, mixed>

Essentially, of all of the maximal matching blocks, return the one that starts earliest in $a, and of all of those maximal matching blocks that start earliest in $a, return the one that starts earliest in $b.

If the junk callback is defined, do the above but with the additional restriction that no junk element appears in the block. The block is then extended as far as possible by matching only junk elements in both $a and $b.

Parameters
$alo : int

The lower constraint for the first sequence.

$ahi : int

The upper constraint for the first sequence.

$blo : int

The lower constraint for the second sequence.

$bhi : int

The upper constraint for the second sequence.

Return values
array<string|int, mixed>

Array containing the longest match: the starting position in $a, the starting position in $b, and the length of the match.
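The tie-breaking rule described above can be illustrated with a small standalone sketch. It is not the library's implementation: the function name longestMatchSketch is hypothetical, junk handling is left out, and the upper constraints are assumed to be exclusive.

```php
<?php
// Illustrative sketch only; not the library's code, and junk handling is omitted.
// Assumption: $ahi and $bhi are exclusive upper bounds.
function longestMatchSketch(array $a, array $b, int $alo, int $ahi, int $blo, int $bhi): array
{
    $bestI = $alo;
    $bestJ = $blo;
    $bestSize = 0;
    $previous = []; // $previous[$j]: length of the match ending at $a[$i - 1] and $b[$j]

    for ($i = $alo; $i < $ahi; $i++) {
        $current = [];
        for ($j = $blo; $j < $bhi; $j++) {
            if ($a[$i] !== $b[$j]) {
                continue;
            }
            // Extend the match that ended one element earlier in both sequences.
            $size = ($previous[$j - 1] ?? 0) + 1;
            $current[$j] = $size;
            // Strict ">" keeps the block that starts earliest in $a, then earliest in $b.
            if ($size > $bestSize) {
                $bestI = $i - $size + 1;
                $bestJ = $j - $size + 1;
                $bestSize = $size;
            }
        }
        $previous = $current;
    }

    return [$bestI, $bestJ, $bestSize];
}

$a = str_split(' abcd');
$b = str_split('abcd abcd');
// Start 0 in $a, start 4 in $b, length 5 (the block " abcd").
print_r(longestMatchSketch($a, $b, 0, count($a), 0, count($b)));
```

The sketch compares every pair of elements, so it is only meant to show which block is selected, not how to find it efficiently.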

getGroupedOpcodes()

Return a series of nested arrays containing different groups of generated opcodes for the differences between the strings with up to $context lines of surrounding content.

public getGroupedOpcodes([int $context = 3 ]) : array<string|int, mixed>

Essentially, any big equal blocks of strings are stripped out and the smaller subsets of changes are arranged into groups. This means that the sequence matcher and diffs do not need to include the full content of the different files but can still provide context as to where the changes are.

Parameters
$context : int = 3

The number of lines of context to provide around the groups.

Return values
array<string|int, mixed>

Nested array of all of the grouped opcodes.
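The grouping can be pictured with the following sketch, which works on a hand-written opcode list rather than on the library itself. The function name groupOpCodesSketch is hypothetical, and the sketch assumes opcodes shaped [tag, i1, i2, j1, j2] with exclusive end indexes.

```php
<?php
// Illustrative sketch only; assumes opcodes shaped [tag, i1, i2, j1, j2]
// with exclusive end indexes. Not the library's implementation.
function groupOpCodesSketch(array $opCodes, int $context = 3): array
{
    if (empty($opCodes)) {
        return [];
    }
    $last = count($opCodes) - 1;
    // Trim a leading and a trailing equal run down to $context lines.
    if ($opCodes[0][0] === 'equal') {
        [$tag, $i1, $i2, $j1, $j2] = $opCodes[0];
        $opCodes[0] = [$tag, max($i1, $i2 - $context), $i2, max($j1, $j2 - $context), $j2];
    }
    if ($opCodes[$last][0] === 'equal') {
        [$tag, $i1, $i2, $j1, $j2] = $opCodes[$last];
        $opCodes[$last] = [$tag, $i1, min($i2, $i1 + $context), $j1, min($j2, $j1 + $context)];
    }

    $groups = [];
    $group = [];
    foreach ($opCodes as [$tag, $i1, $i2, $j1, $j2]) {
        // An equal run longer than twice the context closes the current group
        // and opens the next one, keeping $context lines on each side.
        if ($tag === 'equal' && $i2 - $i1 > 2 * $context) {
            $group[] = [$tag, $i1, min($i2, $i1 + $context), $j1, min($j2, $j1 + $context)];
            $groups[] = $group;
            $group = [];
            $i1 = max($i1, $i2 - $context);
            $j1 = max($j1, $j2 - $context);
        }
        $group[] = [$tag, $i1, $i2, $j1, $j2];
    }
    // A group that holds nothing but one equal run carries no change, so drop it.
    if (!empty($group) && !(count($group) === 1 && $group[0][0] === 'equal')) {
        $groups[] = $group;
    }
    return $groups;
}

// Hand-written opcodes for two 20-line sequences that differ at lines 2 and 15.
$opCodes = [
    ['equal', 0, 2, 0, 2],
    ['replace', 2, 3, 2, 3],
    ['equal', 3, 15, 3, 15],
    ['replace', 15, 16, 15, 16],
    ['equal', 16, 20, 16, 20],
];
print_r(groupOpCodesSketch($opCodes, 3)); // two groups, one per changed line
```

With a context of 3, the twelve equal lines between the two changes are split so that each group keeps only three of them on the side facing the change.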

getMatchingBlocks()

Return a nested set of arrays for all of the matching sub-sequences in the strings $a and $b.

public getMatchingBlocks() : array<string|int, mixed>

Each block contains the lower constraint of the block in $a, the lower constraint of the block in $b and finally the number of lines that the block continues for.

Return values
array<string|int, mixed>

Nested array of the matching blocks, as described by the function.
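As a rough illustration of the block layout described above, the sketch below walks a hand-written list of blocks and collects the lines they cover in $a. The block list and the function name commonLinesSketch are illustrative assumptions, not output of the library.

```php
<?php
// Illustrative sketch only: the block list is hand-written, not produced by the library.
// Each block is [start in $a, start in $b, number of matching lines].
function commonLinesSketch(array $a, array $blocks): array
{
    $common = [];
    foreach ($blocks as [$aStart, $bStart, $length]) {
        // Copy the lines of $a that this block covers.
        $common = array_merge($common, array_slice($a, $aStart, $length));
    }
    return $common;
}

$a = ['alpha', 'beta', 'gamma', 'delta'];
$b = ['beta', 'gamma', 'delta', 'epsilon'];
$blocks = [[1, 0, 3]]; // "beta", "gamma", "delta" start at $a[1] and $b[0]
print_r(commonLinesSketch($a, $blocks));
```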

getOpCodes()

Return a list of all of the opcodes for the differences between the two strings.

public getOpCodes() : array<string|int, mixed>

The nested array returned contains one array per opcode, with these elements:

  • 0 - The type of tag (as described below) for the opcode.
  • 1 - The beginning line in the first sequence.
  • 2 - The end line in the first sequence.
  • 3 - The beginning line in the second sequence.
  • 4 - The end line in the second sequence.

The different types of tags are listed below, where $i1, $i2, $j1 and $j2 are elements 1 to 4 of the opcode:

  • replace - The string from $i1 to $i2 in $a should be replaced by the string in $b from $j1 to $j2.
  • delete - The string in $a from $i1 to $i2 should be deleted.
  • insert - The string in $b from $j1 to $j2 should be inserted at $i1 in $a.
  • equal - The two strings with the specified ranges are equal.

Return values
array<string|int, mixed>

Array of the opcodes describing the differences between the strings.
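One way to read the opcodes is as a recipe that turns $a into $b. The sketch below applies a hand-written opcode list; the function name applyOpCodesSketch is hypothetical, and the end indexes are assumed to be exclusive.

```php
<?php
// Illustrative sketch only: rebuilds $b from $a by following a list of opcodes.
// Assumption: end indexes ($i2, $j2) are exclusive.
function applyOpCodesSketch(array $a, array $b, array $opCodes): array
{
    $result = [];
    foreach ($opCodes as [$tag, $i1, $i2, $j1, $j2]) {
        if ($tag === 'equal') {
            // Unchanged lines are copied from the first sequence.
            $result = array_merge($result, array_slice($a, $i1, $i2 - $i1));
        } elseif ($tag === 'replace' || $tag === 'insert') {
            // New lines come from the second sequence.
            $result = array_merge($result, array_slice($b, $j1, $j2 - $j1));
        }
        // 'delete' contributes nothing: the lines $a[$i1..$i2) are dropped.
    }
    return $result;
}

$a = ['intro', 'apple', 'banana', 'cherry'];
$b = ['apple', 'blueberry', 'cherry', 'date'];
// Hand-written opcodes, not generated by the library.
$opCodes = [
    ['delete', 0, 1, 0, 0],
    ['equal', 1, 2, 0, 1],
    ['replace', 2, 3, 1, 2],
    ['equal', 3, 4, 2, 3],
    ['insert', 4, 4, 3, 4],
];
var_dump(applyOpCodesSketch($a, $b, $opCodes) === $b); // bool(true)
```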

linesAreDifferent()

Check if the two lines at the given indexes are different or not.

public linesAreDifferent(int $aIndex, int $bIndex) : bool
Parameters
$aIndex : int

Line number to check against in a.

$bIndex : int

Line number to check against in b.

Return values
bool

True if the lines are different and false if not.
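How the options from $defaultOptions might influence this comparison can be sketched as follows. This is an assumption about the behaviour, not the library's code: the function name linesDifferSketch is hypothetical, it takes the two lines directly instead of indexes, and only ignoreWhitespace and ignoreCase are modelled.

```php
<?php
// Illustrative sketch only; the handling of each option is an assumption.
function linesDifferSketch(string $lineA, string $lineB, array $options = []): bool
{
    if (!empty($options['ignoreWhitespace'])) {
        // Drop spaces and tabs before comparing.
        $lineA = str_replace([' ', "\t"], '', $lineA);
        $lineB = str_replace([' ', "\t"], '', $lineB);
    }
    if (!empty($options['ignoreCase'])) {
        $lineA = strtolower($lineA);
        $lineB = strtolower($lineB);
    }
    return $lineA !== $lineB;
}

var_dump(linesDifferSketch('Foo  bar', 'foo bar')); // bool(true)
var_dump(linesDifferSketch('Foo  bar', 'foo bar', [
    'ignoreWhitespace' => true,
    'ignoreCase' => true,
])); // bool(false)
```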

Ratio()

Return a measure of the similarity between the two sequences.

public Ratio() : float

This will be a float value between 0 and 1.

Out of all of the ratio calculation functions, this is the most expensive to call if getMatchingBlocks or getOpCodes is yet to be called. The other calculation methods (quickRatio and realquickRatio) can be used to perform quicker calculations but may be less accurate.

The ratio is calculated as (2 * number of matches) / total number of elements in both sequences.

Return values
float

The calculated ratio.
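The formula can be checked by hand with a short sketch. The function name ratioSketch and the hand-written block list are illustrative; the reduction over the block triples mirrors the role described for ratioReduce(), and returning 1.0 for two empty sequences is an assumption of this sketch.

```php
<?php
// Illustrative sketch only: computes (2 * matches) / (total elements) from
// a list of matching blocks shaped [start in $a, start in $b, length].
function ratioSketch(array $matchingBlocks, int $lengthA, int $lengthB): float
{
    // Sum the third element (the block length) of every triple.
    $matches = array_reduce($matchingBlocks, function (int $sum, array $triple): int {
        return $sum + $triple[2];
    }, 0);
    $total = $lengthA + $lengthB;
    // Assumption: two empty sequences count as identical.
    return $total > 0 ? (2.0 * $matches) / $total : 1.0;
}

// "abcd" and "bcde" share the three-element block "bcd": (2 * 3) / (4 + 4).
echo ratioSketch([[1, 0, 3], [4, 4, 0]], 4, 4), "\n"; // 0.75
```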

setOptions()

Set new options

public setOptions(array<string|int, mixed> $options) : mixed
Parameters
$options : array<string|int, mixed>

setSeq1()

Set the first sequence ($a) and reset any internal caches to indicate that when calling the calculation methods, we need to recalculate them.

public setSeq1(string|array<string|int, mixed> $a) : mixed
Parameters
$a : string|array<string|int, mixed>

The sequence to set as the first sequence.

setSeq2()

Set the second sequence ($b) and reset any internal caches to indicate that when calling the calculation methods, we need to recalculate them.

public setSeq2(string|array<string|int, mixed> $b) : mixed
Parameters
$b : string|array<string|int, mixed>

The sequence to set as the second sequence.

setSequences()

Set the first and second sequences to use with the sequence matcher.

public setSequences(string|array<string|int, mixed> $a, string|array<string|int, mixed> $b) : mixed
Parameters
$a : string|array<string|int, mixed>

A string or array containing the lines to compare against.

$b : string|array<string|int, mixed>

A string or array containing the lines to compare.

arrayGetDefault()

Helper function that returns the value for a key in an array if it exists, or a default value if it doesn't.

private arrayGetDefault(array<string|int, mixed> $array, string $key, mixed $default) : mixed

Essentially cleaner than doing a series of if (isset()) { } else { } calls.

Parameters
$array : array<string|int, mixed>

The array to search.

$key : string

The key to check that exists.

$default : mixed

The value to return as the default value if the key doesn't exist.

Return values
mixed

The value from the array if the key exists or otherwise the default.
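A minimal sketch of such a helper, written as a standalone function with the hypothetical name arrayGetDefaultSketch, could look like this. Using isset() is an assumption here; it means a key whose value is null is treated as missing.

```php
<?php
// Illustrative sketch only; not the library's private method.
function arrayGetDefaultSketch(array $array, string $key, $default)
{
    // isset() is false for missing keys and for keys holding null.
    return isset($array[$key]) ? $array[$key] : $default;
}

$options = ['ignoreCase' => true];
var_dump(arrayGetDefaultSketch($options, 'ignoreCase', false));       // bool(true)
var_dump(arrayGetDefaultSketch($options, 'ignoreWhitespace', false)); // bool(false)
```

On PHP 7 and later, the expression $array[$key] ?? $default gives the same result without a helper.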

calculateRatio()

Helper function for calculating the ratio to measure similarity for the strings.

private calculateRatio(int $matches[, int $length = 0 ]) : float

The ratio is defined as being 2 * (number of matches / total length)

Parameters
$matches : int

The number of matches in the two strings.

$length : int = 0

The combined length of the two strings.

Return values
float

The calculated ratio.

chainB()

Generate the internal arrays containing the list of junk and non-junk characters for the second ($b) sequence.

private chainB() : mixed
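The two internal arrays can be pictured with the sketch below. It is an assumption about their shape based on the property descriptions ($b2j for non-junk elements, $junkDict keyed by junk element), not the library's code; the function name chainBSketch and the $isBlank callback are hypothetical, and every element of $b is assumed to be a string.

```php
<?php
// Illustrative sketch only. Assumes every element of $b is a string.
function chainBSketch(array $b, ?callable $junkCallback = null): array
{
    $b2j = [];      // element => list of indexes in $b where it occurs (non-junk only)
    $junkDict = []; // junk element => true
    foreach ($b as $index => $element) {
        if ($junkCallback !== null && call_user_func($junkCallback, $element)) {
            $junkDict[$element] = true;
            continue;
        }
        $b2j[$element][] = $index;
    }
    return [$b2j, $junkDict];
}

// Hypothetical junk callback: blank lines are junk.
$isBlank = function (string $line): bool {
    return trim($line) === '';
};
[$b2j, $junkDict] = chainBSketch(['a', '', 'b', 'a', ''], $isBlank);
print_r($b2j);      // 'a' => [0, 3], 'b' => [2]
print_r($junkDict); // '' => true
```

With arrays shaped like these, a junk check such as isBJunk() reduces to a key lookup in the junk dictionary.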

isBJunk()

Checks if a particular character is in the junk dictionary for the list of junk characters.

private isBJunk(mixed $b) : bool
Parameters
$b : mixed
Return values
bool

True if the character is considered junk. False if not.

quickRatio()

Quickly return an upper bound ratio for the similarity of the strings.

private quickRatio() : float

This is quicker to compute than Ratio().

Return values
float

The calculated ratio.
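One common way to obtain such an upper bound is to ignore element order and count how many elements the two sequences share. The sketch below takes that approach as an assumption; it is not confirmed to be the library's method, the function name quickRatioSketch is hypothetical, and elements are assumed to be strings or integers.

```php
<?php
// Illustrative sketch only: an order-insensitive upper bound on the ratio.
// Assumes the elements are strings or integers (required by array_count_values).
function quickRatioSketch(array $a, array $b): float
{
    $available = array_count_values($b); // how many times each element occurs in $b
    $matches = 0;
    foreach ($a as $element) {
        if (($available[$element] ?? 0) > 0) {
            // Use up one occurrence from $b for each element of $a.
            $available[$element]--;
            $matches++;
        }
    }
    $total = count($a) + count($b);
    return $total > 0 ? (2.0 * $matches) / $total : 1.0;
}

// Same elements in reverse order: the bound is 1 even though few positions line up.
var_dump(quickRatioSketch(str_split('abcd'), str_split('dcba'))); // float(1)
```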

ratioReduce()

Helper function to calculate the number of matches for Ratio().

private ratioReduce(int $sum, array<string|int, mixed> $triple) : int
Parameters
$sum : int

The running total for the number of matches.

$triple : array<string|int, mixed>

Array containing the matching block triple to add to the running total.

Return values
int

The new running total for the number of matches.

realquickRatio()

Return an upper bound ratio really quickly for the similarity of the strings.

private realquickRatio() : float

This is quicker to compute than Ratio() and quickRatio().

Return values
float

The calculated ratio.
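An even cheaper bound needs only the two lengths, since the number of matches can never exceed the length of the shorter sequence. The sketch below shows that idea under the assumption that this is the bound intended; the function name realQuickRatioSketch is hypothetical.

```php
<?php
// Illustrative sketch only: a bound computed from the sequence lengths alone.
function realQuickRatioSketch(int $lengthA, int $lengthB): float
{
    $total = $lengthA + $lengthB;
    // At most min($lengthA, $lengthB) elements can match.
    return $total > 0 ? (2.0 * min($lengthA, $lengthB)) / $total : 1.0;
}

var_dump(realQuickRatioSketch(3, 5)); // float(0.75)
```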

tupleSort()

Sort an array by the nested arrays it contains. Helper function for getMatchingBlocks.

private tupleSort(array<string|int, mixed> $a, array<string|int, mixed> $b) : int
Parameters
$a : array<string|int, mixed>

First array to compare.

$b : array<string|int, mixed>

Second array to compare.

Return values
int

-1, 0 or 1, as expected by the usort function.
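A comparator of this kind can be sketched as an element-by-element comparison of the two nested arrays. The function name tupleCompareSketch is hypothetical, the code assumes PHP 7 or later for the <=> operator, and ordering a shorter array before a longer one on a tie is an assumption of this sketch.

```php
<?php
// Illustrative sketch only: compares two arrays element by element for usort().
function tupleCompareSketch(array $a, array $b): int
{
    $shared = min(count($a), count($b));
    for ($i = 0; $i < $shared; $i++) {
        $result = $a[$i] <=> $b[$i];
        if ($result !== 0) {
            return $result; // the first differing element decides the order
        }
    }
    // Assumption: when one array is a prefix of the other, the shorter sorts first.
    return count($a) <=> count($b);
}

$blocks = [[3, 1, 2], [0, 5, 1], [0, 2, 4]];
usort($blocks, 'tupleCompareSketch');
print_r($blocks); // [0, 2, 4], then [0, 5, 1], then [3, 1, 2]
```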


        