public class HistogramDiff extends LowLevelDiffAlgorithm
setMaxChainLength(int)
. If sequence A has more than this many
elements that hash into the same hash bucket, the algorithm passes the region
to setFallbackAlgorithm(DiffAlgorithm)
. If no fallback algorithm is
configured, the region is emitted as a replace edit.
During scanning of sequence B, any element of A that occurs more than
setMaxChainLength(int)
times is never considered for an LCS match
position, even if it is common between the two sequences. This limits the
number of locations in sequence A that must be considered to find the LCS,
and helps maintain a lower running time bound.
So long as setMaxChainLength(int)
is a small constant (such as 64),
the algorithm runs in O(N * D) time, where N is the sum of the input lengths
and D is the number of edits in the resulting EditList. If the supplied
SequenceComparator
has a good hash function, this implementation
typically out-performs MyersDiff
, even though its theoretical running
time is the same.
This implementation has an internal limitation that prevents it from handling
sequences with more than 268,435,456 (2^28) elements.DiffAlgorithm.SupportedAlgorithm
Constructor and Description |
---|
HistogramDiff() |
Modifier and Type | Method and Description |
---|---|
<S extends Sequence> |
diffNonCommon(EditList edits,
HashedSequenceComparator<S> cmp,
HashedSequence<S> a,
HashedSequence<S> b,
Edit region)
Compare two sequences and identify a list of edits between them.
|
void |
setFallbackAlgorithm(DiffAlgorithm alg)
Set the algorithm used when there are too many element occurrences.
|
void |
setMaxChainLength(int maxLen)
Maximum number of positions to consider for a given element hash.
|
diffNonCommon
diff, getAlgorithm
public void setFallbackAlgorithm(DiffAlgorithm alg)
alg
- the secondary algorithm. If null the region will be denoted as
a single REPLACE block.public void setMaxChainLength(int maxLen)
maxLen
- new maximum length.public <S extends Sequence> void diffNonCommon(EditList edits, HashedSequenceComparator<S> cmp, HashedSequence<S> a, HashedSequence<S> b, Edit region)
LowLevelDiffAlgorithm
DiffAlgorithm.diff(SequenceComparator, Sequence, Sequence)
method, which invokes this method using Subsequence
s.diffNonCommon
in class LowLevelDiffAlgorithm
S
- type of sequence being compared.edits
- result list to append the region's edits onto.cmp
- the comparator supplying the element equivalence function.a
- the first (also known as old or pre-image) sequence. Edits
returned by this algorithm will reference indexes using the
'A' side: Edit.getBeginA()
, Edit.getEndA()
.b
- the second (also known as new or post-image) sequence. Edits
returned by this algorithm will reference indexes using the
'B' side: Edit.getBeginB()
, Edit.getEndB()
.region
- the region being compared within the two sequences.Copyright © 2012. All Rights Reserved.