asterixdb-notifications mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Taewoo Kim (Code Review)" <do-not-re...@asterixdb.incubator.apache.org>
Subject Change in asterixdb[master]: ASTERIXDB-1778: Optimize the edit-distance-check function
Date Sat, 04 Feb 2017 22:33:04 GMT
Hello Jenkins,

I'd like you to reexamine a change.  Please visit

    https://asterix-gerrit.ics.uci.edu/1481

to look at the new patch set (#7).

Change subject: ASTERIXDB-1778: Optimize the edit-distance-check function
......................................................................

ASTERIXDB-1778: Optimize the edit-distance-check function

 - Only calculate 2 * (threshold + 1) cells, rather than all cells per row.
 - Terminate the calculation steps early when it become obvious that
   the possible edit-distance value is greater than the given threshold.
   There is no reason to compute all cells in the 2 dimensional array.
 - Move the location of IListIterator to Hyracks since we now have
   a CharacterIterator in a String. Change the name to ISequenceIterator.
 - Add the section for the function in the manual.
 - Remove letter counting filtering method since it is only applicable for
   the string in ASCII range (0 ~ 127).

Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
---
M asterixdb/asterix-doc/src/main/markdown/builtins/5_similarity.md
M asterixdb/asterix-fuzzyjoin/pom.xml
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/IGenericSimilarityMetric.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetric.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistance.java
M asterixdb/asterix-fuzzyjoin/src/main/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricJaccard.java
A asterixdb/asterix-fuzzyjoin/src/test/java/org/apache/asterix/fuzzyjoin/similarity/SimilarityMetricEditDistanceTest.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/AbstractAsterixListIterator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceCheckEvaluator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/EditDistanceEvaluator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/SimilarityJaccardSortedCheckEvaluator.java
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/common/SimilarityJaccardSortedEvaluator.java
R hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/ISequenceIterator.java
A hyracks-fullstack/hyracks/hyracks-data/hyracks-data-std/src/main/java/org/apache/hyracks/data/std/util/UTF8StringCharByCharIterator.java
14 files changed, 360 insertions(+), 281 deletions(-)


  git pull ssh://asterix-gerrit.ics.uci.edu:29418/asterixdb refs/changes/81/1481/7
-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1481
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibc8729c4514bb87c347dd7d50358fd897b769977
Gerrit-PatchSet: 7
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim <wangsaeu@gmail.com>
Gerrit-Reviewer: Chen Li <chenli@gmail.com>
Gerrit-Reviewer: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
Gerrit-Reviewer: Jianfeng Jia <jianfeng.jia@gmail.com>
Gerrit-Reviewer: Steven Jacobs <sjaco002@ucr.edu>
Gerrit-Reviewer: Taewoo Kim <wangsaeu@gmail.com>

Mime
View raw message