lucene-commits mailing list archives

From jim...@apache.org
Subject [lucene-solr] branch branch_7_7 updated: LUCENE-8676: The Korean tokenizer does not update the last position if the backtrace is caused by a big buffer (1024 chars).
Date Fri, 01 Feb 2019 10:39:59 GMT
This is an automated email from the ASF dual-hosted git repository.

jimczi pushed a commit to branch branch_7_7
in repository https://gitbox.apache.org/repos/asf/lucene-solr.git


The following commit(s) were added to refs/heads/branch_7_7 by this push:
     new e05ed2f  LUCENE-8676: The Korean tokenizer does not update the last position if the backtrace is caused by a big buffer (1024 chars).
e05ed2f is described below

commit e05ed2ffb5a2df20163af9a7d8ea425b4218cade
Author: jimczi <jimczi@apache.org>
AuthorDate: Fri Feb 1 11:37:16 2019 +0100

    LUCENE-8676: The Korean tokenizer does not update the last position if the backtrace is caused by a big buffer (1024 chars).
---
 lucene/CHANGES.txt                                                     | 3 +++
 .../nori/src/java/org/apache/lucene/analysis/ko/KoreanTokenizer.java   | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/lucene/CHANGES.txt b/lucene/CHANGES.txt
index 9f326a8..67aba94 100644
--- a/lucene/CHANGES.txt
+++ b/lucene/CHANGES.txt
@@ -44,6 +44,9 @@ Bug fixes:
   was not propagating final position increments from its child streams correctly.
   (Dan Meehl, Alan Woodward)
 
+* LUCENE-8676: The Korean tokenizer does not update the last position if the backtrace is caused
+  by a big buffer (1024 chars). (Jim Ferenczi)
+
 New Features
 
 * LUCENE-8026: ExitableDirectoryReader may now time out queries that run on
diff --git a/lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/KoreanTokenizer.java b/lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/KoreanTokenizer.java
index 012352c..8875fd0 100644
--- a/lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/KoreanTokenizer.java
+++ b/lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/KoreanTokenizer.java
@@ -535,7 +535,6 @@ public final class KoreanTokenizer extends Tokenizer {
       }
 
      if (pos > lastBackTracePos && posData.count == 1 && isFrontier) {
-        //  if (pos > lastBackTracePos && posData.count == 1 && isFrontier) {
         // We are at a "frontier", and only one node is
         // alive, so whatever the eventual best path is must
         // come through this node.  So we can safely commit
@@ -618,6 +617,7 @@ public final class KoreanTokenizer extends Tokenizer {
         } else {
           // This means the backtrace only produced
           // punctuation tokens, so we must keep parsing.
+          continue;
         }
       }
 

