lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From bugzi...@apache.org
Subject DO NOT REPLY [Bug 32687] New: - org.apache.lucene.analysis.cn.ChineseTokenizer missing offset decrement
Date Tue, 14 Dec 2004 09:09:22 GMT
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG·
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=32687>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND·
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=32687

           Summary: org.apache.lucene.analysis.cn.ChineseTokenizer missing
                    offset decrement
           Product: Lucene
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: trivial
          Priority: P2
         Component: Analysis
        AssignedTo: lucene-dev@jakarta.apache.org
        ReportedBy: saturnism@gmail.com


Apparently, in ChineseTokenizer, offset should be decremented like bufferIndex
when Character is OTHER_LETTER.  This directly affects startOffset and endOffset
values.

This is critical to have Highlighter working correctly because Highlighter marks
matching text based on these offset values.

-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Mime
View raw message