lucene-dev mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 7412] New: - GermanStemFilter setting wrong values for startoffset/endoffset of stemmed tokens
Date Sun, 24 Mar 2002 16:07:28 GMT
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=7412>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=7412

           Summary: GermanStemFilter setting wrong values for
                    startoffset/endoffset of stemmed tokens
           Product: Lucene
           Version: CVS Nightly - Specify date in submission
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: Analysis
        AssignedTo: lucene-dev@jakarta.apache.org
        ReportedBy: reyes@charabia.net


GermanStemFilter sets wrong start and end offsets on the new Token object it
creates when the stemmer succeeds in stemming the termText() string. Bug found
in 1.2-RC5-dev.

-----------------
Example: processing the string "this is a simple test" yields:
token : thi (0,3)
token : is (5,7)
token : a (8,9)
token : simpl (0,5)
token : test (17,21)

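Every stemmed token ("thi", "simpl") has wrong start/end offsets: each is
reported as (0, length of the stem) instead of the position of the original
word, which is (0,4) for "this" and (10,16) for "simple". Tokens the stemmer
leaves unchanged keep their correct offsets.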
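
The output above is consistent with the filter building the replacement token
with offsets (0, stem length) instead of copying them from the incoming token.
I have not confirmed this against the filter source, so the following is only
a minimal sketch under that assumption. It does not use Lucene: Tok is a
hypothetical stand-in for Token, and toyStem is a toy suffix stripper chosen
only to reproduce the stems shown, not the German stemmer.

```java
import java.util.ArrayList;
import java.util.List;

class OffsetSketch {
    // Hypothetical stand-in for Lucene's Token: term text plus the
    // character offsets of the source word in the original input.
    static final class Tok {
        final String text;
        final int start;
        final int end;

        Tok(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "token : " + text + " (" + start + "," + end + ")";
        }
    }

    // Toy stemmer (NOT the German stemmer): drops one trailing "s" or "e"
    // from words longer than two characters.
    static String toyStem(String s) {
        if (s.length() > 2 && (s.endsWith("s") || s.endsWith("e"))) {
            return s.substring(0, s.length() - 1);
        }
        return s;
    }

    // Splits on single spaces and records each word's offsets.
    static List<Tok> tokenize(String input) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            while (i < input.length() && input.charAt(i) == ' ') i++;
            int start = i;
            while (i < input.length() && input.charAt(i) != ' ') i++;
            if (i > start) out.add(new Tok(input.substring(start, i), start, i));
        }
        return out;
    }

    // Assumed buggy behaviour: offsets are derived from the stem itself.
    static Tok stemBuggy(Tok in) {
        String s = toyStem(in.text);
        if (s.equals(in.text)) return in;
        return new Tok(s, 0, s.length());
    }

    // Suggested behaviour: the stemmed token keeps the original offsets.
    static Tok stemFixed(Tok in) {
        String s = toyStem(in.text);
        if (s.equals(in.text)) return in;
        return new Tok(s, in.start, in.end);
    }

    static List<Tok> run(String input, boolean fixed) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : tokenize(input)) {
            out.add(fixed ? stemFixed(t) : stemBuggy(t));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "this is a simple test";
        System.out.println("buggy:");
        for (Tok t : run(input, false)) System.out.println(t);
        System.out.println("fixed:");
        for (Tok t : run(input, true)) System.out.println(t);
    }
}
```

With the toy stemmer, the buggy variant reproduces the five lines listed in
this report, while the fixed variant reports thi (0,4) and simpl (10,16), so
the offsets still point at the original words in the input.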


