lucene-java-user mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: Bug in Lucene 2.2.0 code? Simple code included (StringIndexOutOfBoundsException).
Date Sun, 29 Jul 2007 01:05:10 GMT
I am going to try to write up some more info for you tomorrow, but
just to point out: I do think there is a bug in the way offsets are
being handled. I don't think this is causing your current problem (what
I mentioned is), but it will probably cause you problems down the road.
I will look into this further.

- Mark

Lukas Vlcek wrote:
> Hi Lucene experts,
>
> The following simple Lucene code throws a
> StringIndexOutOfBoundsException. I am using the official Lucene 2.2.0
> release. Can anyone tell me what is wrong with this code? Is this a bug or
> a feature of Lucene? Any comments/hints are highly welcome!
>
> In a nutshell, I have a document with two (or four) fields:
> 1) all
> 2-4) small
>
> I use [all] for searching and [small] for highlighting.
>
> [package and imports truncated...]
>
> public class MemoryIndexCase {
>     static public void main(String[] arg) {
>
>         Document doc = new Document();
>
>         doc.add(new Field("all","example long text",
>                 Field.Store.NO, Field.Index.TOKENIZED));
>         doc.add(new Field("small","example",
>                 Field.Store.YES, Field.Index.UN_TOKENIZED,
>                 Field.TermVector.WITH_POSITIONS_OFFSETS));
>         doc.add(new Field("small","long",
>                 Field.Store.YES, Field.Index.UN_TOKENIZED,
>                 Field.TermVector.WITH_POSITIONS_OFFSETS));
>         doc.add(new Field("small","text",
>                 Field.Store.YES, Field.Index.UN_TOKENIZED,
>                 Field.TermVector.WITH_POSITIONS_OFFSETS));
>
>         try {
>             Directory idx = new RAMDirectory();
>             IndexWriter writer = new IndexWriter(idx,
>                     new StandardAnalyzer(), true);
>
>             writer.addDocument(doc);
>             writer.optimize();
>             writer.close();
>
>             Searcher searcher = new IndexSearcher(idx);
>
>             QueryParser qp = new QueryParser("all", new StandardAnalyzer());
>             Query query = qp.parse("example text");
>             Hits hits = searcher.search(query);
>
>             Highlighter highlighter = new Highlighter(new QueryScorer(query));
>
>             IndexReader ir = IndexReader.open(idx);
>             for (int i = 0; i < hits.length(); i++) {
>
>                 String text = hits.doc(i).get("small");
>
>                 TermFreqVector tfv =
>                     ir.getTermFreqVector(hits.id(i), "small");
>                 TokenStream tokenStream =
>                     TokenSources.getTokenStream((TermPositionVector) tfv);
>
>                 String result =
>                     highlighter.getBestFragment(tokenStream,text);
>                 System.out.println(result);
>             }
>
>         } catch (Throwable e) {
>             e.printStackTrace();
>         }
>     }
> }
>
> The exception is:
> java.lang.StringIndexOutOfBoundsException: String index out of range: 11
>     at java.lang.String.substring(String.java:1935)
>     at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:235)
>     at org.apache.lucene.search.highlight.Highlighter.getBestFragments(Highlighter.java:175)
>     at org.apache.lucene.search.highlight.Highlighter.getBestFragment(Highlighter.java:101)
>     at org.lucenetest.MemoryIndexCase.main(MemoryIndexCase.java:70)
>
> Best regards,
> Lukas
>
>   
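
For readers following the thread, here is a minimal plain-Java sketch (no Lucene on the classpath) of one way a highlighter can end up slicing past the end of its text. It rests on two assumptions that the thread itself does not confirm: that the three "small" values are given offsets as if they were one continuous text (0-7, 7-11, 11-15), and that the text handed to the highlighter holds only the first value, "example". The class OffsetMismatchSketch and its methods are hypothetical names invented for this illustration and are not part of any Lucene API.

```java
// Illustration only: models offset arithmetic, not Lucene's actual implementation.
class OffsetMismatchSketch {

    /**
     * ASSUMPTION: offsets are assigned as if all values of a multi-valued
     * field were laid end to end in a single text.
     * Returns one {start, end} pair per value.
     */
    public static int[][] cumulativeOffsets(String[] values) {
        int[][] offsets = new int[values.length][2];
        int base = 0;
        for (int i = 0; i < values.length; i++) {
            offsets[i][0] = base;
            offsets[i][1] = base + values[i].length();
            base = offsets[i][1];
        }
        return offsets;
    }

    /** Wraps text[start, end) in <B> tags by slicing the text at the given offsets. */
    public static String highlight(String text, int start, int end) {
        return text.substring(0, start)
                + "<B>" + text.substring(start, end) + "</B>"
                + text.substring(end);
    }

    public static void main(String[] args) {
        String[] values = {"example", "long", "text"};
        int[][] offsets = cumulativeOffsets(values);

        // Text that covers every value: the offsets of "text" fit inside it.
        String joined = String.join("", values);
        System.out.println(highlight(joined, offsets[2][0], offsets[2][1]));

        // Text that covers only the first value: the same offsets overrun it.
        String firstOnly = values[0];
        try {
            highlight(firstOnly, offsets[2][0], offsets[2][1]);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("substring failed: " + e.getMessage());
        }
    }
}
```

Under these assumptions, the start offset of "text" is 11 while the string "example" is only 7 characters long, which would be consistent with the "String index out of range: 11" in the trace above. Whether this is the cause Mark refers to is not stated in this message.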


