lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3326) MoreLikeThis reuses a reader after it has already closed it
Date Mon, 18 Jul 2011 08:17:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066845#comment-13066845 ]

Uwe Schindler commented on LUCENE-3326:
---------------------------------------

+1 nuke default charset shit!
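Default-charset decoding happens when bytes are turned into characters without naming a charset, e.g. `new InputStreamReader(in)` or `new FileReader(file)`. The result then depends on the JVM's platform encoding, which became UTF-8 by default only in JDK 18 (JEP 400). Below is a minimal, JDK-only sketch of why naming the charset matters. `CharsetDemo` and `decode` are illustrative names, not Lucene API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {

    // Decodes bytes with an explicitly named charset, so the result does not
    // depend on Charset.defaultCharset() of whatever JVM runs the code.
    static String decode(byte[] bytes, Charset charset) throws IOException {
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "na\u00efve".getBytes(StandardCharsets.UTF_8);
        // Matching charset: the original text comes back intact.
        System.out.println(decode(utf8, StandardCharsets.UTF_8));
        // Mismatched charset: each byte of the two-byte UTF-8 sequence becomes
        // its own character (mojibake). An unnamed default charset can silently
        // produce the same effect on a machine whose platform encoding differs.
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1));
    }
}
```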

> MoreLikeThis reuses a reader after it has already closed it
> -----------------------------------------------------------
>
>                 Key: LUCENE-3326
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3326
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/other
>    Affects Versions: 3.3
>            Reporter: Trejkaz
>         Attachments: LUCENE-3326.patch
>
>
> MoreLikeThis has a fatal bug whereby it tries to reuse a reader for multiple fields:
> {code}
>     Map<String,Int> words = new HashMap<String,Int>();
>     for (int i = 0; i < fieldNames.length; i++) {
>         String fieldName = fieldNames[i];
>         addTermFrequencies(r, words, fieldName);
>     }
> {code}
> However, addTermFrequencies() is creating a TokenStream for this reader:
> {code}
>     TokenStream ts = analyzer.reusableTokenStream(fieldName, r);
>     int tokenCount=0;
>     // for every token
>     CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
>     ts.reset();
>     while (ts.incrementToken()) {
>         /* body omitted */
>     }
>     ts.end();
>     ts.close();
> {code}
> When it closes this TokenStream, it also closes the underlying reader. Then, the second time around the loop, you get:
> {noformat}
> Caused by: java.io.IOException: Stream closed
> 	at sun.nio.cs.StreamDecoder.ensureOpen(StreamDecoder.java:27)
> 	at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:128)
> 	at java.io.InputStreamReader.read(InputStreamReader.java:167)
> 	at com.acme.util.CompositeReader.read(CompositeReader.java:101)
> 	at org.apache.lucene.analysis.standard.StandardTokenizerImpl.zzRefill(StandardTokenizerImpl.java:803)
> 	at org.apache.lucene.analysis.standard.StandardTokenizerImpl.getNextToken(StandardTokenizerImpl.java:1010)
> 	at org.apache.lucene.analysis.standard.StandardTokenizer.incrementToken(StandardTokenizer.java:178)
> 	at org.apache.lucene.analysis.standard.StandardFilter.incrementTokenClassic(StandardFilter.java:61)
> 	at org.apache.lucene.analysis.standard.StandardFilter.incrementToken(StandardFilter.java:57)
> 	at com.acme.storage.index.analyser.NormaliseFilter.incrementToken(NormaliseFilter.java:51)
> 	at org.apache.lucene.analysis.LowerCaseFilter.incrementToken(LowerCaseFilter.java:60)
> 	at org.apache.lucene.search.similar.MoreLikeThis.addTermFrequencies(MoreLikeThis.java:931)
> 	at org.apache.lucene.search.similar.MoreLikeThis.retrieveTerms(MoreLikeThis.java:1003)
> 	at org.apache.lucene.search.similar.MoreLikeThis.retrieveInterestingTerms(MoreLikeThis.java:1036)
> {noformat}
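The failure can be reproduced without Lucene. Below is a minimal sketch that assumes only the JDK. `consumeAndClose` is a hypothetical stand-in for `addTermFrequencies()`, which ends with `ts.close()` and thereby closes the shared Reader. Wrapping the bytes in an `InputStreamReader` yields the same `Stream closed` message seen in the trace above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ReaderReuseDemo {

    // Hypothetical stand-in for addTermFrequencies(): drains the reader and then
    // closes it, just as closing the TokenStream closes the Tokenizer's input.
    static int consumeAndClose(Reader r) throws IOException {
        int chars = 0;
        try {
            char[] buf = new char[64];
            int n;
            while ((n = r.read(buf)) != -1) {
                chars += n;
            }
        } finally {
            r.close();
        }
        return chars;
    }

    // Mirrors the loop over fieldNames: one Reader shared across all fields.
    static List<String> run(String text, String[] fieldNames) {
        List<String> results = new ArrayList<>();
        Reader r = new InputStreamReader(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)),
                StandardCharsets.UTF_8);
        for (String fieldName : fieldNames) {
            try {
                results.add(fieldName + ": " + consumeAndClose(r) + " chars");
            } catch (IOException e) {
                // Second iteration: the reader was closed by the first one.
                results.add(fieldName + ": " + e.getMessage());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        for (String line : run("hello world", new String[] {"title", "body"})) {
            System.out.println(line);
        }
    }
}
```

Running `main` prints `title: 11 chars` followed by `body: Stream closed`. Even if nothing closed the Reader, sharing it would still be wrong: the first field reads it to EOF, so every later field would silently see no text.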
> My first thought was that a "ReaderFactory" of sorts should be passed in, so that a new Reader can be created for the second field. (Maybe the factory could be passed the field name, so that anyone who wanted to supply a different reader for each field could do so.)
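The "ReaderFactory" idea could look roughly like this. `ReaderFactory`, `countTokensAndClose` and `countPerField` are hypothetical names, not Lucene API, and the whitespace split is a crude stand-in for real analysis. The point is only that each field gets its own Reader, so closing one cannot break the next.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

public class ReaderFactoryDemo {

    // Hypothetical interface: hands out a fresh Reader for each field.
    @FunctionalInterface
    interface ReaderFactory {
        Reader create(String fieldName) throws IOException;
    }

    // Stand-in for addTermFrequencies(): counts whitespace-separated tokens
    // and closes the Reader when done, as closing a TokenStream would.
    static int countTokensAndClose(Reader r) throws IOException {
        try (Reader in = r) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
            }
            String s = sb.toString().trim();
            return s.isEmpty() ? 0 : s.split("\\s+").length;
        }
    }

    // Same shape as the loop over fieldNames, but asks the factory for a new
    // Reader on every iteration instead of reusing one.
    static Map<String, Integer> countPerField(ReaderFactory factory, String... fieldNames)
            throws IOException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String fieldName : fieldNames) {
            counts.put(fieldName, countTokensAndClose(factory.create(fieldName)));
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Same text for every field.
        ReaderFactory same = fieldName -> new StringReader("more like this");
        System.out.println(countPerField(same, "title", "body"));

        // Different text per field, made possible by passing the field name.
        Map<String, String> docs = Map.of("title", "bug report", "body", "reader closed too early");
        ReaderFactory perField = fieldName -> new StringReader(docs.getOrDefault(fieldName, ""));
        System.out.println(countPerField(perField, "title", "body"));
    }
}
```

This prints `{title=3, body=3}` and then `{title=2, body=4}`. The second factory shows the per-field flexibility the reporter suggests; the cost is that callers must be able to produce the text more than once.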
> Interestingly, the methods taking a File and a URL exhibit the same issue. I'm not sure what to do about those (and we're not using them). The method taking a File could open the file twice, but the method taking a URL probably shouldn't fetch the same URL twice.
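One way to handle the File and URL variants, sketched under the same hypothetical `ReaderFactory` idea: reopen a file for each field, but fetch a URL only once and keep its decoded text in memory. This assumes Java 9+ (`InputStream.readAllBytes`) and an explicitly supplied charset; `SourceReaders`, `forFile`, `forUrl`, `readAll` and `drain` are illustrative names, not Lucene API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SourceReaders {

    // Hypothetical interface: one fresh Reader per field.
    @FunctionalInterface
    interface ReaderFactory {
        Reader create(String fieldName) throws IOException;
    }

    // File: reopening is cheap and repeatable, so open a new Reader per field,
    // decoding with an explicit charset rather than the platform default.
    static ReaderFactory forFile(Path path, Charset charset) {
        return fieldName -> Files.newBufferedReader(path, charset);
    }

    // URL: fetch exactly once, then hand out a new StringReader over the
    // buffered text for each field, so the URL is never fetched twice.
    static ReaderFactory forUrl(URL url, Charset charset) throws IOException {
        final String text = readAll(url, charset);
        return fieldName -> new StringReader(text);
    }

    private static String readAll(URL url, Charset charset) throws IOException {
        try (InputStream in = url.openStream()) {
            return new String(in.readAllBytes(), charset); // Java 9+
        }
    }

    // Reads a Reader to EOF and closes it, as analysis would.
    static String drain(Reader r) throws IOException {
        try (Reader in = r) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[256];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mlt", ".txt");
        try {
            Files.write(tmp, "caf\u00e9 menu".getBytes(StandardCharsets.UTF_8));
            ReaderFactory fromFile = forFile(tmp, StandardCharsets.UTF_8);
            // In this demo a file: URL stands in for a remote one.
            ReaderFactory fromUrl = forUrl(tmp.toUri().toURL(), StandardCharsets.UTF_8);
            for (String field : new String[] {"title", "body"}) {
                System.out.println(field + " (file): " + drain(fromFile.create(field)));
                System.out.println(field + " (url):  " + drain(fromUrl.create(field)));
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The trade-off is memory: the URL's whole body stays in a String for as long as the factory is alive. That is fine for typical documents but may not be for very large ones.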

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

