lucene-java-user mailing list archives

From Carsten Schnober <schno...@ids-mannheim.de>
Subject Re: Boolean and SpanQuery: different results
Date Mon, 17 Dec 2012 10:54:32 GMT
On 13.12.2012 18:00, Jack Krupansky wrote:
> Can you provide some examples of terms that don't work and the index
> token stream they fail on?
> 
> Make sure that the Analyzer you are using doesn't do any magic on the
> indexed terms - your query term is unanalyzed. Maybe multiple, but
> distinct, index terms are analyzing to the same, but unexpected term.

I've done some further analysis, and it turns out that, for some reason,
the SpanQuery described previously returns matches only for the first of
the 18 entries in the list returned by reader.leaves().

As stated in my first post in this thread, my code builds a SpanQuery
for each AtomicReaderContext in a list retrieved through
MultiReader.leaves(). That SpanQuery is equivalent to a BooleanQuery with
TermQueries for exactly the same terms, run with
IndexSearcher.search() on that same MultiReader.

The document IDs of the hits found through the SpanQuery correspond to
the ones returned by the BooleanQuery for the same term. However, the
documents returned by the BooleanQuery that do not lie within the first
AtomicReaderContext are not found by the SpanQuery.

Might this have to do with the docBase? I collect the document IDs from
the BooleanQuery through a Collector, adding each hit's ID to the
docBase of the current AtomicReaderContext. For the corresponding
SpanQuery, I wrap these document IDs in a DocIdBitSet and pass it as an
argument to SpanQuery.getSpans().
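
To make the docBase question concrete, here is a minimal sketch of the
arithmetic involved. It uses only java.util.BitSet rather than Lucene
classes, and the class name, method name, and segment sizes are
hypothetical. It assumes that a per-segment call expects segment-local
document IDs (starting at 0 in each leaf), while the collected IDs are
global (local ID plus docBase). If that assumption holds, a bit set of
global IDs would only line up with the first leaf, whose docBase is 0.

```java
import java.util.BitSet;

public class DocBaseSketch {

    /**
     * Converts global document IDs into the segment-local IDs of one leaf.
     * docBase and maxDoc describe the leaf; both are illustrative
     * parameters here, not values read from a Lucene reader.
     */
    static BitSet toSegmentLocal(BitSet globalDocs, int docBase, int maxDoc) {
        BitSet local = new BitSet(maxDoc);
        int end = docBase + maxDoc; // first global ID past this leaf
        for (int g = globalDocs.nextSetBit(docBase);
             g >= 0 && g < end;
             g = globalDocs.nextSetBit(g + 1)) {
            local.set(g - docBase); // undo the docBase offset
        }
        return local;
    }

    public static void main(String[] args) {
        // Two hypothetical leaves: global docs 0..4 and 5..9.
        int[] docBases = {0, 5};
        int[] maxDocs = {5, 5};

        // Global IDs as a Collector would record them (local ID + docBase).
        BitSet globalHits = new BitSet();
        globalHits.set(2);
        globalHits.set(7);

        for (int i = 0; i < docBases.length; i++) {
            BitSet local = toSegmentLocal(globalHits, docBases[i], maxDocs[i]);
            System.out.println("leaf " + i + " local hits: " + local);
        }
    }
}
```

In this sketch, global ID 7 falls into the second leaf and becomes local
ID 2 there. Testing bit 2 of the unconverted global set against that
leaf would instead pick up the hit that belongs to the first leaf.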

Thanks!
Carsten


-- 
Institut für Deutsche Sprache | http://www.ids-mannheim.de
Projekt KorAP                 | http://korap.ids-mannheim.de
Tel. +49-(0)621-43740789      | schnober@ids-mannheim.de
Korpusanalyseplattform der nächsten Generation
Next Generation Corpus Analysis Platform
