lucene-java-user mailing list archives

From Carsten Schnober <schno...@ids-mannheim.de>
Subject Re: Boolean and SpanQuery: different results
Date Thu, 13 Dec 2012 17:13:31 GMT
On 13.12.2012 18:00, Jack Krupansky wrote:
> Can you provide some examples of terms that don't work and the index
> token stream they fail on?

The index I'm testing with is the German Wikipedia, and I've been testing
with different (arbitrarily chosen) terms. Here are some results; the
first number is the number of documents matched by a BooleanQuery, the
second is the number of documents matched by a SpanQuery:

- Knacklaut	24/19
- schönes	70/70
- zufällige	71/70
- wunderbar	24/24
- Himmel	773/753
- Sonne	1190/1152

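To compare the size of the discrepancy across terms, the counts above can be turned into absolute and relative shortfalls. This is a minimal plain-Java sketch with no Lucene dependency; the class name SpanShortfall and its methods are hypothetical helpers for illustration, not part of any Lucene API.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: compares the BooleanQuery and SpanQuery hit counts listed above. */
public class SpanShortfall {

    /** Number of documents matched by the BooleanQuery but not by the SpanQuery. */
    static int missing(int booleanHits, int spanHits) {
        return booleanHits - spanHits;
    }

    /** Shortfall as a percentage of the BooleanQuery count, formatted to one decimal place. */
    static String percentMissing(int booleanHits, int spanHits) {
        double pct = 100.0 * missing(booleanHits, spanHits) / booleanHits;
        // Locale.ROOT keeps '.' as the decimal separator even on a German-locale JVM
        return String.format(Locale.ROOT, "%.1f", pct);
    }

    public static void main(String[] args) {
        // term -> {BooleanQuery hits, SpanQuery hits}, copied from the list above;
        // umlauts are written as Unicode escapes so the source compiles in any encoding
        Map<String, int[]> counts = new LinkedHashMap<>();
        counts.put("Knacklaut", new int[] {24, 19});
        counts.put("sch\u00f6nes", new int[] {70, 70});
        counts.put("zuf\u00e4llige", new int[] {71, 70});
        counts.put("wunderbar", new int[] {24, 24});
        counts.put("Himmel", new int[] {773, 753});
        counts.put("Sonne", new int[] {1190, 1152});

        for (Map.Entry<String, int[]> e : counts.entrySet()) {
            int b = e.getValue()[0];
            int s = e.getValue()[1];
            System.out.printf(Locale.ROOT, "%-10s missing %4d (%s%%)%n",
                    e.getKey(), missing(b, s), percentMissing(b, s));
        }
    }
}
```

Running it shows that Knacklaut loses 5 of 24 documents (20.8%), while every other term loses at most 3.2% (Sonne, 38 of 1190), and schönes and wunderbar lose none.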

> Make sure that the Analyzer you are using doesn't do any magic on the
> indexed terms - your query term is unanalyzed. Maybe multiple, but
> distinct, index terms are analyzing to the same, but unexpected term.

I'm using a custom Analyzer during indexing. I'm not sure which analyzer
is applied at search time: I haven't specified one, so what does Lucene
use? I hadn't considered this because I assumed it would make no
difference to the BooleanQuery vs. SpanQuery discrepancy.
Thanks for the hint anyway; I'll take a closer look.
Best,
Carsten

-- 
Institut für Deutsche Sprache | http://www.ids-mannheim.de
Projekt KorAP                 | http://korap.ids-mannheim.de
Tel. +49-(0)621-43740789      | schnober@ids-mannheim.de
Next Generation Corpus Analysis Platform

