lucene-java-user mailing list archives

From Alex vB
Subject Lucene query processing
Date Tue, 26 Apr 2011 23:35:02 GMT
Hello everybody,

As far as I know, Lucene processes queries document-at-a-time (DAAT).
Depending on the query, either the intersection or the union of the posting
lists is calculated. For the intersection, only documents occurring in all
posting lists are scored. In the union case, every document occurring in at
least one posting list is scored, which makes it a more expensive operation.
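
To make the intersection/union distinction concrete, here is a minimal sketch
of a document-at-a-time traversal over sorted posting lists. It is a toy model
and not Lucene's actual code: the class name DaatSketch, the int[][]
representation of posting lists, and both method names are assumptions made
for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Toy document-at-a-time (DAAT) traversal over sorted posting lists.
// Illustrative only; this is NOT how Lucene implements its scorers.
class DaatSketch {

    // Intersection (AND): emit only doc IDs that occur in every list.
    static List<Integer> intersect(int[][] postings) {
        List<Integer> hits = new ArrayList<>();
        if (postings.length == 0) return hits;
        int[] pos = new int[postings.length]; // one cursor per posting list
        while (true) {
            // Candidate = largest doc ID currently under any cursor.
            int candidate = Integer.MIN_VALUE;
            for (int i = 0; i < postings.length; i++) {
                if (pos[i] >= postings[i].length) return hits; // a list is exhausted
                candidate = Math.max(candidate, postings[i][pos[i]]);
            }
            // Advance every cursor to the first doc ID >= candidate.
            boolean allMatch = true;
            for (int i = 0; i < postings.length; i++) {
                while (pos[i] < postings[i].length && postings[i][pos[i]] < candidate) {
                    pos[i]++;
                }
                if (pos[i] >= postings[i].length) return hits;
                if (postings[i][pos[i]] != candidate) allMatch = false;
            }
            if (allMatch) {
                hits.add(candidate); // a real engine would score the doc here
                for (int i = 0; i < postings.length; i++) pos[i]++;
            }
        }
    }

    // Union (OR): emit every doc ID that occurs in at least one list.
    static List<Integer> union(int[][] postings) {
        List<Integer> hits = new ArrayList<>();
        int[] pos = new int[postings.length];
        while (true) {
            // Candidate = smallest doc ID currently under any live cursor.
            int candidate = Integer.MAX_VALUE;
            boolean anyLeft = false;
            for (int i = 0; i < postings.length; i++) {
                if (pos[i] < postings[i].length) {
                    candidate = Math.min(candidate, postings[i][pos[i]]);
                    anyLeft = true;
                }
            }
            if (!anyLeft) return hits;
            hits.add(candidate); // a real engine would score the doc here
            // Move past the candidate in every list that contains it.
            for (int i = 0; i < postings.length; i++) {
                if (pos[i] < postings[i].length && postings[i][pos[i]] == candidate) {
                    pos[i]++;
                }
            }
        }
    }

    public static void main(String[] args) {
        int[][] postings = { {1, 3, 5, 7}, {3, 4, 5, 9}, {2, 3, 5, 8} };
        System.out.println(intersect(postings)); // prints [3, 5]
        System.out.println(union(postings));     // prints [1, 2, 3, 4, 5, 7, 8, 9]
    }
}
```

In this toy example the intersection reaches the scoring step for 2 documents
while the union reaches it for 8, which is the cost difference described
above.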

Lucene stores its index in several files. Depending on the query, different
files might be accessed for scoring. For example, a payload query needs to
read payloads from .pos.

What is not clear to me is how term frequencies or payloads are processed.
Assuming I store term frequencies I need to set
1) Which queries include term frequencies? I assume all queries if term
frequencies are stored?
2) Why is fetching payloads so much more expensive than getting term
frequencies? Both are stored in separate files and therefore demand a disk
3) What value does tf contain if I call setOmitTermFreqAndPositions(true)?
Always 1?
4) How are term frequencies and payloads read from disk? In bulk for all
remaining docs at once, or every time a document gets scored?

