lucene-java-user mailing list archives

From Alex vB <>
Subject New codecs keep Freq skip/omit Pos
Date Fri, 22 Apr 2011 01:52:05 GMT
Hello everybody,

I am currently testing several of the new Lucene 4.0 codec implementations to
compare them with my own solution.
The difference is that I index only frequencies, not positions, and I
would like to do the same with the other codecs. I know there was already a
post on this topic.

I just wanted to ask whether anything has changed, especially for the new
I had a look at the FixedPostingWriterImpl and PostingsConsumer. Are those
the right places for adapting Pos/Freq handling? What would happen if I
just skipped writing positions/payloads? Would it mess up the index?

The written files have different extensions, such as pyl, skp, pos, doc, etc.
Would leaving the pos file out of the total give me a correct index size
estimate for W Freq W/O Pos? Or where exactly are term positions written?
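
The size estimate asked about above could be sketched roughly as follows: sum the on-disk file sizes per extension, then subtract the extensions assumed to hold positional data. This is only a sketch under assumptions. The class and method names are hypothetical, it uses plain JDK file APIs rather than any Lucene API, and it assumes that pos and pyl contain nothing but positions and payloads. If other files (for example the skip data in skp) also carry position-related pointers, the result would be an approximation, not an exact W Freq W/O Pos size.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene.
public class IndexSizeByExtension {

    /** Sums the sizes in bytes of the regular files in dir, grouped by file extension. */
    static Map<String, Long> sizeByExtension(Path dir) throws IOException {
        Map<String, Long> sizes = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories and special files
                }
                String name = file.getFileName().toString();
                int dot = name.lastIndexOf('.');
                // Files without a dot are grouped under the empty extension.
                String ext = dot >= 0 ? name.substring(dot + 1) : "";
                sizes.merge(ext, Files.size(file), Long::sum);
            }
        }
        return sizes;
    }

    /** Adds up all per-extension sizes except those listed in excluded. */
    static long totalExcluding(Map<String, Long> sizes, Set<String> excluded) {
        long total = 0;
        for (Map.Entry<String, Long> entry : sizes.entrySet()) {
            if (!excluded.contains(entry.getKey())) {
                total += entry.getValue();
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the index directory to inspect.
        Map<String, Long> sizes = sizeByExtension(Paths.get(args[0]));
        sizes.forEach((ext, bytes) -> System.out.println(ext + "\t" + bytes));

        // Assumption: only these extensions hold positions and payloads.
        Set<String> positional = new HashSet<>(Arrays.asList("pos", "pyl"));
        System.out.println("total without pos/pyl\t" + totalExcluding(sizes, positional));
    }
}
```

Printing the per-extension breakdown alongside the reduced total makes it easier to see how much each file type contributes before trusting the estimate.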


PS: Some results with the current codecs if someone is interested. I indexed
10% of Wikipedia(english).
Each version is indexed as a document.

Docs	240179
Versions	8467927
Distinct Terms	3501214
Total Terms	1520008204
Avg. Versions	35.25
Avg. Terms per Version	179.50
Avg. Terms per Doc	6328.65

PforDelta W Freq W Pos           20.6 GB
PforDelta W/O Freq W/O Pos        1.6 GB
Standard 4.0 W Freq W Pos        28.1 GB
Standard 4.0 W/O Freq W/O Pos     6.2 GB
Pfor W Freq W Pos                22   GB
Pfor W/O Freq W/O Pos             3.1 GB

Performance follows ;)
