lucene-java-user mailing list archives

From Alex vB <m...@avomberg.de>
Subject New codecs keep Freq skip/omit Pos
Date Fri, 22 Apr 2011 01:52:05 GMT
Hello everybody,

I am currently testing several new Lucene 4.0 codec implementations to
compare them with my own solution.
The difference is that my solution indexes only frequencies and not positions. I
would like to have the same for the other codecs. I know there was already a
post on this topic:
http://lucene.472066.n3.nabble.com/Omit-positions-but-not-TF-td599710.html

I just wanted to ask whether anything has changed, especially for the new
codecs.
I had a look at FixedPostingWriterImpl and PostingsConsumer. Are those
the right places for adapting the Pos/Freq handling? What would happen if I
just skipped writing positions/payloads? Would it mess up the index?

The written files have different extensions such as pyl, skp, pos, doc, etc. If I
simply do not count the pos file, do I get a correct index size estimate for
W Freq W/O Pos? Or where exactly are term positions written?
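
A minimal sketch of the size estimate asked about above, assuming the index is a flat directory and that the file extension identifies the kind of data in each file. The class and method names are hypothetical helpers and are not part of Lucene; the code only sums bytes on disk per extension and then leaves out the extensions passed in. Whether leaving out pos (and pyl) really matches what a codec would write W Freq W/O Pos is the open question, since other files such as skp may also hold position-related data.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: sums on-disk bytes per file extension.
class IndexSizeByExtension {

    // Returns total bytes per extension (e.g. "pos" -> 123) for the regular
    // files directly inside dir. Files without a dot are grouped under "".
    static Map<String, Long> sizeByExtension(Path dir) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue;
                }
                String name = file.getFileName().toString();
                int dot = name.lastIndexOf('.');
                String ext = dot < 0 ? "" : name.substring(dot + 1);
                totals.merge(ext, Files.size(file), Long::sum);
            }
        }
        return totals;
    }

    // Sums all per-extension totals except the excluded extensions.
    static long totalExcluding(Map<String, Long> totals, Set<String> excluded) {
        long sum = 0;
        for (Map.Entry<String, Long> entry : totals.entrySet()) {
            if (!excluded.contains(entry.getKey())) {
                sum += entry.getValue();
            }
        }
        return sum;
    }
}
```

Calling sizeByExtension on the index directory and then totalExcluding with the set {"pos", "pyl"} gives the rough "without positions" figure; printing the per-extension map first shows how much each file type contributes.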

Regards
Alex

PS: Some results with the current codecs, in case someone is interested. I indexed
10% of the English Wikipedia.
Each version is indexed as a separate document.

Docs                          240179
Versions                     8467927
Distinct Terms               3501214
Total Terms               1520008204
Avg. Versions                  35.25
Avg. Terms per Version        179.50
Avg. Terms per Doc           6328.65

Codec / setting                  Index size
PforDelta W Freq W Pos              20.6 GB
PforDelta W/O Freq W/O Pos           1.6 GB
Standard 4.0 W Freq W Pos           28.1 GB
Standard 4.0 W/O Freq W/O Pos        6.2 GB
Pfor W Freq W Pos                     22 GB
Pfor W/O Freq W/O Pos                3.1 GB

Performance follows ;)


