lucene-dev mailing list archives

From eks dev <>
Subject RLE Compressing bit vectors, just thoughts
Date Sat, 04 Aug 2007 07:36:33 GMT
Would it be possible to RLE-compress the skip lists (postings) without hurting performance
in cases where RLE cannot find longer runs?

We have an unusual (?) case where we can sort documents on a category field before indexing.
This order gets slightly disturbed during index updates, but documents normally stay mostly
sorted. We have also noticed longer runs of doc IDs even in the standard case for high-frequency
tokens, due to some sort of locality (e.g. web pages from one web site tend to share a lot of
tokens)...
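One possible layout, sketched under my own assumptions (the class RunLengthPostingsEncoder and the format are hypothetical, not something Lucene ships): write each maximal run of consecutive doc IDs as one VInt holding the gap shifted left by one, with the low bit flagging that a run-length VInt follows. This borrows the trick Lucene's .frq format uses to flag freq == 1 (an odd DocDelta means the frequency is one), so an isolated doc still costs a single VInt, just with one bit less of gap range.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/**
 * Hypothetical run-length encoding for a sorted doc ID list.
 * Each maximal run of consecutive doc IDs is written as
 *   VInt((gap << 1) | flag)  [VInt(runLength) if flag == 1]
 * where gap = firstDocOfRun - lastDocOfPreviousRun (0 at the start)
 * and flag == 1 means the run contains more than one document.
 */
public class RunLengthPostingsEncoder {

    // Lucene-style VInt: 7 bits per byte, low-order bits first,
    // high bit set on every byte except the last.
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    public static byte[] encode(int[] sortedDocs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int lastDoc = 0;
        int i = 0;
        while (i < sortedDocs.length) {
            int start = sortedDocs[i];
            int runLength = 1;
            // Extend the run while the next doc ID is exactly one larger.
            while (i + runLength < sortedDocs.length
                    && sortedDocs[i + runLength] == start + runLength) {
                runLength++;
            }
            int gap = start - lastDoc;
            if (runLength == 1) {
                writeVInt(out, gap << 1);        // low bit 0: single doc
            } else {
                writeVInt(out, (gap << 1) | 1);  // low bit 1: run length follows
                writeVInt(out, runLength);
            }
            lastDoc = start + runLength - 1;
            i += runLength;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] encoded = encode(new int[] {3, 4, 5, 6, 10, 20, 21});
        System.out.println(Arrays.toString(encoded)); // prints "[7, 4, 8, 21, 2]"
    }
}
```

Here seven doc IDs shrink to five single-byte VInts; with plain delta coding they would take seven. The saving grows with run length, while a stream with no runs costs exactly as many VInts as before.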

Having RLE-compressed skip lists for our category fields would bring huge savings (fewer
readVInt() calls), but on the other hand it requires at least one if() in the tight loops of
next() and skipTo(), which can slow down the sparse case.
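To make the cost in next() and skipTo() concrete, here is a hypothetical iterator over the same kind of stream (class and method names are illustrative, not Lucene's TermDocs; skipTo() roughly follows the TermDocs contract of advancing at least once and stopping at the first doc >= target). The branch on runRemaining is the extra if() in question: inside a run, next() just increments without any readVInt(), and skipTo() can jump straight to the target.

```java
/**
 * Hypothetical iterator over a run-length doc ID stream where each entry is
 *   VInt((gap << 1) | flag)  [VInt(runLength) if flag == 1]
 * and gap is measured from the last doc of the previous entry (0 at start).
 * Illustration only; this is not Lucene's TermDocs implementation.
 */
public class RunLengthDocIterator {
    public static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final byte[] data;
    private int pos = 0;          // read position in data
    private int doc = 0;          // delta base; doc() is meaningful only after next() returns true
    private int runRemaining = 0; // docs still left in the current run after doc

    public RunLengthDocIterator(byte[] data) {
        this.data = data;
    }

    // Lucene-style VInt decoding: 7 data bits per byte, low-order first.
    private int readVInt() {
        byte b = data[pos++];
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = data[pos++];
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    public int doc() {
        return doc;
    }

    /** Advances to the next doc ID; returns false when the stream is exhausted. */
    public boolean next() {
        // The extra branch: inside a run, just increment (no readVInt()).
        if (runRemaining > 0) {
            runRemaining--;
            doc++;
            return true;
        }
        if (pos >= data.length) {
            doc = NO_MORE_DOCS;
            return false;
        }
        int code = readVInt();
        doc += code >>> 1;
        if ((code & 1) != 0) {
            // Low bit set: a run length follows; doc is the run's first member.
            runRemaining = readVInt() - 1;
        }
        return true;
    }

    /** Advances at least once, then stops at the first doc ID >= target. */
    public boolean skipTo(int target) {
        if (!next()) {
            return false;
        }
        while (doc < target) {
            if (runRemaining > 0) {
                // Jump inside the run without decoding anything.
                int step = Math.min(target - doc, runRemaining);
                doc += step;
                runRemaining -= step;
            } else if (!next()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Bytes for docs 3, 4, 5, 6, 10, 20, 21 in this hypothetical format.
        RunLengthDocIterator it = new RunLengthDocIterator(new byte[] {7, 4, 8, 21, 2});
        StringBuilder sb = new StringBuilder();
        while (it.next()) {
            sb.append(it.doc()).append(' ');
        }
        System.out.println(sb.toString().trim()); // prints "3 4 5 6 10 20 21"
    }
}
```

In the fully sparse case every code has its low bit clear, so the per-document overhead over plain delta decoding is the runRemaining test plus the low-bit test; whether that is measurable in a real next()/skipTo() loop would need benchmarking.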

So the questions are: can anyone see benefits in the standard case? Is it doable at all without
turning everything upside down? Maybe it is already there and I am just being plain stupid here....
