lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrzej Bialecki ...@getopt.org>
Subject VSEncoding - int codec faster than p4delta
Date Wed, 27 Oct 2010 09:27:51 GMT
Hi,

I found this paper recently published in proceedings of CIKM'10:

VSEncoding: efficient coding and fast decoding of integer lists via
dynamic programming, F. Silvestri, R. Venturini

Available here http://portal.acm.org/citation.cfm?id=1871437.1871592

My favorite quotes:

"We point out that only our methods, together with In-
terpolative, are able to beat the entropy of the lists on the
datasets. This quasi-paradoxical eect is, indeed, present
because entropy does not consider context information. En-
tropy, or to use a notation commonly used in text compres-
sion, zeroth-order entropy, does not take into account pat-
terns (i.e. the context) that can be present in lists of blocks
of integers. By grouping together blocks of integers, in fact,
we are able to assign codewords to more than a single value
at a time. Therefore, it appears obvious that we can beat
the entropy in the case of VSE, VSE-R and Interpolative. Es-
sentially, this is possible since we exploit regularities on the
lists on these very skewed d-gaps lists (e.g., small values close
to each other or quite long runs of 1s). We remark that beat
the entropy is not possible with any prex code (e.g., sta-
tistical compressors like Arithmetic and Human or integer
encoders like , , 's, Golomb, and so on). Therefore, our
methods is certainly better in compression than any of these
kind of methods without the need of any comparison. Any-
way, for the sake of completeness, we report those results as
well in Table 3."

"As expected, Interpolative is the slowest as opposed to VSE
which tops others with more than 800 millions of integers
per second. Our methods, VSE and VSE-R, are among the
fastest in decoding with a number of mis (millions of integers per
second) decoded ranging from 450 of VSE-R to 835 of
VSE both of them measured using the gov2 collection. It
is interesting to observe the better performance in terms
of decoding speed of VSE with respect to others, and in
particular with respect to OPT-P4D, Simple9 and Simple16
which are considered state-of-the-art as far as decompression
speed is concerned."

P4D compression achieved a speed of 460 mis in this test, which means
the VSE method is twice as fast.

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message