lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Renaud Delbru (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1410) PFOR implementation
Date Fri, 02 Apr 2010 16:16:27 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852855#action_12852855
] 

Renaud Delbru commented on LUCENE-1410:
---------------------------------------

Here some results on the perofrmance of compression-decompression of various algorithms over
the wikipedia dataset (english articles).

First, the performance on compressing/decompressing over the full .doc posting list using
block size of 1024. 
The Compression and Decompression are in KInt/ms and the size in bytes.

||Codec||Compression||Decompression||Size||
|FOR (Orig)|20|106|448856|
|PFOR (Orig)|8|107| 383596|
|VINT|37|74|421764|
|FOR (Opt)|39|102|447369|
|PFOR (Opt)|10|108|382379|
|RICE|8|31|350687|
|S9|16|65|408218|

* VInt is a block based version of variable integer (unlike the simple int block, I am creating
blocks using vint, then read the full block in memory and decompress it using vint one integer
at a time).
* For (Orig) and PFOR (Orig) are your implementation of FOR and PFOR.
* FOR (Opt) and PFOR (Opt) are my implementation of FOR and PFOR (using array of functors
for each compression decompression case).
* Rice is one implementation of the rice algorithm, but block-based.
* S9 is your implementation of simple 9 with some bug fixes, but block-based.

I have created a *IndexInput and *IndexOutput for each algorithm, and use them to compress
and decompress the posting file in this experiment.

We can see that using an array of functors for compression in FOR provides a good speed increase
in compression. However, on PFOR, the compression speed increase is limited. From my test,
it is due to the sort step which is necessary for comrpessing each block. 
PFOR provides a good balance between decompression speed and size, but it is very slow in
compression (it is as slow as Rice). I don't think it is possible to improve the compression
speed due to the sort step, which seems to impose a hard upper limit in term of performance
(I have tried various heuristics to avoid sorting, but not very successful so far).

Following this first benchmark, I have created various Lucene codec that uses the *IndexInput
and *IndexOutput of the different compression algorithms. I have indexed the full wikipedia
dataset. Each block-based codec is configured to use block of 1024. 
I have recorded the average commit time and the optimise time. Then, I have executing random
keyword queries and I have recorded the average query time and the number of bytes read for
answering the query.

h5. Compression

Time is in ms, Size in bytes.

||Codec||Avg Commit Time||Total Commit Time||Optimise Time||Index Size||
|STD|1276|359978|113386|1423251|
|SEP|1437|405286|117163|1666870|
|VINT|1572|426551|122557|1742083|
|RICE|1791|505312|230636|1339703|
|S9|1575|444157|146216|1530666|
|FOR|1520|428847|134502|1754578|
|PFOR|1852|522401|189262|1511154|

* STD is the Standard lucene codec.
* SEP the Sep codec.
* All the other algorithms are based on the Sep codec, but block-based.
* FOR and PFOR is my implementation.

We can see that the sep codec and block-based codecs have a certain overhead (in indexing
time and index size) compared to the standard lucene codec. But, I imagine that this overhead
can be reduced with some code optimisation. For the index size, the overhead in block-based
codeca is due to the skip list, which is much bigger (127MB for SEP against 189MB for block-based)
since we need to encode the block offset, and the inner documet offset in the block.

In term of compression speed and index size, we can see that this results follows the previous
results. We can observe also that PFOR is the slowest.

h5. Decompression

For each codec, I have performed five runs, each run having 200 random keyword queries, and
average the time over the 5 runs and 200 queries (i.e., 200*5 = 1000 query execution).
For each run, I am opening a new IndexReader and IndexSearcher and I am reusing them across
the 200 random queries.

To generate the queries, I first grouped the terms into three group: HIGH, MEDIUM and LOW,
based on their frequency, in order to be able to generate random keyword queries based on
this three frequency groups. Then, I have performed benchmarks with random queries of one,
two or more keywords, using all possible combinations (e.g., HIGH & HIGH, HIGH & MEDIUM,
etc). I have executed the queries using a single thread, and also using multiple threads.

I am reporting below a few of query results, but for most of the other queries, the results
were following the same scheme. I don't report the results of the Standard and Sep codec,
but they always outperform block-based codec, whatever compression used.

However, the results for the query time are completely contradictory with the results of the
first benchmark. The slowest algorithm (Rice, Vint) generally provides the faster query execution
time, and the faster algorithm (FOR, PFOR) provides the slowest query execution time. Simple9
seems to be as fast as VInt. 
I currently have no explanation for that. Maybe the dataset (wikipedia articles) is too small
to see the benefits. But I would say instead there is a problem somewhere. How come VInt and
even Rice could provide better query execution times than FOR and PFOR ? While, in the first
benchmark, it is clear that FOR and PFOR provides nearly 40% of decompression performance
increase.
Do you have some advices on how to improve the benchmark, or some ideas on the causes of this
frop of performance ? I'll be able to re-perform the benchmarks if you propose something.

h6. HIGH:MUST - 1 thread - 200 random queries

||Codec||Avg Query Time||
|Rice|1.5 ms|
|VInt|1.2 ms|
|PFOR|2.3 ms|
|FOR|2.5 ms|
|S9|1 ms|

h6. HIGH:MUST - 2 thread - 200 random queries for each thread

||Codec||Avg Query Time||
|Rice|2 ms|
|VInt|1.9 ms|
|PFOR|3 ms|
|FOR|3.5 ms|
|S9|1.6 ms|

h6. HIGH:MUST HIGH:SHOULD HIGH:SHOULD HIGH:SHOULD - 1 thread - 200 random queries

||Codec||Avg Query Time||
|Rice|1.5 ms|
|VInt|1.2 ms|
|PFOR|2.7 ms|
|FOR|2.5 ms|
|S9|1.3 ms|

h6. HIGH:MUST HIGH:SHOULD HIGH:SHOULD HIGH:SHOULD - 2 threads - 200 random queries for each
thread

|Rice|2.5 ms|
|VInt|2 ms|
|PFOR|3 ms|
|FOR|3.4 ms|
|S9|2 ms|



> PFOR implementation
> -------------------
>
>                 Key: LUCENE-1410
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1410
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Other
>            Reporter: Paul Elschot
>            Priority: Minor
>         Attachments: autogen.tgz, for-summary.txt, LUCENE-1410-codecs.tar.bz2, LUCENE-1410b.patch,
LUCENE-1410c.patch, LUCENE-1410d.patch, LUCENE-1410e.patch, TermQueryTests.tgz, TestPFor2.java,
TestPFor2.java, TestPFor2.java
>
>   Original Estimate: 21840h
>  Remaining Estimate: 21840h
>
> Implementation of Patched Frame of Reference.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message