lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] Commented: (LUCENE-1410) PFOR implementation
Date Fri, 10 Oct 2008 14:57:44 GMT


Michael McCandless commented on LUCENE-1410:

Another thing that bit me was bufferByteSize(): if it returns
something that's not 0 mod 4, you must round it up to the next
multiple of 4; otherwise you will lose data, since ByteBuffer is big
endian by default.  We should test little endian to see if performance
changes (on different CPUs).
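
The rounding described above can be sketched roughly as follows. This is only an illustration, not the code from the patch: the class BufferSizeSketch and the helper roundUpToMultipleOf4 are hypothetical names, and the size 10 is an assumed example value. It shows that an IntBuffer view of a ByteBuffer only covers whole groups of 4 bytes, and how the byte order can be switched for a little-endian comparison.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

class BufferSizeSketch {
    // Hypothetical helper: round a non-negative byte count up to the next multiple of 4.
    static int roundUpToMultipleOf4(int byteSize) {
        return (byteSize + 3) & ~3;
    }

    // Number of ints visible through an IntBuffer view of a buffer of the given byte size.
    static int intViewCapacity(int byteSize) {
        return ByteBuffer.allocate(byteSize).asIntBuffer().capacity();
    }

    public static void main(String[] args) {
        int needed = 10; // assumed example of a size that is not 0 mod 4

        // The int view holds floor(10 / 4) ints, so the trailing 2 bytes are not covered.
        System.out.println(intViewCapacity(needed));
        // After rounding up to 12 bytes, the int view covers every byte.
        System.out.println(intViewCapacity(roundUpToMultipleOf4(needed)));

        // A new ByteBuffer is big endian; switch the order before creating the view
        // to compare little-endian behaviour.
        ByteBuffer buf = ByteBuffer.allocate(roundUpToMultipleOf4(needed));
        System.out.println(buf.order());
        buf.order(ByteOrder.LITTLE_ENDIAN);
        IntBuffer ints = buf.asIntBuffer();
        System.out.println(ints.order());
    }
}
```

Whether little endian is actually faster would still have to be measured per CPU, as noted above.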

bq. Did you also move to relative addressing in the buffer? 

No I haven't done that, but I think we should.  I believe it's faster.  I'm trying now to
get a rudimentary test working for TermQuery using pfor.

Another question: I suppose the place to add this initially would be in IndexOutput and IndexInput?
In that case it would make sense to reserve (some bits of) the first byte in the compressed
for the compression method, and use these bits there to call PFor or another (de)compression
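
One possible reading of the "reserve some bits of the first byte" idea is sketched below. Everything here is an assumption made for illustration: the 2-bit field width, the method ids, and the names BlockHeaderSketch, packHeader, methodOf and decoderFor do not come from the patch or from Lucene.

```java
class BlockHeaderSketch {
    // Assumed layout: the low 2 bits of the first byte select the method,
    // the remaining 6 bits stay available for other per-block data.
    static final int METHOD_BITS = 2;
    static final int METHOD_MASK = (1 << METHOD_BITS) - 1;

    // Hypothetical method ids.
    static final int METHOD_VINT = 0;
    static final int METHOD_PFOR = 1;

    // Combine a method id with up to 6 bits of other header data.
    static byte packHeader(int method, int otherBits) {
        return (byte) ((otherBits << METHOD_BITS) | (method & METHOD_MASK));
    }

    // Extract the method id from the header byte.
    static int methodOf(byte header) {
        return header & METHOD_MASK;
    }

    // Extract the remaining 6 bits, masking first to undo sign extension.
    static int otherBitsOf(byte header) {
        return (header & 0xFF) >>> METHOD_BITS;
    }

    // Dispatch on the method bits; a real reader would call a decoder here
    // instead of returning a name.
    static String decoderFor(byte header) {
        switch (methodOf(header)) {
            case METHOD_VINT: return "vint";
            case METHOD_PFOR: return "pfor";
            default: throw new IllegalArgumentException("unknown method " + methodOf(header));
        }
    }

    public static void main(String[] args) {
        byte header = packHeader(METHOD_PFOR, 0x3F);
        System.out.println(decoderFor(header) + " " + otherBitsOf(header));
    }
}
```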

This gets into flexible indexing...

Ideally we do this in a pluggable way, so that PFor is just one such
plugin, simple vInts is another, etc.

I could see a compression layer living "above" IndexInput/Output,
since logically how you encode an int block into bytes is independent
from the means of storage.
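
To make the pluggable idea concrete, here is a rough sketch of what such a layer could look like. It is not an existing Lucene API: IntBlockCodec, VIntCodec and CodecSketch are hypothetical names, and a plain java.nio.ByteBuffer stands in for whatever storage sits underneath. The vInt plugin uses the usual scheme of 7 data bits per byte, low-order bits first, with the high bit marking continuation; a PFor plugin would implement the same interface.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical plugin interface: how an int block becomes bytes,
// independent of where those bytes are stored.
interface IntBlockCodec {
    void encode(int[] values, int count, ByteBuffer out);
    void decode(ByteBuffer in, int[] values, int count);
}

// One possible plugin: variable-length ints, at most 5 bytes per value.
class VIntCodec implements IntBlockCodec {
    public void encode(int[] values, int count, ByteBuffer out) {
        for (int i = 0; i < count; i++) {
            int v = values[i];
            while ((v & ~0x7F) != 0) {          // more than 7 bits left
                out.put((byte) ((v & 0x7F) | 0x80));
                v >>>= 7;
            }
            out.put((byte) v);                  // last byte, high bit clear
        }
    }

    public void decode(ByteBuffer in, int[] values, int count) {
        for (int i = 0; i < count; i++) {
            byte b = in.get();
            int v = b & 0x7F;
            for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = in.get();
                v |= (b & 0x7F) << shift;
            }
            values[i] = v;
        }
    }
}

class CodecSketch {
    // Encode and decode a block through any codec; the caller never
    // needs to know which plugin is in use.
    static int[] roundTrip(IntBlockCodec codec, int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * 5);
        codec.encode(values, values.length, buf);
        buf.flip();
        int[] decoded = new int[values.length];
        codec.decode(buf, decoded, values.length);
        return decoded;
    }

    public static void main(String[] args) {
        int[] block = {0, 1, 127, 128, 16384, Integer.MAX_VALUE};
        System.out.println(Arrays.equals(block, roundTrip(new VIntCodec(), block)));
    }
}
```

Note that this sketch goes through an intermediate buffer, which is the kind of extra copy on the read path that a layer above IndexInput/Output would have to avoid or justify.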

But: such an abstraction may hurt performance too much since during
read it would entail an extra buffer copy.  So maybe we should just
add methods to IndexInput/Output, or, make a new

Also, some things you now store in the header of each block should
presumably move to the start of the file instead (eg the compression
method), or if we move to a separate "schema" file that can record
which compressor was used per file, we'd put this there.

So I'm not yet exactly sure how we should tie this in "for real"...

> PFOR implementation
> -------------------
>                 Key: LUCENE-1410
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Other
>            Reporter: Paul Elschot
>            Priority: Minor
>         Attachments: autogen.tgz, LUCENE-1410b.patch, LUCENE-1410c.patch,,,
>   Original Estimate: 21840h
>  Remaining Estimate: 21840h
> Implementation of Patched Frame of Reference.
