lucene-dev mailing list archives

From "James Dyer (Commented) (JIRA)" <>
Subject [jira] [Commented] (LUCENE-3298) FST has hard limit max size of 2.1 GB
Date Wed, 07 Dec 2011 17:16:40 GMT


James Dyer commented on LUCENE-3298:


I'm not sure how much help this is, but you might be able to eke out a little more performance
by tightening RewritablePagedBytes.copyBytes().  It currently moves the From-Bytes
into a temp array and then writes that back to the FST at the To-Bytes location.  Note also that
the one place this gets called used to be a simple "System.arraycopy".  So if you can make
it copy in place, that might claw back some of the performance loss.  Beyond this, a different
pair of eyes might find more ways to optimize.  In the end, though, you will likely never make
it perform quite as well as the simple array.
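
To make the in-place idea concrete, here is a rough sketch, not the code from the patch.  The
class name, page size, and method signatures below are all made up for illustration, and the
real RewritablePagedBytes may be laid out differently.  It copies page-sized chunks directly
between pages with System.arraycopy and picks the copy direction so that overlapping ranges
are not clobbered, which avoids the temp array.

```java
// Hypothetical sketch only: names and layout are assumptions, not the patch's API.
class PagedCopySketch {
    // Tiny power-of-two pages so page boundaries are easy to exercise in a test.
    static final int PAGE_BITS = 4;
    static final int PAGE_SIZE = 1 << PAGE_BITS;
    static final int PAGE_MASK = PAGE_SIZE - 1;

    final byte[][] pages;

    PagedCopySketch(int numPages) {
        pages = new byte[numPages][PAGE_SIZE];
    }

    byte get(long pos) {
        return pages[(int) (pos >>> PAGE_BITS)][(int) (pos & PAGE_MASK)];
    }

    void set(long pos, byte b) {
        pages[(int) (pos >>> PAGE_BITS)][(int) (pos & PAGE_MASK)] = b;
    }

    // Copies len bytes from src to dest without a temp buffer.
    void copyBytes(long src, long dest, int len) {
        if (len <= 0 || src == dest) {
            return;
        }
        if (dest < src || dest >= src + len) {
            // Forward copy is safe: writes never land on bytes still to be read.
            int done = 0;
            while (done < len) {
                long s = src + done;
                long d = dest + done;
                int sOff = (int) (s & PAGE_MASK);
                int dOff = (int) (d & PAGE_MASK);
                // Stop at whichever page boundary comes first.
                int chunk = Math.min(len - done, Math.min(PAGE_SIZE - sOff, PAGE_SIZE - dOff));
                System.arraycopy(pages[(int) (s >>> PAGE_BITS)], sOff,
                                 pages[(int) (d >>> PAGE_BITS)], dOff, chunk);
                done += chunk;
            }
        } else {
            // dest lies inside the source range: copy chunks from the end backwards.
            int remaining = len;
            while (remaining > 0) {
                long sLast = src + remaining - 1;   // last source byte still to copy
                long dLast = dest + remaining - 1;  // its destination
                int sAvail = (int) (sLast & PAGE_MASK) + 1; // bytes in page up to sLast
                int dAvail = (int) (dLast & PAGE_MASK) + 1;
                int chunk = Math.min(remaining, Math.min(sAvail, dAvail));
                System.arraycopy(pages[(int) (sLast >>> PAGE_BITS)], sAvail - chunk,
                                 pages[(int) (dLast >>> PAGE_BITS)], dAvail - chunk, chunk);
                remaining -= chunk;
            }
        }
    }
}
```

Within a single chunk the source and destination are either on the same page (where
System.arraycopy already handles overlap) or on different pages (where they cannot overlap),
so only the order of the chunks needs care.  Whether this actually beats the temp-array
version would need to be measured.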

Also, it sounds as if you've maybe done work to sync this with the current trunk.  If so,
would you mind uploading the updated patch?

Also if you end up using this, be sure to test thoroughly.  I implemented this one just to
gain a little familiarity with the code and I do not claim any sort of expertise in this area,
so beware!  But all of the regular unit tests did pass for me.  I was meaning to try to run
test2bpostings against this but wasn't able to get it set up.  If I remember correctly, this
issue came up originally because someone wanted to run test2bpostings with memorycodec and it
was going past the limit.

> FST has hard limit max size of 2.1 GB
> -------------------------------------
>                 Key: LUCENE-3298
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/FSTs
>            Reporter: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-3298.patch
> The FST uses a single contiguous byte[] under the hood, which in Java is indexed by int,
> so we cannot grow this over Integer.MAX_VALUE.  It also internally encodes references to this
> array as vInt.
> We could switch this to a paged byte[] and make the limit far larger.
> But I think this is low priority... I'm not going to work on it any time soon.
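
For reference, the arithmetic behind the limit quoted above: a Java byte[] holds at most
Integer.MAX_VALUE = 2,147,483,647 elements, which is roughly the 2.1 GB in the issue title.
A paged layout gets past this by addressing with a long and splitting it into a page index
and an offset within the page.  The sketch below only illustrates that split; the class name
and the 1 GiB page size are assumptions, not anything taken from the patch.

```java
// Hypothetical sketch: splits a long address into (page, offset) for a byte[][] store.
class PagedAddressSketch {
    static final int PAGE_BITS = 30;                      // assumed page size: 2^30 bytes
    static final long PAGE_MASK = (1L << PAGE_BITS) - 1;

    // Which page the address falls in.
    static int pageIndex(long address) {
        return (int) (address >>> PAGE_BITS);
    }

    // Position of the address inside that page.
    static int pageOffset(long address) {
        return (int) (address & PAGE_MASK);
    }
}
```

With these assumed values, an address of 3 * 2^30 + 5 (well beyond Integer.MAX_VALUE) maps
to page 3, offset 5.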
