lucene-dev mailing list archives

From Carlos González-Cadenas (Commented) (JIRA) <>
Subject [jira] [Commented] (LUCENE-3298) FST has hard limit max size of 2.1 GB
Date Wed, 07 Dec 2011 16:16:41 GMT


Carlos González-Cadenas commented on LUCENE-3298:

Thanks for the presentation. It's very interesting. 

Now that we've invested significant time in this approach, we'd like to stick with it a
little longer and see how far we can get. Because the FST approach is much lower level, it
will give us more control over the functionality down the road, which will definitely prove
beneficial in the mid-term. If space requirements demand more infix compression for the
permutations, we can consider replacing the FST with LZTrie.

Regarding next steps: you mentioned above that you might consider including this patch in the
codebase once there are people who need it. We would obviously be very interested in seeing
this patch get into trunk.

In terms of performance, James reports a 20% performance loss on a 32-bit machine. We're seeing
less degradation on a 64-bit machine, around 10-15% depending on the specific FST and query.
If you or James see any way to optimize it, let me know. We can lend a hand if you point us
to the potential paths for making it more efficient.

> FST has hard limit max size of 2.1 GB
> -------------------------------------
>                 Key: LUCENE-3298
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/FSTs
>            Reporter: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-3298.patch
> The FST uses a single contiguous byte[] under the hood, which in Java is indexed by int,
> so we cannot grow this over Integer.MAX_VALUE.  It also internally encodes references to this
> array as vInt.
> We could switch this to a paged byte[] and make the limit far larger.
> But I think this is low priority... I'm not going to work on it any time soon.
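
For readers unfamiliar with the idea in the description above, here is a minimal sketch of
what a "paged byte[]" addressed by long could look like, with references written as a
variable-length long instead of a vInt. It is not the attached patch and not Lucene's API:
the class name PagedByteStore, its methods, and the page-size parameter are all hypothetical
and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Lucene code): an append-only byte store that is
 * addressed by long and backed by fixed-size pages, so its total size is not
 * bounded by the maximum length of a single Java array.
 */
final class PagedByteStore {
    private final int pageBits;   // log2 of the page size
    private final int pageSize;   // bytes per page, a power of two
    private final int pageMask;   // pageSize - 1, extracts the in-page offset
    private final List<byte[]> pages = new ArrayList<>();
    private long length;          // total number of bytes appended so far

    PagedByteStore(int pageBits) {
        if (pageBits < 1 || pageBits > 30) {
            throw new IllegalArgumentException("pageBits must be in [1, 30]");
        }
        this.pageBits = pageBits;
        this.pageSize = 1 << pageBits;
        this.pageMask = pageSize - 1;
    }

    /** Appends one byte, allocating a new page when the current one is full. */
    void append(byte b) {
        int offset = (int) (length & pageMask);
        if (offset == 0) {
            pages.add(new byte[pageSize]);
        }
        pages.get(pages.size() - 1)[offset] = b;
        length++;
    }

    /** Reads the byte at a long address: high bits pick the page, low bits the offset. */
    byte get(long address) {
        if (address < 0 || address >= length) {
            throw new IndexOutOfBoundsException("address " + address);
        }
        return pages.get((int) (address >>> pageBits))[(int) (address & pageMask)];
    }

    long length() {
        return length;
    }

    /** Appends a non-negative long, 7 bits per byte, low-order bits first. */
    void appendVLong(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("value must be non-negative");
        }
        while ((value & ~0x7FL) != 0) {
            append((byte) ((value & 0x7F) | 0x80)); // high bit set: more bytes follow
            value >>>= 7;
        }
        append((byte) value);
    }

    /** Reads a value written by appendVLong starting at the given address. */
    long readVLong(long address) {
        long result = 0;
        int shift = 0;
        byte b;
        do {
            b = get(address++);
            result |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }
}
```

The extra shift-and-mask on every read is one plausible source of the slowdown discussed
in the comment above, though that is an assumption and would need profiling to confirm.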

