lucene-dev mailing list archives

From Carlos González-Cadenas (Commented) (JIRA) <>
Subject [jira] [Commented] (LUCENE-3298) FST has hard limit max size of 2.1 GB
Date Wed, 07 Dec 2011 19:22:40 GMT


Carlos González-Cadenas commented on LUCENE-3298:

Hi Sudarshan,

I don't think my implementation would be of much practical value to the general public.
As described above, it stores custom data that is useful for my application, but that data
almost certainly won't make sense for other applications.

I'm happy to tell you how to modify the code to store your own outputs; it's quite easy:
1) First, enable it at the code level: replace NoOutputs with ByteSequenceOutputs, and
change every reference to Arc<Object> or FST<Object> to Arc<BytesRef> and FST<BytesRef>.
2) At build time, you need to store something in the output. Create the appropriate
BytesRef and pass it to the builder.add() call instead of the placeholder value that is
there now.
3) At query time, you need to collect the output while traversing the FST (note that the output
may be scattered through the whole arc chain) and then you can process it in the way specific
to your app. Probably you want to do it in the collect() method (when the LookupResults are
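
To illustrate what "scattered through the whole arc chain" means in step 3, here is a minimal
toy sketch. It is not Lucene's FST, Builder or ByteSequenceOutputs; ToyOutputTrie and all of
its members are hypothetical names made up for this example, and the toy is an unminimized
trie. It only shows the general idea: when keys share a prefix, the shared part of their
outputs stays on the shared arcs, the rest is pushed further down, and a lookup has to
concatenate every fragment it meets along the path.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Toy model only (hypothetical, NOT Lucene's FST): a trie whose arcs carry output fragments. */
final class ToyOutputTrie {
    private static final byte[] EMPTY = new byte[0];

    static final class Arc {
        byte[] output = EMPTY;          // fragment of output stored on this arc
        final Node target = new Node();
    }

    static final class Node {
        final Map<Character, Arc> arcs = new HashMap<>();
        boolean isFinal;                // true if a key ends at this node
        byte[] finalOutput = EMPTY;     // fragment emitted only when a key ends here
    }

    private final Node root = new Node();

    /** Adds key -> output, keeping only the shared output prefix on shared arcs. */
    void add(String key, byte[] output) {
        Node node = root;
        byte[] rest = output;           // part of the output not yet placed on an arc
        for (char c : key.toCharArray()) {
            Arc arc = node.arcs.get(c);
            if (arc == null) {
                // First new arc takes the whole remaining output; later new arcs stay empty.
                arc = new Arc();
                arc.output = rest;
                rest = EMPTY;
                node.arcs.put(c, arc);
            } else {
                // Existing arc: keep the common prefix here, push the remainder down.
                int n = commonPrefixLength(arc.output, rest);
                if (n < arc.output.length) {
                    pushDown(arc.target, Arrays.copyOfRange(arc.output, n, arc.output.length));
                    arc.output = Arrays.copyOf(arc.output, n);
                }
                rest = Arrays.copyOfRange(rest, n, rest.length);
            }
            node = arc.target;
        }
        node.isFinal = true;
        node.finalOutput = rest;
    }

    /** Walks the arc chain, collecting every output fragment; returns null if key is absent. */
    byte[] lookup(String key) {
        Node node = root;
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        for (char c : key.toCharArray()) {
            Arc arc = node.arcs.get(c);
            if (arc == null) {
                return null;
            }
            collected.write(arc.output, 0, arc.output.length);
            node = arc.target;
        }
        if (!node.isFinal) {
            return null;
        }
        collected.write(node.finalOutput, 0, node.finalOutput.length);
        return collected.toByteArray();
    }

    private static int commonPrefixLength(byte[] a, byte[] b) {
        int n = 0;
        while (n < a.length && n < b.length && a[n] == b[n]) {
            n++;
        }
        return n;
    }

    /** Prepends prefix to everything reachable from node so existing keys keep their output. */
    private static void pushDown(Node node, byte[] prefix) {
        for (Arc arc : node.arcs.values()) {
            arc.output = concat(prefix, arc.output);
        }
        if (node.isFinal) {
            node.finalOutput = concat(prefix, node.finalOutput);
        }
    }

    private static byte[] concat(byte[] a, byte[] b) {
        byte[] r = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, r, a.length, b.length);
        return r;
    }
}
```

In this toy, adding "car" with output "vehicle" and then "cat" with output "vet" leaves only
the shared bytes "ve" on the first arc, so reading a single arc would give an incomplete
output. That is the reason the output has to be accumulated over the whole traversal before
your app-specific processing runs.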

I believe that's all. If you have any questions, let me know.

> FST has hard limit max size of 2.1 GB
> -------------------------------------
>                 Key: LUCENE-3298
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/FSTs
>            Reporter: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-3298.patch
> The FST uses a single contiguous byte[] under the hood, which in Java is indexed by int
> so we cannot grow this over Integer.MAX_VALUE.  It also internally encodes references to this
> array as vInt.
> We could switch this to a paged byte[] and make the limit far larger.
> But I think this is low priority... I'm not going to work on it any time soon.
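
For readers unfamiliar with the "paged byte[]" idea mentioned in the issue, here is a minimal
sketch of the general technique. It is not the attached patch and not Lucene code;
ToyPagedBytes and its pageBits parameter are hypothetical names used only for illustration.
The bytes are split across several fixed-size arrays and addressed with a long, so the total
size is no longer capped by a single array's int index. The sketch does not cover the vInt
encoding of references that the issue also mentions.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy sketch (hypothetical, not Lucene code): append-only byte store addressed by long. */
final class ToyPagedBytes {
    private final int pageBits;   // log2 of the page size
    private final int pageSize;
    private final int pageMask;
    private final List<byte[]> pages = new ArrayList<>();
    private long length;          // total number of bytes appended so far

    ToyPagedBytes(int pageBits) {
        if (pageBits < 1 || pageBits > 30) {
            throw new IllegalArgumentException("pageBits must be in [1, 30]");
        }
        this.pageBits = pageBits;
        this.pageSize = 1 << pageBits;
        this.pageMask = pageSize - 1;
    }

    /** Appends one byte, allocating a new page whenever the current one is full. */
    void append(byte b) {
        int offset = (int) (length & pageMask);
        if (offset == 0) {
            pages.add(new byte[pageSize]);
        }
        pages.get(pages.size() - 1)[offset] = b;
        length++;
    }

    /** Reads the byte at a long index: high bits select the page, low bits the offset. */
    byte get(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return pages.get((int) (index >>> pageBits))[(int) (index & pageMask)];
    }

    long length() {
        return length;
    }
}
```

With a small pageBits such as 4, a few dozen appends already span several pages, which makes
the page/offset arithmetic easy to check by hand; a real store would use much larger pages.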


