lucene-dev mailing list archives

From Carlos González-Cadenas (Commented) (JIRA)
Subject [jira] [Commented] (LUCENE-3298) FST has hard limit max size of 2.1 GB
Date Sat, 03 Dec 2011 14:31:40 GMT


Carlos González-Cadenas commented on LUCENE-3298:

Dawid, I wanted to let you know that we've reached the 2GB barrier.

We're using a heavily modified version of FSTLookup to create an autocomplete system over
2.3 billion queries (and growing, it will be more than 10-15B when we add the data for infix

To work around the 2.1 GB limitation, we changed the code so that every bucket uses
a different FST (as Robert Muir recommended), but we're still having problems in
the individual buckets because our dataset is huge.

We'll give this patch a try and let you know how it goes.


> FST has hard limit max size of 2.1 GB
> -------------------------------------
>                 Key: LUCENE-3298
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/FSTs
>            Reporter: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-3298.patch
> The FST uses a single contiguous byte[] under the hood, which in Java is indexed by int,
> so we cannot grow this over Integer.MAX_VALUE.  It also internally encodes references to this
> array as vInt.
> We could switch this to a paged byte[] and make the limit far larger.
> But I think this is low priority... I'm not going to work on it any time soon.
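
The paged byte[] idea in the issue description can be sketched roughly as follows. This is only a minimal illustration, not the code in the attached patch and not Lucene's API: the class name PagedByteStore and its methods are hypothetical. It assumes fixed-size pages whose size is a power of two, so a long address splits into a page index (high bits) and an offset within the page (low bits).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical append-only byte store addressed by long instead of int. */
final class PagedByteStore {
    private final int pageBits;   // log2 of the page size
    private final int pageSize;   // bytes per page, always a power of two
    private final int pageMask;   // pageSize - 1, extracts the in-page offset
    private final List<byte[]> pages = new ArrayList<>();
    private long length;          // total bytes written so far

    PagedByteStore(int pageBits) {
        if (pageBits < 1 || pageBits > 30) {
            throw new IllegalArgumentException("pageBits must be in [1, 30]");
        }
        this.pageBits = pageBits;
        this.pageSize = 1 << pageBits;
        this.pageMask = pageSize - 1;
    }

    /** Appends one byte, allocating a new page when the current one is full. */
    void append(byte b) {
        int offset = (int) (length & pageMask);
        if (offset == 0) {
            // Either the store is empty or the last page is exactly full.
            pages.add(new byte[pageSize]);
        }
        pages.get(pages.size() - 1)[offset] = b;
        length++;
    }

    /** Reads the byte at a long address: high bits pick the page, low bits the offset. */
    byte get(long address) {
        if (address < 0 || address >= length) {
            throw new IndexOutOfBoundsException("address=" + address);
        }
        return pages.get((int) (address >>> pageBits))[(int) (address & pageMask)];
    }

    long length() {
        return length;
    }
}
```

With this layout no single array has to exceed Integer.MAX_VALUE; the total size is bounded by the number of pages times the page size. References into the store would presumably also need an encoding that can hold a long rather than a vInt, which this sketch does not cover.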
