lucene-dev mailing list archives

From "Dawid Weiss (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3298) FST has hard limit max size of 2.1 GB
Date Tue, 06 Dec 2011 08:15:40 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163425#comment-13163425 ]

Dawid Weiss commented on LUCENE-3298:
-------------------------------------

bq. time required for autocomplete to work from a user perspective is very low (<50ms),

I've seen a presentation by Greg Donovan at Lucene Revolution San Francisco, here:
http://www.lucidimagination.com/devzone/events/conferences/revolution/2011/solr-and-lucene-etsy

They seem to have a lot of traffic and still use sharded Solr servers to do contextual
suggestions. Maybe it'd be easier to buy more hardware than to try to squeeze something into
an FST that it doesn't handle well (permutations). Just a thought.

bq.  hotels in barcelona => hotels in barcelona, The FST should be able to conflate these
prefixes nicely in just one path, right?.

The FST will be able to conflate suffixes if you have no outputs. If you do have different
outputs, these need to be stored somewhere too. For outputs with a common part, the common
part is pushed towards the root of the FST, but for byte sequences a common part is unlikely,
so the outputs will have to differentiate these paths somehow and create at least a single
separate node/arc in the FST. I'm saying this without checking, but this is my intuition;
verify it in practice if you want to.
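
To make the point about outputs concrete, here is a minimal sketch in plain Java. It does not
use the Lucene FST API; the class and method names (OutputPrefixSketch, commonPrefix,
remainder) are made up for illustration. It only shows the idea that the part two outputs
have in common can sit on a shared arc near the root, while whatever differs has to stay on
the individual paths.

```java
import java.util.Arrays;

// Hypothetical illustration only; this is not how Lucene implements output pushing.
class OutputPrefixSketch {

    // Longest common prefix of two byte-sequence outputs: the part that
    // could be stored once, on an arc closer to the root.
    static byte[] commonPrefix(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i] == b[i]) {
            i++;
        }
        return Arrays.copyOf(a, i);
    }

    // What is left of an output once the shared prefix has been pushed up:
    // this part still has to live on the key's own path.
    static byte[] remainder(byte[] output, int sharedLength) {
        return Arrays.copyOfRange(output, sharedLength, output.length);
    }

    public static void main(String[] args) {
        // Outputs that happen to start with the same bytes: two bytes can be shared.
        byte[] similarA = {1, 2, 3};
        byte[] similarB = {1, 2, 9};
        System.out.println("shared bytes (similar outputs): "
                + commonPrefix(similarA, similarB).length);

        // Unrelated byte sequences: nothing can be shared, so each path
        // carries its whole output and cannot be merged with the other.
        byte[] unrelatedA = {0x7A, 0x10, 0x33};
        byte[] unrelatedB = {0x13, 0x10, 0x33};
        System.out.println("shared bytes (unrelated outputs): "
                + commonPrefix(unrelatedA, unrelatedB).length);
    }
}
```

With arbitrary byte outputs the shared prefix is usually empty, which is why I'd expect the
paths to stay separate even when the input suffixes are identical.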



                
> FST has hard limit max size of 2.1 GB
> -------------------------------------
>
>                 Key: LUCENE-3298
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3298
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/FSTs
>            Reporter: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-3298.patch
>
>
> The FST uses a single contiguous byte[] under the hood, which in Java is indexed by int,
> so we cannot grow it beyond Integer.MAX_VALUE. It also internally encodes references to this
> array as vInt.
> We could switch this to a paged byte[] and make the limit far larger.
> But I think this is low priority... I'm not going to work on it any time soon.
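
For readers unfamiliar with the "paged byte[]" idea mentioned in the issue description, the
following is a rough sketch of it, not the attached patch and not Lucene code. The class name
PagedBytesSketch and the 32 KB page size are assumptions chosen for illustration. A long
position is split into a page index (high bits) and an offset within the page (low bits), so
the total size is no longer bounded by a single int-indexed array. References into the
structure would presumably need an encoding wider than vInt as well.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a paged byte store addressed by long positions.
class PagedBytesSketch {
    private static final int PAGE_BITS = 15;            // assumed page size: 32 KB
    private static final int PAGE_SIZE = 1 << PAGE_BITS;
    private static final int PAGE_MASK = PAGE_SIZE - 1;

    private final List<byte[]> pages = new ArrayList<>();
    private long length;                                 // total bytes written

    // Append one byte, allocating a new page when the current one is full.
    void append(byte b) {
        int offset = (int) (length & PAGE_MASK);
        if (offset == 0) {
            pages.add(new byte[PAGE_SIZE]);
        }
        pages.get(pages.size() - 1)[offset] = b;
        length++;
    }

    // Read the byte at a long position: high bits pick the page,
    // low bits pick the offset inside that page.
    byte get(long pos) {
        if (pos < 0 || pos >= length) {
            throw new IndexOutOfBoundsException("pos=" + pos + ", length=" + length);
        }
        return pages.get((int) (pos >>> PAGE_BITS))[(int) (pos & PAGE_MASK)];
    }

    long length() {
        return length;
    }

    public static void main(String[] args) {
        PagedBytesSketch bytes = new PagedBytesSketch();
        for (int i = 0; i < 100_000; i++) {
            bytes.append((byte) i);
        }
        System.out.println("length = " + bytes.length());
        System.out.println("byte at 70000 = " + bytes.get(70_000L));
    }
}
```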
