lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erick Erickson <erickerick...@gmail.com>
Subject Re: is this the right way to go?
Date Wed, 09 Jun 2010 22:41:41 GMT
In addition to Ian's comment, an important question is what
kind of values you're sorting on. It sounds like a time stamp,
because most languages only have a (relatively) small number
of terms.

It's not the total terms in the field, it's the total *unique* terms
in the field. So even with a very large text field in, say, English,
there are only around 300,000 terms and that will be the max
that are loaded into memory.

So, as always, details matter <G>.

Best
Erick

On Wed, Jun 9, 2010 at 4:22 PM, fujian <fujian.z.yang@nokia.com> wrote:

>
>
> Hello,
>
> We are using lucene 2.9.0. and ran into OutOfMemory error when sorting on a
> highly unique field on a big index. After doing some research we learned
> that lucene will load the sort field value for all documents into memory to
> do sorting, and ended up with the OutOfMemory if the index is too big.
>
> The problem is no matter how much heap size you allocated for jvm, you will
> run into this problem eventually when your index is big enough and the sort
> field value is unique enough regardless how small amount of documents
> actually match the query ...
>
> This is a bit ... annoying. I only want to sort the 10 docs that match the
> query, why I need to load millions of field values into memory?
>
> Then we are thinking to disable the sort from lucene search and do the
> sorting ourselves. that is, after getting all matching docs from
> search(since we know the matching docs won't be too much), we do the
> sorting
> "manually". This requires us write our own sorting methods and it needs to
> support multiple field sort.
>
> so my question is is this the right way to go? or maybe lucene/solr already
> has the similiar thing?
>
> Thank you very much,
> -Fujian
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/is-this-the-right-way-to-go-tp883464p883464.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message