lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Alternative way to simulate sorting without doing actual sort
Date Wed, 22 Jul 2009 20:25:34 GMT
I was assuming you were storing things as strings, in which case
it works something like this:
Let's say you broke it up into
YYYY
MM (month)
DD
HH
MM (minute)

The number of unique terms that need to be kept in
memory to sort is just (let's say your documents
span 100 years)
100 + 12 + 31 + 24 + 60 = 227.

But that's a very different case from storing the datetime as a long.
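
The count above can be checked with a few lines of Java. This is only a sketch of the arithmetic, not Lucene code: the class and method names are made up for illustration, and the single-field figure is a deliberately rough upper bound (it treats every month as having 31 days) that does not appear in the message itself.

```java
public class SplitFieldTerms {

    /** Unique terms when each date part is indexed as its own string field. */
    static int splitFieldTerms(int years) {
        int months = 12, days = 31, hours = 24, minutes = 60;
        // Each field contributes only its own distinct values, so the counts add.
        return years + months + days + hours + minutes;
    }

    /**
     * Rough upper bound on unique terms for one combined minute-resolution
     * field. Assumption: every month is counted as 31 days, so this overshoots.
     */
    static long singleFieldTermsUpperBound(int years) {
        // In a combined field the distinct values multiply instead of adding.
        return (long) years * 12 * 31 * 24 * 60;
    }

    public static void main(String[] args) {
        System.out.println("split fields:  " + splitFieldTerms(100));
        System.out.println("single field (upper bound): " + singleFieldTermsUpperBound(100));
    }
}
```

For a 100-year span the split fields give 227 terms, while the combined field is bounded by roughly 53.5 million.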

On Wed, Jul 22, 2009 at 5:31 AM, Ganesh <emailgane@yahoo.co.in> wrote:

> Hello Erick,
>
> Thanks for your reply.
>
> Memory required for sorting: 4 * reader.maxDoc().
> I am sorting on datetime with minute resolution. If 100 records represent
> a minute, then in a 1 million record database there will be around 20000
> unique terms. The amount of memory consumed would be 4 * 1000000 + 20000 * 8
> [considering the datetime as a long].
>
> Most of the memory is consumed by 4 * reader.maxDoc(). If I have two or
> three fields, say (YYYYMMDD, hh, mm), then the memory consumption
> would be too high. How can you say that splitting the field will help
> reduce the memory usage?
>
> Please correct me if I am wrong. I need some justification for splitting the
> date into multiple terms.
>
> Regards
> Ganesh
>
>
> ----- Original Message -----
> From: "Erick Erickson" <erickerickson@gmail.com>
> To: <java-user@lucene.apache.org>
> Sent: Tuesday, July 21, 2009 7:29 PM
> Subject: Re: Alternative way to simulate sorting without doing actual sort
>
>
> > Have you tried splitting your times into separate fields, perhaps one with
> > YYYYMMDD and another with HHMM, then doing a primary sort on YYYYMMDD and a
> > secondary sort on HHMM? That'll reduce your total unique values greatly and
> > should improve your memory consumption.
> > Best
> > Erick
> >
> > On Tue, Jul 21, 2009 at 4:27 AM, Ganesh <emailgane@yahoo.co.in> wrote:
> >
> >> Hello all
> >>
> >> I am sorting on datetime with minute resolution. It easily reaches the
> >> maximum heap size. I have almost 100M records and it is using 1.5 GB. I
> >> am now in a situation where I have to stop sorting and find some other
> >> alternative.
> >>
> >> I tried adding a document boost and a field boost for the datetime.
> >> Document boost alone does not work. Document boost and field boost both
> >> affect the score. A search on datetime gives me results sorted by
> >> datetime, but a search on any other field does not work.
> >>
> >> I am doing updates, and an update changes the doc id. I want to get the
> >> results sorted in FIRST TIME inserted order. Updates should not disturb
> >> the result set. I think Solr has some facility to get the list of
> >> recently added documents.
> >>
> >> Any ideas are greatly appreciated.
> >>
> >> Regards
> >> Ganesh
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >
>
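
The memory estimate in Ganesh's quoted message (4 bytes per document plus 8 bytes per unique term) can be written out as below. This is a sketch of that formula exactly as stated in the thread, not a measurement of what Lucene allocates. The class and method names are hypothetical, and the figure of 14 distinct days for the YYYYMMDD field is an assumption derived from 20000 unique minutes.

```java
public class SortMemoryEstimate {

    /** Formula from the thread: 4 * maxDoc + 8 * uniqueTerms, in bytes. */
    static long bytesForField(long maxDoc, long uniqueTerms) {
        return 4L * maxDoc + 8L * uniqueTerms;
    }

    /** Same formula applied once per sort field and summed. */
    static long bytesForFields(long maxDoc, long... uniqueTermsPerField) {
        long total = 0;
        for (long terms : uniqueTermsPerField) {
            total += bytesForField(maxDoc, terms);
        }
        return total;
    }

    public static void main(String[] args) {
        long maxDoc = 1_000_000L;

        // One combined datetime field with about 20000 unique minutes.
        long single = bytesForField(maxDoc, 20_000L);

        // Three fields (YYYYMMDD, hh, mm). Assumption: 14 distinct days,
        // since 20000 minutes is a little under 14 days.
        long split = bytesForFields(maxDoc, 14L, 24L, 60L);

        System.out.println("single field: " + single + " bytes");
        System.out.println("three fields: " + split + " bytes");
    }
}
```

Under this formula the per-document array dominates, so three fields cost about three times as much as one (12,000,784 bytes against 4,160,000). Whether the formula applies depends on how the values are stored, which is the distinction Erick draws between strings and longs.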
