lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ian Lea <ian....@gmail.com>
Subject Re: Alternative way to simulate sorting without doing actual sort
Date Thu, 23 Jul 2009 09:27:50 GMT
Another idea - instead of storing YYYYMMDDhhmm, as longs, store the
value as number of minutes since some start time, as integers.  If my
sums are correct it should cope with several thousand years, and
sorting on integers should use less memory than sorting on longs.


--
Ian.


On Thu, Jul 23, 2009 at 7:01 AM, Ganesh<emailgane@yahoo.co.in> wrote:
> Hello Eric,
>
> I agree, the number of unique terms might be less, but [ 4 * reader.maxdoc() * different
fields ] will increase the memory consumption. I am having 100 million records spread across
10 DB. 4 * 100M is itself 400 MB. If i try to use 2 fields for sorting then it would be 800
MB. The unique terms may be less but this array itself consume huge memory.
>
> I am not able to understand you approach of splitting date time would reduce memory consumption.
>
> Regards
> Ganesh
>
> ----- Original Message -----
> From: "Erick Erickson" <erickerickson@gmail.com>
> To: <java-user@lucene.apache.org>
> Sent: Thursday, July 23, 2009 1:55 AM
> Subject: Re: Alternative way to simulate sorting without doing actual sort
>
>
>>I was assuming you were storing things as strings, in which case
>> it works something like this:
>> Let's say you broke it up into
>> YYYY
>> MM
>> DD
>> HH
>> MM
>>
>> The number of unique terms that need to be kept in
>> memory to sort is just (let's say your documents
>> span 100 years)
>> 100 + 12 + 31 + 24 + 60.
>>
>> But that's a much different case.
>>
>> On Wed, Jul 22, 2009 at 5:31 AM, Ganesh <emailgane@yahoo.co.in> wrote:
>>
>>> Hello Eric,
>>>
>>> Thanks for your reply.
>>>
>>> Memory reqd for sorting: 4 * reader.maxdoc()
>>> .
>>> I am sorting datetime with minute resolution. 100 records are representing
>>> a minute then in a 1 million record database, there will be around 20000
>>> unique terms. the amount of memory consumed would be 4 * 1000000 + 20000 * 8
>>> [Considering date time as Long]
>>>
>>> The more amount of memory consumed by 4 * reader.maxdoc. If i have two or
>>> three fields say  (YYYMMDD, hh, mm) then the amount of memory consumption
>>> would be too high. How could you say that splitting the field will help in
>>> reducing the memory usage.
>>>
>>> Please correct me if i am wrong. I require some justification to split the
>>> date in to multiple terms.
>>>
>>> Regards
>>> Ganesh
>>>
>>>
>>> ----- Original Message -----
>>> From: "Erick Erickson" <erickerickson@gmail.com>
>>> To: <java-user@lucene.apache.org>
>>> Sent: Tuesday, July 21, 2009 7:29 PM
>>> Subject: Re: Alternative way to simulate sorting without doing actual sort
>>>
>>>
>>> > Have you tried splitting your times into separate fields, perhaps one
>>> with
>>> > YYYYMMDD and another with HHMM, then do a primary sort on the YYYMMDD and
>>> > secondary on HHMM. That'll reduce your total unique values greatly and
>>> > should improve your memory consumption.
>>> > Best
>>> > Erick
>>> >
>>> > On Tue, Jul 21, 2009 at 4:27 AM, Ganesh <emailgane@yahoo.co.in> wrote:
>>> >
>>> >> Hello all
>>> >>
>>> >> I am sorting on datetime with minute resolution. It easily reaches the
>>> >> maximum heap size. I am having almost 100M records and it is using 1.5
>>> GB. I
>>> >> am now in a situitation to stop sorting and to find some other
>>> alternative
>>> >> way.
>>> >>
>>> >> I tried adding document boost and field boost for date time. document
>>> boost
>>> >> alone is not working. document boost and field boost has impact on
>>> score.
>>> >> Search on datetime gives me the sorted datetime results but search on
>>> any
>>> >> other field didn't works.
>>> >>
>>> >> I am doing updates and it changes the doc id.. I want to get the results
>>> >> sorted by FIRST TIME inserted order. Updates should not disturb the
>>> results
>>> >> set. I think Solr has some facilities to get the list of recently added
>>> >> documents.
>>> >>
>>> >> Any ideas are greatly appreciated.
>>> >>
>>> >> Regards
>>> >> GaneshSend instant messages to your online friends
>>> >> http://in.messenger.yahoo.com
>>> >>
>>> >> ---------------------------------------------------------------------
>>> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> >> For additional commands, e-mail: java-user-help@lucene.apache.org
>>> >>
>>> >>
>>> >
>>> Send instant messages to your online friends http://in.messenger.yahoo.com
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>
> Send instant messages to your online friends http://in.messenger.yahoo.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message