lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ganesh" <emailg...@yahoo.co.in>
Subject Re: Alternative way to simulate sorting without doing actual sort
Date Thu, 23 Jul 2009 06:01:10 GMT
Hello Eric,

I agree, the number of unique terms might be less, but [ 4 * reader.maxdoc() * different fields
] will increase the memory consumption. I am having 100 million records spread across 10 DB.
4 * 100M is itself 400 MB. If i try to use 2 fields for sorting then it would be 800 MB. The
unique terms may be less but this array itself consume huge memory. 

I am not able to understand you approach of splitting date time would reduce memory consumption.


Regards
Ganesh

----- Original Message ----- 
From: "Erick Erickson" <erickerickson@gmail.com>
To: <java-user@lucene.apache.org>
Sent: Thursday, July 23, 2009 1:55 AM
Subject: Re: Alternative way to simulate sorting without doing actual sort


>I was assuming you were storing things as strings, in which case
> it works something like this:
> Let's say you broke it up into
> YYYY
> MM
> DD
> HH
> MM
> 
> The number of unique terms that need to be kept in
> memory to sort is just (let's say your documents
> span 100 years)
> 100 + 12 + 31 + 24 + 60.
> 
> But that's a much different case.
> 
> On Wed, Jul 22, 2009 at 5:31 AM, Ganesh <emailgane@yahoo.co.in> wrote:
> 
>> Hello Eric,
>>
>> Thanks for your reply.
>>
>> Memory reqd for sorting: 4 * reader.maxdoc()
>> .
>> I am sorting datetime with minute resolution. 100 records are representing
>> a minute then in a 1 million record database, there will be around 20000
>> unique terms. the amount of memory consumed would be 4 * 1000000 + 20000 * 8
>> [Considering date time as Long]
>>
>> The more amount of memory consumed by 4 * reader.maxdoc. If i have two or
>> three fields say  (YYYMMDD, hh, mm) then the amount of memory consumption
>> would be too high. How could you say that splitting the field will help in
>> reducing the memory usage.
>>
>> Please correct me if i am wrong. I require some justification to split the
>> date in to multiple terms.
>>
>> Regards
>> Ganesh
>>
>>
>> ----- Original Message -----
>> From: "Erick Erickson" <erickerickson@gmail.com>
>> To: <java-user@lucene.apache.org>
>> Sent: Tuesday, July 21, 2009 7:29 PM
>> Subject: Re: Alternative way to simulate sorting without doing actual sort
>>
>>
>> > Have you tried splitting your times into separate fields, perhaps one
>> with
>> > YYYYMMDD and another with HHMM, then do a primary sort on the YYYMMDD and
>> > secondary on HHMM. That'll reduce your total unique values greatly and
>> > should improve your memory consumption.
>> > Best
>> > Erick
>> >
>> > On Tue, Jul 21, 2009 at 4:27 AM, Ganesh <emailgane@yahoo.co.in> wrote:
>> >
>> >> Hello all
>> >>
>> >> I am sorting on datetime with minute resolution. It easily reaches the
>> >> maximum heap size. I am having almost 100M records and it is using 1.5
>> GB. I
>> >> am now in a situitation to stop sorting and to find some other
>> alternative
>> >> way.
>> >>
>> >> I tried adding document boost and field boost for date time. document
>> boost
>> >> alone is not working. document boost and field boost has impact on
>> score.
>> >> Search on datetime gives me the sorted datetime results but search on
>> any
>> >> other field didn't works.
>> >>
>> >> I am doing updates and it changes the doc id.. I want to get the results
>> >> sorted by FIRST TIME inserted order. Updates should not disturb the
>> results
>> >> set. I think Solr has some facilities to get the list of recently added
>> >> documents.
>> >>
>> >> Any ideas are greatly appreciated.
>> >>
>> >> Regards
>> >> GaneshSend instant messages to your online friends
>> >> http://in.messenger.yahoo.com
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >> For additional commands, e-mail: java-user-help@lucene.apache.org
>> >>
>> >>
>> >
>> Send instant messages to your online friends http://in.messenger.yahoo.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
Send instant messages to your online friends http://in.messenger.yahoo.com 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message