lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: Alternative way to simulate sorting without doing actual sort
Date Wed, 22 Jul 2009 11:29:57 GMT
However you do it, it seems to me that you're going to need loads of
memory if you want Lucene to do this type of sorting on indexes with
100 million docs.  Can't you just buy or allocate some more memory?
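
To put rough numbers on this, here is a minimal sketch of the estimate Ganesh gives in the quoted message below (4 bytes per document plus 8 bytes per unique term, per sort field). The class name, method name, and the figures for the split-field case (14 distinct days, 1440 minutes per day) are assumptions for illustration only; this is arithmetic on the thread's formula, not a measurement of what Lucene actually allocates.

```java
class SortMemoryEstimate {
    /** Estimated bytes for one sort field: 4 per document plus bytesPerTerm per unique term. */
    static long fieldBytes(long maxDoc, long uniqueTerms, int bytesPerTerm) {
        return 4L * maxDoc + uniqueTerms * bytesPerTerm;
    }

    public static void main(String[] args) {
        long maxDoc = 1_000_000L;

        // Single datetime field, using the figures quoted in the thread.
        long single = fieldBytes(maxDoc, 20_000L, 8);

        // Split into a day field and a minute-of-day field (assumed term counts).
        // Each field still pays the 4-bytes-per-document cost.
        long split = fieldBytes(maxDoc, 14L, 8) + fieldBytes(maxDoc, 1_440L, 8);

        System.out.println(single); // 4160000
        System.out.println(split);  // 8011632

        // Per-document part alone for one field at 100 million docs.
        System.out.println(fieldBytes(100_000_000L, 0L, 8)); // 400000000
    }
}
```

Under these assumptions the per-document term dominates, so fewer unique terms save little while each extra sort field adds another 4 bytes per document.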

One alternative would be to do the sorting yourself once you've got a
list of hits, trading memory for CPU and IO. That might make things
slower, but it might be acceptable, and if your hit lists are small it
might not even be much slower.  You could cache the sort values to try
and speed it up. That will of course use memory, but you can be as
clever as you like with the cache and null/sparse/whatever values.
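
A minimal sketch of that idea, in plain Java with no Lucene calls: the hits are sorted after the search, and sort values are fetched through a small bounded cache. Everything here is hypothetical (HitSorter, SortValueCache, and the loadValue function that stands in for reading the stored datetime of a document), and keying the cache by doc id is an assumption that only holds while doc ids are stable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntToLongFunction;

class HitSorter {
    /** Small LRU cache of doc id -> sort value; capacity bounds the memory it uses. */
    static final class SortValueCache extends LinkedHashMap<Integer, Long> {
        private final int capacity;

        SortValueCache(int capacity) {
            super(16, 0.75f, true); // access order, so the eldest entry is the least recently used
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<Integer, Long> eldest) {
            return size() > capacity;
        }
    }

    /** Returns the hits ordered by ascending sort value; ties keep their original order. */
    static List<Integer> sortHits(List<Integer> hits, IntToLongFunction loadValue,
                                  SortValueCache cache) {
        // Resolve every value once, so cache eviction cannot affect the comparison.
        Map<Integer, Long> values = new HashMap<>();
        for (int doc : hits) {
            Long value = cache.get(doc);
            if (value == null) {
                value = loadValue.applyAsLong(doc); // the slow path: CPU and IO instead of memory
                cache.put(doc, value);
            }
            values.put(doc, value);
        }
        List<Integer> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingLong((Integer doc) -> values.get(doc)));
        return sorted;
    }

    public static void main(String[] args) {
        // Stand-in for stored datetime values, as yyyyMMddHHmm numbers.
        Map<Integer, Long> stored = new HashMap<>();
        stored.put(7, 200907221129L);
        stored.put(3, 200907210427L);
        stored.put(9, 200907211929L);

        SortValueCache cache = new SortValueCache(1000);
        List<Integer> sorted = sortHits(Arrays.asList(7, 3, 9), doc -> stored.get(doc), cache);
        System.out.println(sorted); // [3, 9, 7]
    }
}
```

Memory use then scales with the hit list and the cache capacity rather than with the number of documents in the index.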


--
Ian.


On Wed, Jul 22, 2009 at 10:31 AM, Ganesh <emailgane@yahoo.co.in> wrote:
> Hello Eric,
>
> Thanks for your reply.
>
> Memory required for sorting: 4 * reader.maxDoc().
>
> I am sorting datetime with minute resolution. If 100 records represent one
> minute, then in a 1 million record database there will be around 20000
> unique terms. The amount of memory consumed would be
> 4 * 1000000 + 20000 * 8 [considering datetime as a long].
>
> Most of the memory is consumed by 4 * reader.maxDoc(). If I have two or
> three fields, say (YYYYMMDD, hh, mm), then the memory consumption would be
> too high. How can you say that splitting the field will help reduce
> memory usage?
>
> Please correct me if I am wrong. I need some justification for splitting
> the date into multiple terms.
>
> Regards
> Ganesh
>
>
> ----- Original Message -----
> From: "Erick Erickson" <erickerickson@gmail.com>
> To: <java-user@lucene.apache.org>
> Sent: Tuesday, July 21, 2009 7:29 PM
> Subject: Re: Alternative way to simulate sorting without doing actual sort
>
>
>> Have you tried splitting your times into separate fields, perhaps one with
>> YYYYMMDD and another with HHMM, then doing a primary sort on YYYYMMDD and
>> a secondary sort on HHMM? That'll reduce your total unique values greatly and
>> should improve your memory consumption.
>> Best
>> Erick
>>
>> On Tue, Jul 21, 2009 at 4:27 AM, Ganesh <emailgane@yahoo.co.in> wrote:
>>
>>> Hello all
>>>
>>> I am sorting on datetime with minute resolution. It easily reaches the
>>> maximum heap size. I have almost 100M records and it is using 1.5 GB. I
>>> am now in a situation where I have to stop sorting and find some other
>>> way.
>>>
>>> I tried adding a document boost and a field boost for the datetime. Document
>>> boost alone does not work. Document boost and field boost have an impact on
>>> the score. A search on datetime gives me results sorted by datetime, but a
>>> search on any other field does not work.
>>>
>>> I am doing updates, and they change the doc id. I want to get the results
>>> sorted in first-inserted order. Updates should not disturb the result
>>> set. I think Solr has some facilities to get the list of recently added
>>> documents.
>>>
>>> Any ideas are greatly appreciated.
>>>
>>> Regards
>>> Ganesh
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>
