lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ganesh" <emailg...@yahoo.co.in>
Subject Re: Multisearcher will maintain index order sorting?
Date Thu, 23 Oct 2008 09:17:20 GMT
In IndexA there are 3 docs
DocID, Terms
0,X
1,X Y
2,X Z

In IndexB there are 3 docs
DocID, Terms
0,X
1,X Y
2,X Z

When i do sort on indexed order using Multisearcher and 
ParallelMultiSearcher, it returns the result
0,X
3,X
1,X Y
4,X Y
2,X Z
5,X Z

But it should be in the order of 0,1,2,3,4,5. Could anyone explain why?

Regards
Ganesh

----- Original Message ----- 
From: "Ganesh" <emailgane@yahoo.co.in>
To: <java-user@lucene.apache.org>
Sent: Thursday, October 23, 2008 1:37 PM
Subject: Re: Multisearcher will maintain index order sorting?


> Multisearcher and ParallelMultiSearcher, when requested to sort on doc 
> (indexed order), it merges the result by docID of each DB.
>
> Regards
> Ganesh
>
> ----- Original Message ----- 
> From: "Paul Smith" <psmith@aconex.com>
> To: <java-user@lucene.apache.org>
> Sent: Thursday, October 23, 2008 10:57 AM
> Subject: Re: Multisearcher will maintain index order sorting?
>
>
>>
>> On 23/10/2008, at 4:20 PM, Ganesh wrote:
>>
>>> My Index DB is having 10 million records and it will grow to 30 
>>> million. Currently I am using millisecond timestamp and the RAM 
>>> cosumption is more. I will change the resolution to minute. I am  using 
>>> 2 searcher objects refreshing each other every minute. When i  do a 
>>> warmup query with sort of timestamp then the cpu is spiked to  100% and 
>>> this is happening for every minute.  In order to avoid  these issues, i 
>>> am planning to break my DB and to do sort on indexed  order.
>>>
>>> Will multisearcher will maintain indexed order on sorting?
>>
>>
>> If you need to keep the millisecond accuracy, break down the timestamp 
>> into 3 fields: day, time, millisecond, and sort on 3 fields.  This way 
>> each field has a much smaller number of distinct values and well  occupy 
>> vastly less memory over time.  I don't think there's much  overhead in 
>> this approach either, because in most cases, the top-level  field (day) 
>> will provide most of the sorting ability, and Lucene will  only need to 
>> hit the time & millisecond fields less frequently for  comparison.
>>
>> I believe Multisearcher does a merge sort of the 2 (or more) sub- 
>> searchers, so there is an overhead in using in versus a single searcher.
>>
>> Paul
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>
> Send instant messages to your online friends http://in.messenger.yahoo.com
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 

Send instant messages to your online friends http://in.messenger.yahoo.com 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message