lucene-java-user mailing list archives

From "Ganesh" <emailg...@yahoo.co.in>
Subject Re: Multisearcher will maintain index order sorting?
Date Thu, 23 Oct 2008 10:15:05 GMT
MultiSearcher, after performing the search on the second index, adds the 
maxDoc of the first index to each resulting doc ID. In my case that offset is 
3. After incrementing the doc ID, the document is inserted into the 
FieldDocSortedHitQueue. FieldDocSortedHitQueue is an extension of 
PriorityQueue, so it should sort in increasing order. It should insert doc ID 
3 after 2, and not after 0.
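
To make the doc ID arithmetic concrete, here is a minimal standalone sketch. It is not the Lucene source: the class and method names (DocIdMapping, starts, toGlobal, subIndexOf, toLocal) are made up for illustration, and it assumes every sub-index holds at least one document.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Lucene: models how a global doc ID
// relates to a (sub-index, local doc ID) pair through an offsets array.
class DocIdMapping {
    // starts[i] is the first global doc ID of sub-index i;
    // the final entry is the total number of documents.
    static int[] starts(int[] maxDocs) {
        int[] starts = new int[maxDocs.length + 1];
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i + 1] = starts[i] + maxDocs[i];
        }
        return starts;
    }

    // Local doc ID in sub-index i -> global doc ID.
    static int toGlobal(int[] starts, int subIndex, int localDoc) {
        return localDoc + starts[subIndex];
    }

    // Global doc ID -> sub-index, via binary search over the offsets.
    // Assumes no sub-index is empty (no duplicate offsets).
    static int subIndexOf(int[] starts, int globalDoc) {
        int pos = Arrays.binarySearch(starts, 0, starts.length - 1, globalDoc);
        return pos >= 0 ? pos : -pos - 2;
    }

    // Global doc ID -> local doc ID inside its sub-index.
    static int toLocal(int[] starts, int globalDoc) {
        return globalDoc - starts[subIndexOf(starts, globalDoc)];
    }

    public static void main(String[] args) {
        int[] starts = starts(new int[] {3, 3}); // two indexes, 3 docs each
        System.out.println(toGlobal(starts, 1, 0)); // first doc of second index
        System.out.println(subIndexOf(starts, 3) + ":" + toLocal(starts, 3));
    }
}
```

With two indexes of 3 docs each, local doc 0 of the second index becomes global doc 3, and global doc 3 maps back to sub-index 1, local doc 0.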

Code snippet from MultiSearcher.java
--------------------------------------
if (hq == null) hq = new FieldDocSortedHitQueue (docs.fields, n);
.....
for (int j = 0; j < scoreDocs.length; j++) { // merge scoreDocs into hq
        ScoreDoc scoreDoc = scoreDocs[j];
        scoreDoc.doc += starts[i];      // doc ID is incremented here
        if (!hq.insert (scoreDoc))      // insertion should do automatic sorting
          break;
}
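
For anyone who wants to experiment with the merge step outside Lucene, below is a toy model using java.util.PriorityQueue. It is only a sketch, not the Lucene implementation. It rests on one unconfirmed assumption: that the queue compares a sort value captured by each sub-searcher before the offset was added (the local doc ID), and uses the offset doc ID only to break ties. The names Hit, merge, BY_SORT_VALUE_THEN_DOC and BY_GLOBAL_DOC are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Toy model of merging hits from several sub-indexes; not Lucene code.
class MergeOrderSketch {
    // Hypothetical stand-in for a hit in the merge queue.
    static final class Hit {
        final int doc;       // global doc ID (local ID + offset)
        final int sortValue; // ASSUMPTION: local doc ID recorded before offsetting
        Hit(int doc, int sortValue) { this.doc = doc; this.sortValue = sortValue; }
    }

    // Compare the recorded sort value first, then the global doc ID.
    static final Comparator<Hit> BY_SORT_VALUE_THEN_DOC =
        Comparator.<Hit>comparingInt(h -> h.sortValue).thenComparingInt(h -> h.doc);

    // Compare the global (offset) doc ID only.
    static final Comparator<Hit> BY_GLOBAL_DOC =
        Comparator.<Hit>comparingInt(h -> h.doc);

    // Offsets each sub-index's hits, pushes them into one queue,
    // then drains the queue and returns the global doc IDs in pop order.
    static List<Integer> merge(int[] maxDocs, Comparator<Hit> order) {
        PriorityQueue<Hit> hq = new PriorityQueue<>(order);
        int start = 0;
        for (int maxDoc : maxDocs) {
            for (int local = 0; local < maxDoc; local++) {
                hq.add(new Hit(local + start, local));
            }
            start += maxDoc;
        }
        List<Integer> out = new ArrayList<>();
        while (!hq.isEmpty()) {
            out.add(hq.poll().doc);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] maxDocs = {3, 3}; // IndexA and IndexB, 3 docs each
        System.out.println(merge(maxDocs, BY_SORT_VALUE_THEN_DOC)); // [0, 3, 1, 4, 2, 5]
        System.out.println(merge(maxDocs, BY_GLOBAL_DOC));          // [0, 1, 2, 3, 4, 5]
    }
}
```

Under that assumption the model reproduces the interleaved order 0,3,1,4,2,5, while comparing on the offset doc ID alone gives 0,1,2,3,4,5. Whether FieldDocSortedHitQueue really behaves this way would need to be checked against the source of the Lucene version in use.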

Regards
Ganesh


----- Original Message ----- 
From: "Hadi Forghani" <hadi4i@gmail.com>
To: <java-user@lucene.apache.org>
Sent: Thursday, October 23, 2008 3:25 PM
Subject: Re: Multisearcher will maintain index order sorting?


> Because when you want to fetch the X document of the second index, you
> should pass docId=3 to MultiSearcher, and MultiSearcher finds the
> sub-searcher for that document from the lengths of all the sub-searchers.
> For example, when you get the doc with DocID 3 (the second X), MultiSearcher
> (see the code of MultiSearcher's doc(int i)) subtracts 3 from your DocID
> (because the first searcher has 3 documents), passes zero to the second
> searcher, and returns doc 0 from it.
> In practice, MultiSearcher finds the sub-searcher with a binary search, not
> just the way described above.
>
> On Thu, Oct 23, 2008 at 12:47 PM, Ganesh <emailgane@yahoo.co.in> wrote:
>
>> In IndexA there are 3 docs
>> DocID, Terms
>> 0,X
>> 1,X Y
>> 2,X Z
>>
>> In IndexB there are 3 docs
>> DocID, Terms
>> 0,X
>> 1,X Y
>> 2,X Z
>>
>> When I sort on indexed order using MultiSearcher and
>> ParallelMultiSearcher, it returns the results as:
>> 0,X
>> 3,X
>> 1,X Y
>> 4,X Y
>> 2,X Z
>> 5,X Z
>>
>> But it should be in the order of 0,1,2,3,4,5. Could anyone explain why?
>>
>> Regards
>> Ganesh
>>
>> ----- Original Message ----- From: "Ganesh" <emailgane@yahoo.co.in>
>> To: <java-user@lucene.apache.org>
>> Sent: Thursday, October 23, 2008 1:37 PM
>>
>> Subject: Re: Multisearcher will maintain index order sorting?
>>
>>
>>> MultiSearcher and ParallelMultiSearcher, when requested to sort on doc
>>> (indexed order), merge the results by the docID of each DB.
>>>
>>> Regards
>>> Ganesh
>>>
>>> ----- Original Message ----- From: "Paul Smith" <psmith@aconex.com>
>>> To: <java-user@lucene.apache.org>
>>> Sent: Thursday, October 23, 2008 10:57 AM
>>> Subject: Re: Multisearcher will maintain index order sorting?
>>>
>>>
>>>
>>>> On 23/10/2008, at 4:20 PM, Ganesh wrote:
>>>>
>>>>> My index DB has 10 million records and it will grow to 30 million.
>>>>> Currently I am using a millisecond timestamp and the RAM consumption is
>>>>> high. I will change the resolution to minute. I am using 2 searcher
>>>>> objects that refresh each other every minute. When I do a warm-up query
>>>>> with a sort on timestamp, the CPU spikes to 100%, and this happens every
>>>>> minute. To avoid these issues, I am planning to break up my DB and
>>>>> sort on indexed order.
>>>>>
>>>>> Will MultiSearcher maintain indexed order when sorting?
>>>>>
>>>>
>>>>
>>>> If you need to keep the millisecond accuracy, break down the timestamp
>>>> into 3 fields: day, time, millisecond, and sort on the 3 fields. This way
>>>> each field has a much smaller number of distinct values and will occupy
>>>> vastly less memory over time. I don't think there's much overhead in this
>>>> approach either, because in most cases the top-level field (day) will
>>>> provide most of the sorting ability, and Lucene will only need to hit the
>>>> time & millisecond fields less frequently for comparison.
>>>>
>>>> I believe MultiSearcher does a merge sort of the 2 (or more)
>>>> sub-searchers, so there is an overhead in using it versus a single
>>>> searcher.
>>>>
>>>> Paul
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>