lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Albert Vila <...@imente.com>
Subject Re: Clustering question: searching two diferent indexes
Date Wed, 23 Jun 2004 08:23:12 GMT
By 'order', I mean that I'm adding the documents in the big index sorted 
by date (in order to increase the sorting process). I wanna preserve 
this sorting after the merging process.

I'm not using the internal lucene ID in the code field. The code field 
contains my own IDs. I was asking, if I can do the merge using my own 
IDs (the code field), and not the lucene internal IDs, for example:

luceneID_0, code_x, title_x, content_x, language_x, date_x
luceneID_1, code_y, title_y, content_y, language_y, date_y

luceneID_0, code_y, cluster_y
luceneID_1, code_x, cluster_x

Will the prevous index structure procude an unconsistent merged index?

I wanna achieve the following merged index:
luceneID_0, code_x, title_x, content_x, language_x, date_x, cluster_x
luceneID_1, code_y, title_y, content_y, language_y, date_y, cluster_y

Thanks

Otis Gospodnetic wrote:

>Albert,
>
>--- Albert Vila <avp@imente.com> wrote:
>  
>
>>Thanks Otis, but I can merge two indexes with different fields?
>>    
>>
>
>Yes.  Documents with different Fields can be stored in the same index.
>Not every Document has to have all fields, and it can even have a
>completely different set of Fields.
>
>  
>
>>My big index has this fields, code, title, content, language and
>>date. I add the new documents incrementally.
>>
>>The clustering index only contains the fields code, and cluster.
>>Merging 
>>the big index with the clustering one will preserve the order of the
>>big one?
>>    
>>
>
>I don't fully understand what you mean by 'order'.  If you are asking
>whether internal document Ids will remain the same, the answer is
>negative.  If you have deleted some documents, there will be gaps in
>document Id sequence, which Lucene will fill, thus re-assigning
>internal document Ids.
>
>  
>
>>For example, if I have the following indexes:
>>Big index
>>code_1, title_1, content_1, language_1, date_1
>>code_2, title_2, content_2, language_2, date_2
>>....
>>
>>Clustering index
>>code_1, cluster_1
>>code_2, cluster_2
>>....
>>
>>then the new merged index will be:
>>
>>Merged index
>>code_1, title_1, content_1, language_1, date_1, cluster_1
>>code_2, title_2, content_2, language_2, date_2, cluster_2
>>....
>>
>>If I can do that then fine, but I think the merging process uses the 
>>lucene internal ID to match the documents. I wanna use the code field
>>to 
>>do that matching, is that possible?. I cannot be sure the lucene 
>>internal ID's are the same for the same codes in both indexes.
>>    
>>
>
>Are you storing the internal Lucene Document Id in the 'code' field? 
>If you are, I suggest you change your application to use its own set of
>unique Ids to serve as 'primary keys' in your indices.
>
>Otis
>
>
>  
>
>>Thanks again,
>>
>>Albert
>>
>>
>>Otis Gospodnetic wrote:
>>
>>    
>>
>>>(re-directing to lucene-user list)
>>>
>>>Albert,
>>>
>>>If I understand your question correctly... You could run a query
>>>      
>>>
>>like
>>    
>>
>>>the one you gave on both indices, but if one of them contains
>>>      
>>>
>>documents
>>    
>>
>>>that have only one of those fields (cluster), then there will never
>>>      
>>>
>>be
>>    
>>
>>>any matches in the second index.
>>>
>>>However, why not leave your big index along, add documents to a new,
>>>smaller index, and then merge them periodically.  I may be off with
>>>this; it sounds like this is what you want to do, but I'm not
>>>      
>>>
>>certain I
>>    
>>
>>>understood you fully.
>>>
>>>Otis
>>>
>>>--- Albert Vila <avp@imente.com> wrote:
>>> 
>>>
>>>      
>>>
>>>>Hi all,
>>>>
>>>>I was wondering If I can search using the MultiSearcher over two 
>>>>diferent indexes at the same time (with diferent fields).
>>>>I've got one big index, with the code, title, content, language,
>>>>        
>>>>
>>etc 
>>    
>>
>>>>fields (new documents are added incrementally). Now, I have to
>>>>introduce 
>>>>a clustering field. The problem is that I have to update the whole
>>>>index 
>>>>each time the clusters change, and I have no enought time to do it
>>>>        
>>>>
>>(I
>>    
>>
>>>>wanna check for new clusters every 10 minuts and I spent 25 minutes
>>>>to 
>>>>reindex the whole index).
>>>>A query example could be: language:0 and title:java and cluster:0
>>>>
>>>>Can I leave the big index whitout any changes and create a new
>>>>        
>>>>
>>index 
>>    
>>
>>>>with only the following fields, code and cluster, and perform the 
>>>>searches using this two indexes? I think I cannot do that without 
>>>>changing the code. It would need a postprocess, matching all
>>>>returning 
>>>>codes from index 1 with index 2.
>>>>
>>>>Anyone have a solution for this problem? I would appreciate that.
>>>>   
>>>>
>>>>        
>>>>
>>>
>>> 
>>>
>>>      
>>>
>>-- 
>>Albert Vila
>>Director de proyectos I+D
>>http://www.imente.com
>>902 933 242
>>[iMente “La información con más beneficios”]
>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>
>>
>>    
>>
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>
>
>  
>

-- 
Albert Vila
Director de proyectos I+D
http://www.imente.com
902 933 242
[iMente “La información con más beneficios”]


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message