lucene-java-user mailing list archives

From Albert Vila <...@imente.com>
Subject Re: Clustering question: searching two diferent indexes
Date Wed, 23 Jun 2004 09:17:44 GMT
OK, but with this solution I cannot perform queries like:
get all codes that match "title:java and language:english and cluster:0"

Albert
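
To illustrate why the cumulative merge described in the quoted reply below cannot answer that query, here is a minimal plain-Java sketch. It models each index as a list of field maps and does not use any Lucene classes; the class name MergeSketch, the helper names, the sample field values, and the exact-match query model are all assumptions made for illustration, not anything from the Lucene API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only: each "index" is a list of field maps, not a Lucene index.
class MergeSketch {

    // Build a document from alternating field names and values.
    static Map<String, String> doc(String... kv) {
        Map<String, String> d = new LinkedHashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            d.put(kv[i], kv[i + 1]);
        }
        return d;
    }

    // Sample big index, already sorted by date (values are made up).
    static List<Map<String, String>> bigIndex() {
        List<Map<String, String>> docs = new ArrayList<>();
        docs.add(doc("code", "x", "title", "java", "language", "english", "date", "20040601"));
        docs.add(doc("code", "y", "title", "lucene", "language", "english", "date", "20040602"));
        return docs;
    }

    // Sample clustering index, deliberately in a different order.
    static List<Map<String, String>> clusterIndex() {
        List<Map<String, String>> docs = new ArrayList<>();
        docs.add(doc("code", "y", "cluster", "1"));
        docs.add(doc("code", "x", "cluster", "0"));
        return docs;
    }

    // The query from the message, modelled as exact field/value requirements.
    static Map<String, String> query() {
        return doc("title", "java", "language", "english", "cluster", "0");
    }

    // Cumulative merge: documents of both indexes are appended, never combined.
    static List<Map<String, String>> cumulative(List<Map<String, String>> a,
                                                List<Map<String, String>> b) {
        List<Map<String, String>> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    // Hypothetical join on the application's own key ("code"), keeping big-index order.
    static List<Map<String, String>> joinOnCode(List<Map<String, String>> big,
                                                List<Map<String, String>> clusters) {
        Map<String, Map<String, String>> byCode = new HashMap<>();
        for (Map<String, String> c : clusters) {
            byCode.put(c.get("code"), c);
        }
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> d : big) {
            Map<String, String> merged = new LinkedHashMap<>(d);
            Map<String, String> c = byCode.get(d.get("code"));
            if (c != null) {
                merged.putAll(c); // adds "cluster"; "code" is overwritten with the same value
            }
            out.add(merged);
        }
        return out;
    }

    // Count documents that satisfy every requirement (an AND query).
    static int countMatches(List<Map<String, String>> docs, Map<String, String> required) {
        int n = 0;
        for (Map<String, String> d : docs) {
            boolean all = true;
            for (Map.Entry<String, String> e : required.entrySet()) {
                if (!e.getValue().equals(d.get(e.getKey()))) {
                    all = false;
                    break;
                }
            }
            if (all) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<Map<String, String>> appended = cumulative(bigIndex(), clusterIndex());
        List<Map<String, String>> joined = joinOnCode(bigIndex(), clusterIndex());
        System.out.println("cumulative: " + appended.size() + " docs, "
                + countMatches(appended, query()) + " matches");
        System.out.println("joined: " + joined.size() + " docs, "
                + countMatches(joined, query()) + " matches");
    }
}
```

In this model the appended list holds four documents and the query matches none of them, because no single document carries both title and cluster. The joined list holds two documents, in big-index order, and the query matches the one with code x.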


Otis Gospodnetic wrote:

>Aha, now I see what you mean.  You didn't mention 'date' before. :)
>So, dates will get preserved, and you will be able to keep using them
>for sorting.  However, Lucene will not automatically recognize your 'PK
>fields' and merge fields from two Documents with the same PK into a
>single Document.  You can think of 'merge' as 'add' (well, the method
>name is addIndexes, actually :)), so Lucene will simply make a
>cumulative index from your two separate indices:
>
>luceneID_0, code_x, title_x, content_x, language_x, date_x
>luceneID_1, code_y, title_y, content_y, language_y, date_y
>luceneID_2, code_y, cluster_y
>luceneID_3, code_x, cluster_x
>
>Otis
>
>
>--- Albert Vila <avp@imente.com> wrote:
>
>>By 'order', I mean that I'm adding the documents to the big index sorted
>>by date (in order to speed up sorting). I wanna preserve this order
>>after the merging process.
>>
>>I'm not using the internal Lucene ID in the code field. The code field
>>contains my own IDs. I was asking if I can do the merge using my own
>>IDs (the code field), and not the Lucene internal IDs, for example:
>>
>>luceneID_0, code_x, title_x, content_x, language_x, date_x
>>luceneID_1, code_y, title_y, content_y, language_y, date_y
>>
>>luceneID_0, code_y, cluster_y
>>luceneID_1, code_x, cluster_x
>>
>>Will the previous index structure produce an inconsistent merged index?
>>
>>I wanna achieve the following merged index:
>>luceneID_0, code_x, title_x, content_x, language_x, date_x, cluster_x
>>luceneID_1, code_y, title_y, content_y, language_y, date_y, cluster_y
>>
>>Thanks
>>
>>Otis Gospodnetic wrote:
>>
>>>Albert,
>>>
>>>--- Albert Vila <avp@imente.com> wrote:
>>>
>>>>Thanks Otis, but can I merge two indexes with different fields?
>>>
>>>Yes.  Documents with different Fields can be stored in the same index.
>>>Not every Document has to have all fields, and it can even have a
>>>completely different set of Fields.
>>>
>>>>My big index has these fields: code, title, content, language and
>>>>date. I add the new documents incrementally.
>>>>
>>>>The clustering index only contains the fields code and cluster.
>>>>Will merging the big index with the clustering one preserve the order
>>>>of the big one?
>>>
>>>I don't fully understand what you mean by 'order'.  If you are asking
>>>whether internal document Ids will remain the same, the answer is
>>>negative.  If you have deleted some documents, there will be gaps in
>>>document Id sequence, which Lucene will fill, thus re-assigning
>>>internal document Ids.
>>>
>>>>For example, if I have the following indexes:
>>>>Big index
>>>>code_1, title_1, content_1, language_1, date_1
>>>>code_2, title_2, content_2, language_2, date_2
>>>>....
>>>>
>>>>Clustering index
>>>>code_1, cluster_1
>>>>code_2, cluster_2
>>>>....
>>>>
>>>>then the new merged index will be:
>>>>
>>>>Merged index
>>>>code_1, title_1, content_1, language_1, date_1, cluster_1
>>>>code_2, title_2, content_2, language_2, date_2, cluster_2
>>>>....
>>>>
>>>>If I can do that then fine, but I think the merging process uses the
>>>>Lucene internal ID to match the documents. I wanna use the code field
>>>>to do that matching; is that possible? I cannot be sure the Lucene
>>>>internal IDs are the same for the same codes in both indexes.
>>>
>>>Are you storing the internal Lucene Document Id in the 'code' field?
>>>
>>>If you are, I suggest you change your application to use its own set of
>>>unique Ids to serve as 'primary keys' in your indices.
>>>
>>>Otis
>>>
>>>>Thanks again,
>>>>
>>>>Albert
>>>>
>>>>
>>>>Otis Gospodnetic wrote:
>>>>
>>>>>(re-directing to lucene-user list)
>>>>>
>>>>>Albert,
>>>>>
>>>>>If I understand your question correctly... You could run a query like
>>>>>the one you gave on both indices, but if one of them contains documents
>>>>>that have only one of those fields (cluster), then there will never be
>>>>>any matches in the second index.
>>>>>
>>>>>However, why not leave your big index alone, add documents to a new,
>>>>>smaller index, and then merge them periodically?  I may be off with
>>>>>this; it sounds like this is what you want to do, but I'm not certain I
>>>>>understood you fully.
>>>>>
>>>>>Otis
>>>>>
>>>>>--- Albert Vila <avp@imente.com> wrote:
>>>>>
>>>>>>Hi all,
>>>>>>
>>>>>>I was wondering if I can search using the MultiSearcher over two
>>>>>>different indexes at the same time (with different fields).
>>>>>>I've got one big index, with the code, title, content, language, etc.
>>>>>>fields (new documents are added incrementally). Now, I have to
>>>>>>introduce a clustering field. The problem is that I have to update the
>>>>>>whole index each time the clusters change, and I don't have enough time
>>>>>>to do it (I wanna check for new clusters every 10 minutes and I spend
>>>>>>25 minutes reindexing the whole index).
>>>>>>A query example could be: language:0 and title:java and cluster:0
>>>>>>
>>>>>>Can I leave the big index without any changes and create a new index
>>>>>>with only the following fields, code and cluster, and perform the
>>>>>>searches using these two indexes? I think I cannot do that without
>>>>>>changing the code. It would need a postprocess, matching all the codes
>>>>>>returned from index 1 with index 2.
>>>>>>
>>>>>>Does anyone have a solution for this problem? I would appreciate that.
>>>>>
>>>>-- 
>>>>Albert Vila
>>>>Director de proyectos I+D
>>>>http://www.imente.com
>>>>902 933 242
>>>>[iMente “La información con más beneficios”]
>>>>
>>>>---------------------------------------------------------------------
>>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org

-- 
Albert Vila
Director de proyectos I+D
http://www.imente.com
902 933 242
[iMente “La información con más beneficios”]

