lucene-java-user mailing list archives

From Gerardo Segura <gseg...@correobancomer.com>
Subject Re: Frequently updated fields
Date Sun, 14 Sep 2008 08:06:32 GMT
I had similar requirements: some fields didn't require text processing; 
they were just used as filters to focus the search on a subset of 
documents in Solr. As Karl suggested, implementing a filter was the most 
direct approach for me.

The issue was that, not being familiar with Solr myself, I couldn't 
manage to integrate my filter without modifying SolrIndexSearcher. The 
change was basically to replace every invocation of

          searcher.search(query, new HitCollector() { ... });
with
          searcher.search(query, myCustomFilter, new HitCollector() { ... });

myCustomFilter is an instance of TermsFilter, with the documents' keys added 
based on a query against an external database. Minor changes were also made 
in SolrCore.java so that the filter could be declared in solrconfig.xml.
It worked fine, but I always wondered whether that was the best way to 
integrate the filter.
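
To make the idea concrete, here is a minimal, library-free sketch of what such a filter amounts to. It is not the Solr patch described above and it does not use Lucene classes: the class name, the method names, the key-to-doc-id map and the sample keys are all made up for illustration. The assumption is that the external database returns a set of document keys, that each key can be mapped to an index doc id, and that a hit is collected only if its doc id is marked in a bit set.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: plain Java standing in for a filter built from an external key list.
public class ExternalKeyFilterSketch {

    /** Marks the doc ids whose key appears in the externally supplied key set. */
    public static BitSet buildFilter(Map<String, Integer> keyToDocId,
                                     Set<String> allowedKeys,
                                     int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (String key : allowedKeys) {
            Integer docId = keyToDocId.get(key);
            // Skip keys that the database returned but the index does not contain.
            if (docId != null) {
                bits.set(docId);
            }
        }
        return bits;
    }

    /** Keeps only the query hits that the filter allows, preserving hit order. */
    public static List<Integer> applyFilter(int[] queryHits, BitSet filter) {
        List<Integer> kept = new ArrayList<>();
        for (int docId : queryHits) {
            if (filter.get(docId)) {
                kept.add(docId);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Assumed mapping from the unique key field to index doc ids.
        Map<String, Integer> keyToDocId =
                Map.of("mail-1", 0, "mail-2", 1, "mail-3", 2, "mail-4", 3);
        // Assumed result of a database query such as "all unread mails".
        Set<String> unreadKeys = Set.of("mail-2", "mail-4", "mail-9");

        BitSet filter = buildFilter(keyToDocId, unreadKeys, 4);
        int[] hits = {0, 1, 3};  // doc ids matched by the text query
        System.out.println(applyFilter(hits, filter));  // prints [1, 3]
    }
}
```

The point of this design is that the frequently changing state lives only in the database; the index is untouched when a mail goes from unread to read, and only the bit set has to be rebuilt.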

regards,

Wojciech Strzałka wrote:
> The most frequently changing fields will, I think, be:
>   Status (read/unread):  in fact this is what I'm most afraid of - any
>                          mail incoming to the system will need to be indexed at least
>                          twice
>   Flags:   0..n values from enum
>   Tags:    0..n values from enum
>
> Of course all the other fields can also change - even content in draft messages
> (it's live content, not archival) - but in such a case I'm ready to go
> with the re-indexing.
>   
>> Hi Wojciech,
>>     
>> can you please give us a bit more specific information about the meta
>> data fields that will change? I would recommend looking at
>> creating filters from your primary persistence store for query clauses
>> such as unread/read, mailbox folders, etc.
>>     
>>        karl
>>     
>> On 12 Sep 2008, at 13:57, Wojciech Strzałka wrote:
>>     
>>> Hi.
>>>
>>>   I'm new to Lucene and I would like to get a few answers (the
>>>   questions may be lame)
>>>
>>>   I want to index a large number of emails using Lucene (maybe Solr),
>>>   not only the contents but also some metadata like state or flags. The
>>>   problem is that the metadata will change during the mail lifecycle.
>>>   Although the metadata is much smaller, updating it will require
>>>   reindexing the whole mail content, which I see as a performance
>>>   bottleneck.
>>>
>>>   I also have the data in a DB, so my questions are:
>>>
>>>   - are there any best practices to implement my needs (querying both
>>>   lucene & DB and then merging in memory?, close one eye and re-index
>>>   the whole content on every metadata change? others?)
>>>
>>>   - is Lucene a good solution for my problem at all?
>>>
>>>   - are there any plans to implement field updates in a more efficient
>>>   way than deleting/inserting the whole document? if yes, what's the
>>>   time horizon?
>>>
>>>
>>>                                        Best regards
>>>                                               Wojtek
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>

