lucene-java-user mailing list archives

From Gerardo Segura <>
Subject Re: Frequently updated fields
Date Sun, 14 Sep 2008 08:06:32 GMT
I had similar requirements: some fields didn't require text processing;
they were just used as filters to focus the search on a subset of
documents in Solr. As Karl suggested, implementing a filter was the most
direct approach for me.

The issue was that, not being familiar with Solr myself, I couldn't
manage to integrate my filter without modifying SolrIndexSearcher. The
change was basically to replace every invocation of

..., new HitCollector() { ... });

with

..., myCustomFilter, new HitCollector() { ... });

myCustomFilter is an instance of TermsFilter, with the documents' keys added
based on a query against an external database. Some minor changes were also
made in ... so that the filter could be declared in solrconfig.xml.
It worked OK, but I have always wondered whether that was the best way to
integrate the filter.
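
To make the idea concrete, here is a minimal, stdlib-only Java sketch of what such a key-based filter does. It is not the Lucene or Solr API, and every name in it (FilteredSearchSketch, buildFilterBits, search, docKeys, keysFromDb) is hypothetical. It assumes each indexed document carries a unique key (modelled as docKeys, indexed by document ID) and that the external database returns the keys of the documents that should stay searchable (keysFromDb, e.g. the unread mails). The filter is just a bit set over document IDs, and the collector is only called for full-text matches whose bit is set, which is the effect of passing a filter into the search call.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Set;
import java.util.function.IntConsumer;

// Stdlib-only sketch of a key-based filter; this is NOT the Lucene/Solr API.
// Assumption: docKeys[docId] holds the unique key stored with each indexed
// document, and keysFromDb holds the keys returned by the external DB query.
final class FilteredSearchSketch {

    // Marks every document whose key was returned by the database,
    // similar in spirit to a terms filter over a key field.
    public static BitSet buildFilterBits(String[] docKeys, Set<String> keysFromDb) {
        BitSet bits = new BitSet(docKeys.length);
        for (int docId = 0; docId < docKeys.length; docId++) {
            if (keysFromDb.contains(docKeys[docId])) {
                bits.set(docId);
            }
        }
        return bits;
    }

    // Hands each full-text match to the collector only if the filter allows it.
    public static void search(int[] matchingDocs, BitSet filterBits, IntConsumer collector) {
        for (int docId : matchingDocs) {
            if (filterBits.get(docId)) {
                collector.accept(docId);
            }
        }
    }

    public static void main(String[] args) {
        String[] docKeys = {"msg-1", "msg-2", "msg-3", "msg-4"};
        Set<String> unreadKeys = Set.of("msg-1", "msg-3"); // hypothetical DB result: unread mails
        int[] textMatches = {0, 1, 2};                      // hypothetical full-text query matches

        BitSet filter = buildFilterBits(docKeys, unreadKeys);
        List<Integer> collected = new ArrayList<>();
        search(textMatches, filter, collected::add);
        System.out.println(collected); // prints [0, 2]
    }
}
```

Because the bit set is derived only from the keys the database returns, flipping a message between read and unread means recomputing (or re-caching) the filter, not re-indexing the message itself.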


Wojciech Strzałka wrote:
> The most frequently changing fields will be, I think:
>   Status (read/unread):  in fact, this is what I'm most afraid of, since every
>                          mail coming into the system will need to be re-indexed at least once
>   Flags:   0..n values from an enum
>   Tags:    0..n values from an enum
> Of course, all the other fields can change too, even the content of draft messages
> (it's live content, not an archive), but in such cases I'm ready to go
> with re-indexing.
>> Hi Wojciech,
>> can you please give us a bit more specific information about the metadata
>> fields that will change? I would recommend looking at
>> creating filters from your primary persistence store for query clauses such
>> as read/unread, mailbox folders, etc.
>>        karl
>> On 12 Sep 2008, at 13:57, Wojciech Strzałka wrote:
>>> Hi.
>>>   I'm new to Lucene and I would like to get a few answers (they may
>>>   be lame).
>>>   I want to index a large number of emails using Lucene (maybe Solr),
>>>   not only the contents but also some metadata like state or flags. The
>>>   problem is that the metadata will change during the mail's lifecycle;
>>>   although it is much smaller, updating it will require re-indexing
>>>   the whole mail content, which I see as a performance bottleneck.
>>>   I also have the data in a DB, so my questions are:
>>>   - are there any best practices for implementing this (querying both
>>>   Lucene & the DB and then merging in memory? closing one eye and re-indexing
>>>   the whole content on every metadata change? others?)
>>>   - is Lucene a good solution for my problem at all?
>>>   - are there any plans to implement field updates in a more efficient
>>>   way than deleting/inserting the whole document? If yes, what's the time horizon?
>>>                                        Best regards
>>>                                               Wojtek
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail:
>>> For additional commands, e-mail: