lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <>
Subject Re: Design question [too many fields?]
Date Fri, 01 Jul 2005 09:39:04 GMT

On Jun 30, 2005, at 11:27 PM, Chris Lu wrote:
> Mark, your suggestion will incur another trip to the database. And  
> if the search results is large, filtering in DB by pk is not really  
> good.

Chris - I disagree with that last comment.  It can be a great  
solution when the filter is cached.  Certainly building a filter for  
every search would be inefficient, but filters are really best when  

> Erik, your original "date" field is good when there is not many  
> dates(<1024) in the database. Otherwise, Range Query can not handle  
> it.

Not quite correct... it would not matter how many dates were in the  
index/database, only how many were within the range used by  
RangeQuery.  The original requirement was a years worth of days,  
which at most would be 366 days and I suspect someone looking for  
hotel room availability would be narrowing things down to a week or  

> My suggestion is, use "year" + "month" + "day" three fields to  
> store date. And when searching, for example, any date that's  
> greater than 2005-06-30, you can use this query to search: ( year >  
> 2005 ) or  ( year=2005 and month>=6) or ( year=2005 and month=6 and  
> day > 30 ).
> It's a combination of BooleanQuery, TermQuery, and RangeQuery.

How would you represent multiple dates for a Document using that  
scheme?  Wasn't that one of the original requirements?

> This may seem cumbersome, but it can save one trip to database, and  
> circumvent Lucene's limitation.

One trip to the DB *once* with the results cached is mighty  
inexpensive in the grand scheme of things.  Mark's point is something  
I agree with (and wrote about in the custom filter example in Lucene  
in Action) - some information makes good sense to stay in a  
relational database when its too volatile to put in a Lucene index.  
Building a filter to access a DB, with the results cached is a good  
solution to the specified problem, I think.  Certainly there are many  
ways to solve it though.


> Chris Lu
> Erik Hatcher wrote:
>> I second Mark's suggestion over the alternative I posted.  My   
>> alternative was merely to invert the field structure originally   
>> described, but using a Filter for the volatile information is wiser.
>>     Erik
>> On Jun 29, 2005, at 9:58 AM, mark harwood wrote:
>>> Presumably there is also a free-text element to the
>>> search or you wouldn't be using Lucene.
>>> Multiple fields is not the way to go.
>>> A single Lucene field could contain multiple terms (
>>> the available dates) but I still don't think that's
>>> the best solution.
>>> The availability info is likely to be pretty volatile
>>> and you always want up-to-date info so I would prefer
>>> to hit a database for this. If you keep a DB primary
>>> key to Lucene doc id look-up cached in memory you can
>>> quickly construct a Lucene filter from the database
>>> results and therefore only show Lucene results for
>>> available rooms.
>>> Cheers
>>> Mark
>>> ___________________________________________________________
>>> How much free photo storage do you get? Store your holiday
>>> snaps for FREE with Yahoo! Photos
>>> -------------------------------------------------------------------- 
>>> -
>>> To unsubscribe, e-mail:
>>> For additional commands, e-mail:
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message