lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From mark harwood <markharw...@yahoo.co.uk>
Subject Re: Multiple time ranges in a document
Date Mon, 19 Feb 2007 13:36:03 GMT
The problem arises because there are multiple ranges defined in the document and it is not
easy to test the start/end value pairs when they are held as independent values in separate
fields. AFAIK there is currently no query implementation for testing position relationships
in words from more than one field.
There is one sneaky way however, to record and query range information in the one field -
you can (ab)use the position information stored by Lucene to encode data such as time and
then query for ranges using SpanQueries.

By writing a custom analyzer you can artificially "place" words in a required location in
a field by setting the position increment value appropriately.
If you chose to think of the extent of the field as representing time e.g. the hours or minutes
in a day/week you can post pairs of "start" and "end" words at an appropriate location to
effectively record the range of time or date information.

This solves the problem of recording pairs of ranges that can be queried.

Now for your query - your example query is actually a collection of ranges rather than one
single range (just Thursdays) so this would need to be expressed as multiple SpanQueries testing
the areas of the document which represent the time period of interest to see if they contain
a start and end pair. 

Worth a try?

Mark

----- Original Message ----
From: Vijay Santhanam <vijay@spectrumwired.com>
To: java-user@lucene.apache.org
Sent: Sunday, 18 February, 2007 11:43:39 PM
Subject: Multiple time ranges in a document

Hello,

 

I'm using a RangeFilter to find "Event" documents (with Start and End lucene
friendly formatted date fields) that match a Users time range query. This
works perfectly in sub-second times at decent loads, but I'm having trouble
searching multiple performances in the one document. Indexing them is no
problem, because I can add extra terms to the start and end fields.

 

Here's a situation that doesn't work to well with the RangeFilter:-

 

Let's say a comedian has a regular gig every Monday for the next 3 weeks,
from 7pm-9pm. So, the start field will be 200702191900, 200702261900,
200703051900. And, the end field will be 200702192100, 200702262100,
200703052100.

If someone searches for an event on Thursday anytime during his 3 week
stint, the comedian's event will show up, because the Range Filter will
consider the lowest term of the start field and the highest term of the end
field.

 

Also, sorting by start or end fields will break, but I could write my own
SortComparatorSource to fix that.

 

How could I get around the filter problem? I could write my own filter, but
it would need to keep track of both fields, and their respective term
positions for each field.

 

Thanks for your help,

-Vijay

 






	
	
		
___________________________________________________________ 
New Yahoo! Mail is the ultimate force in competitive emailing. Find out more at the Yahoo!
Mail Championships. Plus: play games and win prizes. 
http://uk.rd.yahoo.com/evt=44106/*http://mail.yahoo.net/uk 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message