lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alex Winston <a...@christianity.com>
Subject Re: Searching Ranges
Date Mon, 11 Nov 2002 16:57:17 GMT
good thoughts and something that i would like to explore further. let me
create a more concrete example that we can use to help visualize the
problem and a possible solution and then get feedback.  it may be that
the changes work in my case but not for all cases, or that there is
something else that could help alleviate the overhead i am experiencing.

lets say that i have a document named "d1", which contains a field named
"references".  within the "references" field i maintain a list of terms
that represent my range from 001-005, more specifically the field would
contain the terms "001 002 003 004 005".

i would now like to search this range to determine if it falls within
the range 003-010, so my query would look like "references:[003 010]".

in this case RangeQuery begins to iterate the terms contained within
"references" to determine if "d1" is a match.  as it iterates each term
is analyzed, 001 and 002 fail so we continue to iterate at which point
003 is determined to be a match.

at this point there is no need to continue to search the terms because
we have determined there is a match, not only that but it helps reduce
the size of the byte[] cache that is created i believe.  like i
mentioned earlier, i am not sure of the ramifications this may have on
scoring, but it works for most cases i can think of, but if i am missing
something that is what i need feedback on :).

you can imagine how this improves the avg efficiency in my case if i
have 10000 terms in "references".  although i may be doing something
that was either not intended or ill-designed.

thanks, any thoughts?
alex



On Mon, 2002-11-11 at 10:50, Scott Ganyo wrote:
> Hi Alex,
> 
> I just looked at this and had the following thought:
> 
> The RangeQuery must continue to iterate after the first match is found 
> in order to match everything within the specified range.  In other 
> words, if you have a range of "a" to "d", you can't stop with "a", you 
> need to continue to "d".  At the point you move beyond "d" is the point 
> where the query should stop iterating.  That is reflected in lines 
> 160-162.  It seems to me that your solution would only work where your 
> range consists of a single term.
> 
> Please let me know if I'm just misunderstanding the situation.
> 
> Scott
> 
> Alex Winston wrote:
> 
> > thanks for the reply, my apologizes for not explaining myself very
> > clearly, it has been a long day.
> >
> > you expressed exactly our situation, unfortunately this is not an option
> > because we want to have multiple ranges for each document as well,
> > there is a possible extension of what you suggested but that is a last
> > resort.  kinda crazy i know, but you have to meet requirements :).
> >
> > but i also had a thought while i was looking through the lucene code,
> > and any comments are welcome.
> >
> > i may be very mistaken because it has been a long day but if you look at
> > the current cvs version of RangeQuery it appears that even if a match is
> > found it will continue to iterate over terms within a field, and in my
> > case it is on the order of thousands.  if i add a break after a match
> > has been found it appears as though the search is improved on avg an
> > order of magnitude, my math has left me so i cannot be theoretical at
> > the moment.  i have unit tested the change on my side and on the lucene
> > side and it works.  note: one hard example is that a query went from 20
> > seconds to .5 seconds.  any initial thoughts to if there is a case where
> > this would not work?
> >
> > beginning line 164:
> > TermQuery tq = new TermQuery(term);	  // found a match
> > tq.setBoost(boost);			   // set the boost
> > q.add(tq, false, false);		  // add to q
> > break;  // ADDED!
> >
> >
> > On Fri, 2002-11-08 at 15:09, Mike Barry wrote:
> >
> > >Alex,
> > >
> > >It is rather confusing. It sounds like you've indexed
> > >a field that that can be between two values (let's say
> > >E-J) and then when you have a search term such as G
> > >you want the docs containing E-J (or A-H or F-K but not A-H
> > >nor A-C nor J-Z)
> > >
> > >Just of the top of my head but could you index the upper and
> > >lower bounds as separate fields then when you search do a
> > >compound query:
> > >
> > >     lower_bound:{ - search_term } AND upper_bound:{ search_term - }
> > >
> > >just a thought.
> > >
> > >>-MikeB.
> > >
> > >
> > >Alex Winston wrote:
> > >
> > >
> > >>i was hoping that someone could briefly review my current solution to a
> > >>problem that we have encountered to see if anyone could suggest a
> > >>possible alternative, because as it stands we have pushed lucene past
> > >>its current limits.
> > >>
> > >>PROBLEM:
> > >>
> > >>we were wanting to represent a range of values for a particular field
> > >>that is searchable over a particular range.
> > >>
> > >>an example follows for clarification:
> > >>we were wanting to store a range of chapters and verses of a book for a
> > >>particular document, and in turn search to see if a query range includes
> > >>the range that is represented in the index.
> > >>
> > >>if this is unclear please ask for clarification
> > >>
> > >>IMPRACTICAL SOLUTION:
> > >>
> > >>although this solution seems somewhat impractical it is all we could
> > >>come up with.
> > >>
> > >>our solution involved storing each possible range value within the term
> > >>which would allow for RangeQuerys to be performed on this particular
> > >>field.  for very small ranges this seems somewhat practical after
> > >>profiling.  although once the field ranges began to span multiple
> > >>chapters and verses, the search times became unreasonable because we
> > >>were storing thousands of entries for each representative range.
> > >>
> > >>i can elaborate on anything that is unclear,
> > >>but any thoughts on a possible alternative solution within lucene that
> > >>we overlooked would be extremely helpful.
> > >>	
> > >>
> > >>alex
> > >
> > >
> > >
> > >--
> > >To unsubscribe, e-mail:
> > >For additional commands, e-mail:
> > >
> > >
> >
> 
> -- 
> Brain: Pinky, are you pondering what I’m pondering?
> Pinky: I think so, Brain, but calling it a pu-pu platter? Huh, what were 
> they thinking?
> 
> 
> --
> To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
> 
> 


Mime
View raw message