lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Searching Ranges
Date Mon, 11 Nov 2002 17:23:46 GMT
I have never used the RangeQuery, but my understanding was that it is
for cases like this:

Assume you have documents with a field called 'percentage' that can
have a value between 0 and 100.
Assume your index has 3 documents, one with percentage=10, one with
percentage=60, and one with percentage=90.

You could then to RangeQuery: percentage:[50 100]

This will match 2 documents, one with percentage=60, and one with
percentage=90.


I think that is slightly different from queries where you want to see
whether a field with value of "001 002 003 004 005" has at least 1 term
that falls in the range specified in the query string.

I haven't tested what I said here, so I could be wrong.

Otis




--- Alex Winston <alex@christianity.com> wrote:
> good thoughts and something that i would like to explore further. let
> me
> create a more concrete example that we can use to help visualize the
> problem and a possible solution and then get feedback.  it may be
> that
> the changes work in my case but not for all cases, or that there is
> something else that could help alleviate the overhead i am
> experiencing.
> 
> lets say that i have a document named "d1", which contains a field
> named
> "references".  within the "references" field i maintain a list of
> terms
> that represent my range from 001-005, more specifically the field
> would
> contain the terms "001 002 003 004 005".
> 
> i would now like to search this range to determine if it falls within
> the range 003-010, so my query would look like "references:[003
> 010]".
> 
> in this case RangeQuery begins to iterate the terms contained within
> "references" to determine if "d1" is a match.  as it iterates each
> term
> is analyzed, 001 and 002 fail so we continue to iterate at which
> point
> 003 is determined to be a match.
> 
> at this point there is no need to continue to search the terms
> because
> we have determined there is a match, not only that but it helps
> reduce
> the size of the byte[] cache that is created i believe.  like i
> mentioned earlier, i am not sure of the ramifications this may have
> on
> scoring, but it works for most cases i can think of, but if i am
> missing
> something that is what i need feedback on :).
> 
> you can imagine how this improves the avg efficiency in my case if i
> have 10000 terms in "references".  although i may be doing something
> that was either not intended or ill-designed.
> 
> thanks, any thoughts?
> alex
> 
> 
> 
> On Mon, 2002-11-11 at 10:50, Scott Ganyo wrote:
> > Hi Alex,
> > 
> > I just looked at this and had the following thought:
> > 
> > The RangeQuery must continue to iterate after the first match is
> found 
> > in order to match everything within the specified range.  In other 
> > words, if you have a range of "a" to "d", you can't stop with "a",
> you 
> > need to continue to "d".  At the point you move beyond "d" is the
> point 
> > where the query should stop iterating.  That is reflected in lines 
> > 160-162.  It seems to me that your solution would only work where
> your 
> > range consists of a single term.
> > 
> > Please let me know if I'm just misunderstanding the situation.
> > 
> > Scott
> > 
> > Alex Winston wrote:
> > 
> > > thanks for the reply, my apologizes for not explaining myself
> very
> > > clearly, it has been a long day.
> > >
> > > you expressed exactly our situation, unfortunately this is not an
> option
> > > because we want to have multiple ranges for each document as
> well,
> > > there is a possible extension of what you suggested but that is a
> last
> > > resort.  kinda crazy i know, but you have to meet requirements
> :).
> > >
> > > but i also had a thought while i was looking through the lucene
> code,
> > > and any comments are welcome.
> > >
> > > i may be very mistaken because it has been a long day but if you
> look at
> > > the current cvs version of RangeQuery it appears that even if a
> match is
> > > found it will continue to iterate over terms within a field, and
> in my
> > > case it is on the order of thousands.  if i add a break after a
> match
> > > has been found it appears as though the search is improved on avg
> an
> > > order of magnitude, my math has left me so i cannot be
> theoretical at
> > > the moment.  i have unit tested the change on my side and on the
> lucene
> > > side and it works.  note: one hard example is that a query went
> from 20
> > > seconds to .5 seconds.  any initial thoughts to if there is a
> case where
> > > this would not work?
> > >
> > > beginning line 164:
> > > TermQuery tq = new TermQuery(term);	  // found a match
> > > tq.setBoost(boost);			   // set the boost
> > > q.add(tq, false, false);		  // add to q
> > > break;  // ADDED!
> > >
> > >
> > > On Fri, 2002-11-08 at 15:09, Mike Barry wrote:
> > >
> > > >Alex,
> > > >
> > > >It is rather confusing. It sounds like you've indexed
> > > >a field that that can be between two values (let's say
> > > >E-J) and then when you have a search term such as G
> > > >you want the docs containing E-J (or A-H or F-K but not A-H
> > > >nor A-C nor J-Z)
> > > >
> > > >Just of the top of my head but could you index the upper and
> > > >lower bounds as separate fields then when you search do a
> > > >compound query:
> > > >
> > > >     lower_bound:{ - search_term } AND upper_bound:{ search_term
> - }
> > > >
> > > >just a thought.
> > > >
> > > >>-MikeB.
> > > >
> > > >
> > > >Alex Winston wrote:
> > > >
> > > >
> > > >>i was hoping that someone could briefly review my current
> solution to a
> > > >>problem that we have encountered to see if anyone could suggest
> a
> > > >>possible alternative, because as it stands we have pushed
> lucene past
> > > >>its current limits.
> > > >>
> > > >>PROBLEM:
> > > >>
> > > >>we were wanting to represent a range of values for a particular
> field
> > > >>that is searchable over a particular range.
> > > >>
> > > >>an example follows for clarification:
> > > >>we were wanting to store a range of chapters and verses of a
> book for a
> > > >>particular document, and in turn search to see if a query range
> includes
> > > >>the range that is represented in the index.
> > > >>
> > > >>if this is unclear please ask for clarification
> > > >>
> > > >>IMPRACTICAL SOLUTION:
> > > >>
> > > >>although this solution seems somewhat impractical it is all we
> could
> > > >>come up with.
> > > >>
> > > >>our solution involved storing each possible range value within
> the term
> > > >>which would allow for RangeQuerys to be performed on this
> particular
> > > >>field.  for very small ranges this seems somewhat practical
> after
> > > >>profiling.  although once the field ranges began to span
> multiple
> > > >>chapters and verses, the search times became unreasonable
> because we
> > > >>were storing thousands of entries for each representative
> range.
> > > >>
> > > >>i can elaborate on anything that is unclear,
> > > >>but any thoughts on a possible alternative solution within
> lucene that
> > > >>we overlooked would be extremely helpful.
> > > >>	
> > > >>
> > > >>alex
> > > >
> > > >
> > > >
> > > >--
> > > >To unsubscribe, e-mail:
> > > >For additional commands, e-mail:
> > > >
> > > >
> > >
> > 
> > -- 
> > Brain: Pinky, are you pondering what Iím pondering?
> > Pinky: I think so, Brain, but calling it a pu-pu platter? Huh, what
> were 
> > they thinking?
> > 
> > 
> > --
> > To unsubscribe, e-mail:  
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> > For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>
> > 
> > 
> 
> 

> ATTACHMENT part 2 application/pgp-signature name=signature.asc



__________________________________________________
Do you Yahoo!?
U2 on LAUNCH - Exclusive greatest hits videos
http://launch.yahoo.com/u2

--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>


Mime
View raw message