Subject: Re: Searching Ranges
From: Alex Winston
To: Lucene Users List
Date: 08 Nov 2002 16:39:25 -0500

apologies for replying to myself, but another nice side effect of this
fix is that it virtually eliminates the potential for an
OutOfMemoryError, which was a problem i encountered on extremely large
fields (over 10000 terms) while i was profiling the RangeQuery class.
i can get into specifics if need be. any thoughts?

alex

On Fri, 2002-11-08 at 15:54, Alex Winston wrote:
> thanks for the reply, my apologies for not explaining myself very
> clearly, it has been a long day.
>
> you expressed exactly our situation, unfortunately this is not an option
> because we want to have multiple ranges for each document as well,
> there is a possible extension of what you suggested but that is a last
> resort. kinda crazy i know, but you have to meet requirements :).
>
> but i also had a thought while i was looking through the lucene code,
> and any comments are welcome.
>
> i may be very mistaken because it has been a long day but if you look at
> the current cvs version of RangeQuery it appears that even if a match is
> found it will continue to iterate over terms within a field, and in my
> case it is on the order of thousands. if i add a break after a match
> has been found it appears as though the search is improved on avg by an
> order of magnitude, my math has left me so i cannot be theoretical at
> the moment. i have unit tested the change on my side and on the lucene
> side and it works. note: one hard example is that a query went from 20
> seconds to .5 seconds. any initial thoughts as to whether there is a
> case where this would not work?
>
> beginning line 164:
>     TermQuery tq = new TermQuery(term);   // found a match
>     tq.setBoost(boost);                   // set the boost
>     q.add(tq, false, false);              // add to q
>     break;                                // ADDED!
>
>
> On Fri, 2002-11-08 at 15:09, Mike Barry wrote:
> > Alex,
> >
> > It is rather confusing.
> > It sounds like you've indexed
> > a field that can be between two values (let's say
> > E-J) and then when you have a search term such as G
> > you want the docs containing E-J (or A-H or F-K but not A-H
> > nor A-C nor J-Z)
> >
> > Just off the top of my head, but could you index the upper and
> > lower bounds as separate fields, then when you search do a
> > compound query:
> >
> >     lower_bound:{ - search_term } AND upper_bound:{ search_term - }
> >
> > just a thought.
> >
> > -MikeB.
> >
> >
> > Alex Winston wrote:
> >
> > > i was hoping that someone could briefly review my current solution to a
> > > problem that we have encountered to see if anyone could suggest a
> > > possible alternative, because as it stands we have pushed lucene past
> > > its current limits.
> > >
> > > PROBLEM:
> > >
> > > we were wanting to represent a range of values for a particular field
> > > that is searchable over a particular range.
> > >
> > > an example follows for clarification:
> > > we were wanting to store a range of chapters and verses of a book for a
> > > particular document, and in turn search to see if a query range includes
> > > the range that is represented in the index.
> > >
> > > if this is unclear please ask for clarification
> > >
> > > IMPRACTICAL SOLUTION:
> > >
> > > although this solution seems somewhat impractical it is all we could
> > > come up with.
> > >
> > > our solution involved storing each possible range value within the term
> > > which would allow for RangeQuerys to be performed on this particular
> > > field. for very small ranges this seems somewhat practical after
> > > profiling. although once the field ranges began to span multiple
> > > chapters and verses, the search times became unreasonable because we
> > > were storing thousands of entries for each representative range.
> > >
> > > i can elaborate on anything that is unclear,
> > > but any thoughts on a possible alternative solution within lucene that
> > > we overlooked would be extremely helpful.
> > >
> > > alex
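
The two-field idea from Mike's reply, and the multiple-ranges objection to it, can be sketched in plain Java. This is only an illustration under stated assumptions: it makes no Lucene calls, the class and method names (RangeBoundsSketch, anyRangeContains, flattenedFieldsMatch) are hypothetical, bounds are compared as plain strings, and the comparisons are inclusive, whereas the query in the thread uses exclusive braces.

```java
import java.util.Arrays;
import java.util.List;

public class RangeBoundsSketch {

    /** One stored range such as "E".."J" (hypothetical helper type). */
    static final class Range {
        final String lower;
        final String upper;

        Range(String lower, String upper) {
            this.lower = lower;
            this.upper = upper;
        }
    }

    /** Per-range test: lower <= term AND term <= upper (inclusive by assumption). */
    static boolean contains(Range r, String term) {
        return r.lower.compareTo(term) <= 0 && term.compareTo(r.upper) <= 0;
    }

    /** Intended answer for a multi-range document: one single range must contain the term. */
    static boolean anyRangeContains(List<Range> ranges, String term) {
        return ranges.stream().anyMatch(r -> contains(r, term));
    }

    /**
     * What two independent multi-valued fields could answer: some lower bound
     * is <= term and some upper bound is >= term, with no pairing between them.
     */
    static boolean flattenedFieldsMatch(List<Range> ranges, String term) {
        boolean someLower = ranges.stream().anyMatch(r -> r.lower.compareTo(term) <= 0);
        boolean someUpper = ranges.stream().anyMatch(r -> term.compareTo(r.upper) <= 0);
        return someLower && someUpper;
    }

    public static void main(String[] args) {
        // A document carrying two ranges, A-C and J-Z; the search term is G.
        List<Range> doc = Arrays.asList(new Range("A", "C"), new Range("J", "Z"));
        System.out.println(anyRangeContains(doc, "G"));     // false: no single range holds G
        System.out.println(flattenedFieldsMatch(doc, "G")); // true: A <= G pairs with G <= Z
    }
}
```

With a single range per document the two tests agree. With several ranges per document, the lower bound of one range can combine with the upper bound of another, which is one plausible reading of why the two-field approach was described as not an option here.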