From: Erik Hatcher
To: java-user@lucene.apache.org
Subject: Re: SpanQuery problem
Date: Sun, 14 Sep 2008 22:51:23 -0400
In-Reply-To: <49942.1221443570@ontrenet.com>

SpanFirstQuery - is that what you're looking for? Use a SpanOrQuery over
an expanded set of SpanFirstQuery clauses, one for each Term that starts
with "car" (found via a TermEnum).

However, in this specialized case where you know exactly what you want
to index upon (the first word of a string), one nice solution to
consider is indexing that first word as its very own field.

Document 1
  word: blue bird
  word: blue car
  first_word: blue (term frequency = 2)

Document 2
  word: sky blue
  word: sea blue
  first_word: sea
  first_word: sky

The tee token filter stuff would be a clever way to implement it via a
single tokenization pass, though not the only way.

    Erik

On Sep 14, 2008, at 9:52 PM, wrote:

> Yes, return the document, but the problem is with SpanNearQuery it
> does not return the documents I expect.
> Sorry I did not explain it well. Consider 2 documents each with
> "word" fields.
>
> Document 1
>
> word: blue bird
> word: blue car
>
> Document 2
>
> word: sky blue
> word: sea blue
>
> I want to search for 'blue' and ONLY return Document 1 as I already
> know that the term 'blue' MUST appear at the front of the field word:
>
> SpanNearQuery with slop of 0 or 1 won't do this if Document 1 has
> other fields like this.
>
> Document 1 - IS NOT FOUND WITH SPAN NEAR 0 or 1
>
> word: some blue
> word: another blue
> word: blue bird
> word: blue car
>
> Expanding the Span slop to 3 will find Document 1 above this line,
> however I thought the slop meant within the field terms. It seems to
> refer to the list of fields rather than terms. This is unexpected
> behavior to me. But I'm no Lucene expert.
>
> Thanks for any thoughts.
>
> Darren
>
>
> darren@ontrenet.com wrote:
>> Thanks Paul.
>> I will study your response more, as I don't fully
>> understand it yet - specifically "You'll need to expand the prefix
>> into indexed terms".
>>
>> But what I want to do is so simple I'm surprised it cannot be done.
>>
>> You are saying that I cannot find all fields across all documents
>> that begin with a string or space bounded word? Consider 1 document
>> with:
>>
>> word: blue car
>> word: red car
>> word: car door
>> word: car wheel
>>
>> Using whitespace analyzer I simply want to query all fields in all
>> documents where 'car' is at the very front of the field.
>>
>> word: car door
>> word: car wheel
>>
>> This cannot be done? I don't want to retrieve all of them and prune
>> the results myself because it will consume lots of resources.
>>
>> thanks so much!
>>
>> Darren
>>
>> On Sun Sep 14 16:36 , Paul Elschot sent:
>>
>> On Sunday 14 September 2008 19:36:38, Darren Govoni wrote:
>>
>>> Hi,
>>> I am seeing odd behavior with SpanNearQuery.
>>>
>>> The problem is that with multiple fields, all fields beyond the
>>> first one 'car' are not seen by the span. I didn't think the span
>>> applied to sets of the same field, but rather to terms within a
>>> given field.
>>>
>>> Document 1. 1 field (word)
>>>
>>> word: car
>>> word: cars
>>> word: cars wash
>>> word: cars lot
>>>
>>>
>>> SpanNearQuery with slop of 0. Wrapped by SpanFirstQuery with slop
>>> of 1. Term query within is "word","cars*". No results found.
>>>
>>
>> There is no SpanPrefixQuery for cars* in Lucene. You'll need to
>> expand the prefix into indexed terms to create a SpanOrQuery
>> yourself. This is fairly straightforward from PrefixQuery and
>> SpanOrQuery.
>> Alternatively, have a look at the surround query parser in contrib
>> for a working example.
>>
>> Regards,
>> Paul Elschot
>>
>>
>>> If I remove the first field word: car, it works.
>>> Also, if I increase
>>> the slop, it will return results from only the first amount of
>>> fields in the slop rather than terms within the field value.
>>>
>>> Is what I am seeing the correct behavior? Doesn't seem like it.
>>>
>>> What I am trying to do is span _within_ EACH field and match phrases
>>> that begin with "cars*". Shouldn't be too hard to do I thought.
>>>
>>> Darren
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
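
The suggestion in the reply at the top of this thread, indexing the first word of each value as its own field, can be sketched in plain Java without any Lucene calls. This is only an illustration under stated assumptions: the class FirstWordField and its methods are hypothetical names (not part of Lucene), values are split on whitespace the way a whitespace analyzer would split them, and in a real index the derived values would be added to the document as a first_word field at index time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Lucene class: derives the values of a
// "first_word" field from the values of a multi-valued "word" field.
class FirstWordField {

    // Returns the first whitespace-delimited token of each value,
    // skipping values that are empty or contain only whitespace.
    static List<String> firstWords(List<String> wordValues) {
        List<String> result = new ArrayList<>();
        for (String value : wordValues) {
            String trimmed = value.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            result.add(trimmed.split("\\s+")[0]);
        }
        return result;
    }

    // True if at least one value begins with exactly this term,
    // i.e. what a plain term query on first_word would match.
    static boolean anyValueStartsWithTerm(List<String> wordValues, String term) {
        return firstWords(wordValues).contains(term);
    }

    // True if the first word of at least one value starts with the prefix,
    // i.e. what a prefix query such as cars* on first_word would match.
    static boolean anyValueStartsWithPrefix(List<String> wordValues, String prefix) {
        for (String first : firstWords(wordValues)) {
            if (first.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

With the two documents from the thread, anyValueStartsWithTerm returns true for the values "blue bird" and "blue car" and false for "sky blue" and "sea blue" when the term is "blue". Because the check runs against a separate field, it does not depend on how token positions are laid out across multiple values of the word field.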