Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
Received-SPF: pass (athena.apache.org: local policy)
Message-Id: <DEAD1BBE-08D2-4E16-A0BE-ADBBD33283F8@ehatchersolutions.com>
From: Erik Hatcher <erik@ehatchersolutions.com>
To: java-user@lucene.apache.org
In-Reply-To: <49942.1221443570@ontrenet.com>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
Content-Transfer-Encoding: 7bit
Mime-Version: 1.0 (Apple Message framework v928.1)
Subject: Re: SpanQuery problem
Date: Sun, 14 Sep 2008 22:51:23 -0400
References: <49942.1221443570@ontrenet.com>

SpanFirstQuery - is that what you're looking for?  A SpanOrQuery of an  
expanded set of SpanFirstQuery's for all Terms that start with  
"car" (via a TermEnum).

However, in this specialized case where you know exactly what you want  
to index upon (the first word of a string), then one nice solution to  
consider is indexing that first word as its very own field.

   Document 1
     word: blue bird
     word: blue card
     first_word: blue (term frequency = 2)

   Document 2
     word: sky blue
     word: sea blue
     first_word: sea
     first_word: sky

The tee token filter stuff would be a clever way to implement it via a  
single tokenization pass, though not the only way.

	Erik

On Sep 14, 2008, at 9:52 PM, <darren@ontrenet.com> wrote:

> Yes, return the document, but the problem is with SpanNearQuery it  
> does not return the documents I expect.
> Sorry I did not explain it well. Consider 2 documents each with  
> "word" fields.
>
> Document 1
>
> word: blue bird
> word: blue car
>
> Document 2
>
> word: sky blue
> word: sea blue
>
> I want to search for 'blue' and ONLY return Document 1 as I already  
> know
> that the term 'blue' MUST appear at the front of the field word:
>
> SpanNearQuery with slop of 0 or 1 won't do this if Document 1 has  
> other fields
> like this.
>
> Document 1 - IS NOT FOUND WITH SPAN NEAR 0 or 1
>
> word: some blue
> word: another blue
> word: blue bird
> word: blue car
>
> Expanding the Span slop to 3 will find Document 1 above this line,  
> however
> I thought the slop meant within the field terms. It seems to refer  
> to the list of fields rather than terms. This is unexpected behavior  
> to me. But I'm no lucene expert.
>
> Thanks for any thoughts.
>
> Darren
>
>
> darren@ontrenet.com wrote:
>> Thanks Paul. I will study your response more, as I don't fully  
>> understand it yet - specifically "You'll need to expand the prefix  
>> into indexed terms".
>>
>> But what I want to do is so simple I'm surprised it cannot be done.
>>
>> You are saying that I cannot find all fields across all documents  
>> that begin with a string or space bounded word? Consider 1 document  
>> with:
>>
>> word: blue car
>> word: red car
>> word: car door
>> word: car wheel
>>
>> Using whitespace analyzer I simply want to query all fields in all  
>> documents
>> where 'car' is the at the very front of the field.
>>
>> word: car door
>> word: car wheel
>>
>> This cannot be done? I don't want to retrieve all of them and prune  
>> the results myself because it will consume lots of resources.
>>
>> thanks so much!
>>
>> Darren
>>
>> On Sun Sep 14 16:36 , Paul Elschot  sent:Op Sunday 14 September  
>> 2008 19:36:38 schreef Darren Govoni:
>>
>>> Hi,
>>>  I am seeing odd behavior with SpanNearQuery.
>>>
>>> The problem is that with multiple fields, all fields beyond the  
>>> first
>>> one 'car' are not seen by the span. I didn't think the span meant to
>>> sets of the same field, but rather to terms within a given field.
>>>
>>> Document 1. 1 field (word)
>>>
>>> word: car
>>> word: cars
>>> word: cars wash
>>> word: cars lot
>>>
>>>
>>> SpanNearyQuery with slop of 0. Wrapped by SpanFirstQuery with slop  
>>> of
>>> 1. Term query within is "word","cars*". No results found.
>>>
>>
>> There is no SpanPrefixQuery for cars* in Lucene. You'll need to
>> expand the prefix into indexed terms to create a SpanOrQuery
>> yourself. This is fairly straightforward from PrefixQuery and
>> SpanOrQuery.
>> Alternatively, have a look at the surround query parser in contrib
>> for a working example.
>>
>> Regards,
>> Paul Elschot
>>
>>
>>> If I remove the first field word: car, it works. Also, if I increase
>>> the slop, it will return results from only the first amount of  
>>> fields
>>> in the slop rather than terms within the field value.
>>>
>>> Is what I am seeing the correct behavior? Doesn't seem like it.
>>>
>>> What I am trying to do is span _within_ EACH field and match phrases
>>> that begin with "cars*". Shouldn't be too hard to do I thought.
>>>
>>> Darren
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org