lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Peter Carlson <carl...@bookandhammer.com>
Subject Re: Peculiar Behavior with Field queries
Date Mon, 17 Jun 2002 14:32:30 GMT
You could do this, but you would also have to match case exactly.

--Peter


On 6/17/02 7:04 AM, "Terry Steichen" <terry@net-frame.com> wrote:

> Could you handle this by including extra fields?  Let's say we're dealing
> with a database of articles, and that we wanted to do regular as well as
> literal phrase searches on the 'headline' field.  What would happen if,
> during indexing, you created a second headline field which you defined as
> searchable but *not* tokenized.  Could you then apply the phrase query
> (which included stop words) successfully against the second headline field
> (recognizing that you would have to match the second headline's text
> exactly)?
> 
> Terry
> 
> ----- Original Message -----
> From: "Peter Carlson" <carlson@bookandhammer.com>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Monday, June 17, 2002 10:03 AM
> Subject: Re: Peculiar Behavior with Field queries
> 
> 
>> I don't think there is a work around without eliminating stop words,
> because
>> when you index, you will not include the stop words in the index.
>> 
>> I guess one option would be to create an Analyzer to use when creating the
>> index that would not eliminate the stop words, then a change the
>> QueryParser.jj to use this analyzer when searching for phrases.
>> For all other queries you could use a different analyzer that would
>> eliminate the stop words.
>> 
>> I don't find this a problem personally as long as you tell the person that
>> you have eliminated these terms from what they are searching for. As an
>> example, in Google they tell you which terms were just common words that
>> have been eliminated from your query string.
>> 
>> --Peter
>> 
>> On 6/17/02 5:32 AM, "Terry Steichen" <terry@net-frame.com> wrote:
>> 
>>> This apparent inability of Lucene to find articles containing literal
>>> phrases (if the phrase contains stop words) can be, I think, a severe
>>> limitation.  I wonder if there is any workaround (short of eliminating
> stop
>>> words)?
>>> 
>>> Terry
>>> 
>>> ----- Original Message -----
>>> From: "Otis Gospodnetic" <otis_gospodnetic@yahoo.com>
>>> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
>>> Sent: Sunday, June 16, 2002 11:04 PM
>>> Subject: Re: Peculiar Behavior with Field queries
>>> 
>>> 
>>>> I believe you are correct.
>>>> 
>>>> istO tisO sitO itSo Otsi Osit Otis
>>>> 
>>>> --- Terry Steichen <terry@net-frame.com> wrote:
>>>>> Oits,
>>>>> 
>>>>> You may be right that the stop words are at work here.  What I was
>>>>> expecting
>>>>> to do is be able to match a specific phrase ("on the job"), even if
>>>>> it
>>>>> includes stop words.  But I guess that may not be the way that Lucene
>>>>> works,
>>>>> right?
>>>>> 
>>>>> Regards,
>>>>> 
>>>>> Terry
>>>>> 
>>>>> ----- Original Message -----
>>>>> From: "Otis Gospodnetic" <otis_gospodnetic@yahoo.com>
>>>>> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
>>>>> Sent: Sunday, June 16, 2002 7:13 PM
>>>>> Subject: Re: Peculiar Behavior with Field queries
>>>>> 
>>>>> 
>>>>>> Hello,
>>>>>> 
>>>>>> I'm not sure what the problem is :)
>>>>>> What do you expect to get back?
>>>>>> Are you wondering why 'on the' part is not matched?
>>>>>> If so, it's probably because both 'on' and 'the' are in the list
of
>>>>>> stop words, which are thrown out when/before indexing.
>>>>>> 
>>>>>> Otis
>>>>>> 
>>>>>> --- Terry Steichen <terry@net-frame.com> wrote:
>>>>>>> Hello,
>>>>>>> 
>>>>>>> I'm using Lucene (1.2RC5) and, when indexing, I include a field
>>>>>>> called "headline" using the following line of code in the
>>>>> document I
>>>>>>> create to use for indexing:
>>>>>>> 
>>>>>>>       addField("headline", root.elementText("headline"), true,
>>>>> true,
>>>>>>> true, doc);
>>>>>>> 
>>>>>>> When I search on headline:term1, it works just fine.  But I've
>>>>>>> noticed that if I query using, for example,
>>>>>>> 
>>>>>>>         headline:"on the job"
>>>>>>> 
>>>>>>> I will get returned all items that have the term 'job' in their
>>>>>>> headline.
>>>>>>> 
>>>>>>> I presume I've overlooked something and would appreciate any
>>>>>>> suggestions on what that might be.
>>>>>>> 
>>>>>>> Regards,
>>>>>>> 
>>>>>>> Terry Steichen
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> __________________________________________________
>>>>>> Do You Yahoo!?
>>>>>> Yahoo! - Official partner of 2002 FIFA World Cup
>>>>>> http://fifaworldcup.yahoo.com
>>>>>> 
>>>>>> --
>>>>>> To unsubscribe, e-mail:
>>>>> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
>>>>>> For additional commands, e-mail:
>>>>> <mailto:lucene-user-help@jakarta.apache.org>
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> To unsubscribe, e-mail:
>>>>> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
>>>>> For additional commands, e-mail:
>>>>> <mailto:lucene-user-help@jakarta.apache.org>
>>>>> 
>>>> 
>>>> 
>>>> __________________________________________________
>>>> Do You Yahoo!?
>>>> Yahoo! - Official partner of 2002 FIFA World Cup
>>>> http://fifaworldcup.yahoo.com
>>>> 
>>>> --
>>>> To unsubscribe, e-mail:
>>> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
>>>> For additional commands, e-mail:
>>> <mailto:lucene-user-help@jakarta.apache.org>
>>>> 
>>> 
>>> 
>>> --
>>> To unsubscribe, e-mail:
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
>>> For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>
>>> 
>>> 
>> 
>> 
>> --
>> To unsubscribe, e-mail:
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
>> For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>
>> 
>> 
> 
> 
> --
> To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
> 
> 


--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>


Mime
View raw message