lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Terry Steichen" <te...@net-frame.com>
Subject Re: Peculiar Behavior with Field queries
Date Wed, 19 Jun 2002 12:20:25 GMT
Peter,

I added a new field called 'l_headline' (for literal headline) which I set
so it was searchable and included in the index and not tokenized.  But the
query (using a phrase that is an exact match for the headline, but which may
include stop words) still fails.  Even when I apply this to an article whose
headline contains no stop words (so the headline:"phrase"' returns the
article), the 'l_headline' fails to produce anything.

I can do a 'doc.get("l_headline")' and it shows the proper phrase has been
included.

Any ideas why this won't let me do a literal match?  Seems like it should
work fine.

Regards,

Terry


----- Original Message -----
From: "Peter Carlson" <carlson@bookandhammer.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Monday, June 17, 2002 10:32 AM
Subject: Re: Peculiar Behavior with Field queries


> You could do this, but you would also have to match case exactly.
>
> --Peter
>
>
> On 6/17/02 7:04 AM, "Terry Steichen" <terry@net-frame.com> wrote:
>
> > Could you handle this by including extra fields?  Let's say we're
dealing
> > with a database of articles, and that we wanted to do regular as well as
> > literal phrase searches on the 'headline' field.  What would happen if,
> > during indexing, you created a second headline field which you defined
as
> > searchable but *not* tokenized.  Could you then apply the phrase query
> > (which included stop words) successfully against the second headline
field
> > (recognizing that you would have to match the second headline's text
> > exactly)?
> >
> > Terry
> >
> > ----- Original Message -----
> > From: "Peter Carlson" <carlson@bookandhammer.com>
> > To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> > Sent: Monday, June 17, 2002 10:03 AM
> > Subject: Re: Peculiar Behavior with Field queries
> >
> >
> >> I don't think there is a work around without eliminating stop words,
> > because
> >> when you index, you will not include the stop words in the index.
> >>
> >> I guess one option would be to create an Analyzer to use when creating
the
> >> index that would not eliminate the stop words, then a change the
> >> QueryParser.jj to use this analyzer when searching for phrases.
> >> For all other queries you could use a different analyzer that would
> >> eliminate the stop words.
> >>
> >> I don't find this a problem personally as long as you tell the person
that
> >> you have eliminated these terms from what they are searching for. As an
> >> example, in Google they tell you which terms were just common words
that
> >> have been eliminated from your query string.
> >>
> >> --Peter
> >>
> >> On 6/17/02 5:32 AM, "Terry Steichen" <terry@net-frame.com> wrote:
> >>
> >>> This apparent inability of Lucene to find articles containing literal
> >>> phrases (if the phrase contains stop words) can be, I think, a severe
> >>> limitation.  I wonder if there is any workaround (short of eliminating
> > stop
> >>> words)?
> >>>
> >>> Terry
> >>>
> >>> ----- Original Message -----
> >>> From: "Otis Gospodnetic" <otis_gospodnetic@yahoo.com>
> >>> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> >>> Sent: Sunday, June 16, 2002 11:04 PM
> >>> Subject: Re: Peculiar Behavior with Field queries
> >>>
> >>>
> >>>> I believe you are correct.
> >>>>
> >>>> istO tisO sitO itSo Otsi Osit Otis
> >>>>
> >>>> --- Terry Steichen <terry@net-frame.com> wrote:
> >>>>> Oits,
> >>>>>
> >>>>> You may be right that the stop words are at work here.  What I was
> >>>>> expecting
> >>>>> to do is be able to match a specific phrase ("on the job"), even
if
> >>>>> it
> >>>>> includes stop words.  But I guess that may not be the way that
Lucene
> >>>>> works,
> >>>>> right?
> >>>>>
> >>>>> Regards,
> >>>>>
> >>>>> Terry
> >>>>>
> >>>>> ----- Original Message -----
> >>>>> From: "Otis Gospodnetic" <otis_gospodnetic@yahoo.com>
> >>>>> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> >>>>> Sent: Sunday, June 16, 2002 7:13 PM
> >>>>> Subject: Re: Peculiar Behavior with Field queries
> >>>>>
> >>>>>
> >>>>>> Hello,
> >>>>>>
> >>>>>> I'm not sure what the problem is :)
> >>>>>> What do you expect to get back?
> >>>>>> Are you wondering why 'on the' part is not matched?
> >>>>>> If so, it's probably because both 'on' and 'the' are in the
list of
> >>>>>> stop words, which are thrown out when/before indexing.
> >>>>>>
> >>>>>> Otis
> >>>>>>
> >>>>>> --- Terry Steichen <terry@net-frame.com> wrote:
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> I'm using Lucene (1.2RC5) and, when indexing, I include
a field
> >>>>>>> called "headline" using the following line of code in the
> >>>>> document I
> >>>>>>> create to use for indexing:
> >>>>>>>
> >>>>>>>       addField("headline", root.elementText("headline"),
true,
> >>>>> true,
> >>>>>>> true, doc);
> >>>>>>>
> >>>>>>> When I search on headline:term1, it works just fine.  But
I've
> >>>>>>> noticed that if I query using, for example,
> >>>>>>>
> >>>>>>>         headline:"on the job"
> >>>>>>>
> >>>>>>> I will get returned all items that have the term 'job' in
their
> >>>>>>> headline.
> >>>>>>>
> >>>>>>> I presume I've overlooked something and would appreciate
any
> >>>>>>> suggestions on what that might be.
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>>
> >>>>>>> Terry Steichen
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> __________________________________________________
> >>>>>> Do You Yahoo!?
> >>>>>> Yahoo! - Official partner of 2002 FIFA World Cup
> >>>>>> http://fifaworldcup.yahoo.com
> >>>>>>
> >>>>>> --
> >>>>>> To unsubscribe, e-mail:
> >>>>> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> >>>>>> For additional commands, e-mail:
> >>>>> <mailto:lucene-user-help@jakarta.apache.org>
> >>>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> To unsubscribe, e-mail:
> >>>>> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> >>>>> For additional commands, e-mail:
> >>>>> <mailto:lucene-user-help@jakarta.apache.org>
> >>>>>
> >>>>
> >>>>
> >>>> __________________________________________________
> >>>> Do You Yahoo!?
> >>>> Yahoo! - Official partner of 2002 FIFA World Cup
> >>>> http://fifaworldcup.yahoo.com
> >>>>
> >>>> --
> >>>> To unsubscribe, e-mail:
> >>> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> >>>> For additional commands, e-mail:
> >>> <mailto:lucene-user-help@jakarta.apache.org>
> >>>>
> >>>
> >>>
> >>> --
> >>> To unsubscribe, e-mail:
> > <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> >>> For additional commands, e-mail:
> > <mailto:lucene-user-help@jakarta.apache.org>
> >>>
> >>>
> >>
> >>
> >> --
> >> To unsubscribe, e-mail:
> > <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> >> For additional commands, e-mail:
> > <mailto:lucene-user-help@jakarta.apache.org>
> >>
> >>
> >
> >
> > --
> > To unsubscribe, e-mail:
<mailto:lucene-user-unsubscribe@jakarta.apache.org>
> > For additional commands, e-mail:
<mailto:lucene-user-help@jakarta.apache.org>
> >
> >
>
>
> --
> To unsubscribe, e-mail:
<mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
<mailto:lucene-user-help@jakarta.apache.org>
>
>


--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>


Mime
View raw message