lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Terry Steichen" <te...@net-frame.com>
Subject Re: Peculiar Behavior with Field queries
Date Wed, 19 Jun 2002 15:24:43 GMT
Peter,

1) I was using precisely that spelling of the search string, misspellings
and case matched.

2) I dumped the Query.toString() and it showed that the entered term ("The
Knockout Paunch") was converted to lower case (l_headline:"the knockout
paunch").  So, I just tried modifying WPDocument so that when indexing, the
contents of 'l_headline' would be processed/saved as lowercase.  Didn't
change anything - still didn't match.

3) Why doesn't the '?' wildcard work?

4) Also, (related to 3 above) how does Lucene choose which type of query to
employ?  I've tried examining the contents of QueryParser.jj, but don't
really understand it's structure.

Regards,

Terry

----- Original Message -----
From: "Peter Carlson" <carlson@bookandhammer.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Wednesday, June 19, 2002 11:06 AM
Subject: Re: Peculiar Behavior with Field queries


> So just to be clear, the search string you are using is exactly
>
> L_headline:"The Knockout Paunch"
>
> Note the misspelling of Punch and the case sensitive specifics.
>
> If this doesn't work, please output the results of the Query object you
> create. That is Query.toString([defaultField]).
>
>
> Also, for the wildcard issue, this is an FAQ. The wildcard query does not
> tokenize the query term and there for it does not lower case the "N".
Since
> you used the standard tokenizer, all terms are lower case.
>
>
> --Peter
>
>
>
> On 6/19/02 7:27 AM, "Terry Steichen" <terry@net-frame.com> wrote:
>
> > Peter,
> >
> > Enclosed is an xml file which reflects the structure of the documents I
> > index.  Note that it has a 'headline' field.  In my WPDocument class
(used
> > by the indexer), I parse this xml file into its components and insert
them
> > as Fields into the Document class.  Specifically, I put the contents of
the
> > 'headline' xml field into a Field called "headline" and also into a
Field
> > called "l_headline".  The former is stored, indexed and tokenized.  The
> > latter is stored, indexed and *not* tokenized.
> >
> > Upon retrieval, I am able to readily display both the "headline" and
> > "l_headline" fields.  But I am able to search *only* on the headline
field.
> > (BTW, I realize  that I must include the entire, literal headline to
match
> > "l_headline".)
> >
> > As long as I'm mentioning problems/observations, I find that I am able
to
> > search on all fields (other than the 'l_headline' field) using the "*"
> > wildcard - but *only* when the preceding letter is lower case.  For
example,
> > I have another field called "category" and one such value is "NAT".  I
can
> > match this with "category:NAT", "category:nat", or "category:n*".  But I
> > cannot match with "category:N*".
> >
> > Also, while the "*" wildcard works fine (at the end and/or in the middle
of
> > a term), the '?' wildcard doesn't work at all.
> >
> > Regards,
> >
> > Terry
> >
> > PS: I am using the StandardAnalyzer and QueryParser that comes with
Lucene
> > 1.2rc5.
> >
> > ------------ Example XML file that I index --------------------
> > <?xml version="1.0" encoding="iso-8859-1"?>
> >
> > <article>
> > <headline>The Knockout Paunch</headline>
> > <author>Peter Piper</author>
> > <category>FAT</category>
> > <pub_date create_date="20020616" event_date="20020616" timestamp="22:23
> > PM">20020616</pub_date>
> > <placement edition="EE" section="EZ" page="F01 " slug="POTBELLIES16"/>
> > <origin sourcenumber="6">Post</origin>
> > <webexec created="Mon Jun 17 23:15:33 EDT 2002" module="v_wp13"/>
> > <summary><![CDATA[<p>This Father's Day, let us praise Dad by celebrating
> > that ever-expanding, much-maligned monument to the good life that he
always
> > carries close to his heart -- his paunch, his shelf, his spare tire, his
> > front porch, his Buddha, his bay window, his beer gut, his
> > potbelly.</p>]]></summary>
> > <body paras="74"><![CDATA[ <p>This Father's Day, let us praise Dad
by
> > celebrating that ever-expanding, much-maligned monument to the good life
> > that he always carries close to his heart -- his paunch, his shelf, his
> > spare tire, his front porch, his Buddha, his bay window, his beer gut,
his
> > potbelly.</p> <p>The potbelly is the essence of distilled Dadness. It's
as
> > much a part of the architecture of middle-aged masculinity as creaky
knees
> > or hairy ears or the bald spot that keeps growing, wiping out wilderness
> > faster than the Sahara.</p>
> >
> > ---Stuff snipped for brevity --
> >
> > <p>What does the perfect potbelly say?</p> <p>"It says, 'God,
that guy's
got
> > a great beer gut,' " Decaire declares. "I saw a guy with a great gut in
the
> > store today. He had on a Hawaiian shirt and white shorts. The Hawaiian
shirt
> > just gave great form to his gut, the way a good bra gives form to
breasts.
> > It was just perfect. It was holding itself up -- nothing was hanging
over
> > the belt. I said, 'Great gut.' He said, 'Thanks.'</p> <p>"It was
> > beautiful."</p>]]></body>
> > <doc_name>A51288-2002Jun14</doc_name>
> > <references>
> >   <ref_articles>
> >     <ref_article/>
> >   </ref_articles>
> >   <urls>
> >     <url/>
> >   </urls>
> >   <graphics>
> >     <graphic/>
> >   </graphics>
> > </references>
> > </article>
> >
> >
> > ----- Original Message -----
> > From: "Peter Carlson" <carlson@bookandhammer.com>
> > To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> > Sent: Wednesday, June 19, 2002 9:47 AM
> > Subject: Re: Peculiar Behavior with Field queries
> >
> >
> >> Terry,
> >>
> >> Please provide the exact example of the text so we can look at it and
> >> evaluate what's going on.
> >>
> >> -Peter
> >>
> >>
> >> On 6/19/02 5:20 AM, "Terry Steichen" <terry@net-frame.com> wrote:
> >>
> >>> Peter,
> >>>
> >>> I added a new field called 'l_headline' (for literal headline) which I
> > set
> >>> so it was searchable and included in the index and not tokenized.  But
> > the
> >>> query (using a phrase that is an exact match for the headline, but
which
> > may
> >>> include stop words) still fails.  Even when I apply this to an article
> > whose
> >>> headline contains no stop words (so the headline:"phrase"' returns the
> >>> article), the 'l_headline' fails to produce anything.
> >>>
> >>> I can do a 'doc.get("l_headline")' and it shows the proper phrase has
> > been
> >>> included.
> >>>
> >>> Any ideas why this won't let me do a literal match?  Seems like it
> > should
> >>> work fine.
> >>>
> >>> Regards,
> >>>
> >>> Terry
> >>
> >>
> >> --
> >> To unsubscribe, e-mail:
> > <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> >> For additional commands, e-mail:
> > <mailto:lucene-user-help@jakarta.apache.org>
> >>
> >>
> >
> >
> > --
> > To unsubscribe, e-mail:
<mailto:lucene-user-unsubscribe@jakarta.apache.org>
> > For additional commands, e-mail:
<mailto:lucene-user-help@jakarta.apache.org>
> >
> >
>
>
> --
> To unsubscribe, e-mail:
<mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
<mailto:lucene-user-help@jakarta.apache.org>
>
>


--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>


Mime
View raw message