lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Terry Steichen" <te...@net-frame.com>
Subject Re: Peculiar Behavior with Field queries
Date Wed, 19 Jun 2002 14:27:46 GMT
Peter,

Enclosed is an xml file which reflects the structure of the documents I
index.  Note that it has a 'headline' field.  In my WPDocument class (used
by the indexer), I parse this xml file into its components and insert them
as Fields into the Document class.  Specifically, I put the contents of the
'headline' xml field into a Field called "headline" and also into a Field
called "l_headline".  The former is stored, indexed and tokenized.  The
latter is stored, indexed and *not* tokenized.

Upon retrieval, I am able to readily display both the "headline" and
"l_headline" fields.  But I am able to search *only* on the headline field.
(BTW, I realize  that I must include the entire, literal headline to match
"l_headline".)

As long as I'm mentioning problems/observations, I find that I am able to
search on all fields (other than the 'l_headline' field) using the "*"
wildcard - but *only* when the preceding letter is lower case.  For example,
I have another field called "category" and one such value is "NAT".  I can
match this with "category:NAT", "category:nat", or "category:n*".  But I
cannot match with "category:N*".

Also, while the "*" wildcard works fine (at the end and/or in the middle of
a term), the '?' wildcard doesn't work at all.

Regards,

Terry

PS: I am using the StandardAnalyzer and QueryParser that comes with Lucene
1.2rc5.

------------ Example XML file that I index --------------------
<?xml version="1.0" encoding="iso-8859-1"?>

<article>
  <headline>The Knockout Paunch</headline>
  <author>Peter Piper</author>
  <category>FAT</category>
  <pub_date create_date="20020616" event_date="20020616" timestamp="22:23
PM">20020616</pub_date>
  <placement edition="EE" section="EZ" page="F01 " slug="POTBELLIES16"/>
  <origin sourcenumber="6">Post</origin>
  <webexec created="Mon Jun 17 23:15:33 EDT 2002" module="v_wp13"/>
  <summary><![CDATA[<p>This Father's Day, let us praise Dad by celebrating
that ever-expanding, much-maligned monument to the good life that he always
carries close to his heart -- his paunch, his shelf, his spare tire, his
front porch, his Buddha, his bay window, his beer gut, his
potbelly.</p>]]></summary>
  <body paras="74"><![CDATA[ <p>This Father's Day, let us praise Dad by
celebrating that ever-expanding, much-maligned monument to the good life
that he always carries close to his heart -- his paunch, his shelf, his
spare tire, his front porch, his Buddha, his bay window, his beer gut, his
potbelly.</p> <p>The potbelly is the essence of distilled Dadness. It's as
much a part of the architecture of middle-aged masculinity as creaky knees
or hairy ears or the bald spot that keeps growing, wiping out wilderness
faster than the Sahara.</p>

---Stuff snipped for brevity --

<p>What does the perfect potbelly say?</p> <p>"It says, 'God, that guy's
got
a great beer gut,' " Decaire declares. "I saw a guy with a great gut in the
store today. He had on a Hawaiian shirt and white shorts. The Hawaiian shirt
just gave great form to his gut, the way a good bra gives form to breasts.
It was just perfect. It was holding itself up -- nothing was hanging over
the belt. I said, 'Great gut.' He said, 'Thanks.'</p> <p>"It was
beautiful."</p>]]></body>
  <doc_name>A51288-2002Jun14</doc_name>
  <references>
    <ref_articles>
      <ref_article/>
    </ref_articles>
    <urls>
      <url/>
    </urls>
    <graphics>
      <graphic/>
    </graphics>
  </references>
</article>


----- Original Message -----
From: "Peter Carlson" <carlson@bookandhammer.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Wednesday, June 19, 2002 9:47 AM
Subject: Re: Peculiar Behavior with Field queries


> Terry,
>
> Please provide the exact example of the text so we can look at it and
> evaluate what's going on.
>
> -Peter
>
>
> On 6/19/02 5:20 AM, "Terry Steichen" <terry@net-frame.com> wrote:
>
> > Peter,
> >
> > I added a new field called 'l_headline' (for literal headline) which I
set
> > so it was searchable and included in the index and not tokenized.  But
the
> > query (using a phrase that is an exact match for the headline, but which
may
> > include stop words) still fails.  Even when I apply this to an article
whose
> > headline contains no stop words (so the headline:"phrase"' returns the
> > article), the 'l_headline' fails to produce anything.
> >
> > I can do a 'doc.get("l_headline")' and it shows the proper phrase has
been
> > included.
> >
> > Any ideas why this won't let me do a literal match?  Seems like it
should
> > work fine.
> >
> > Regards,
> >
> > Terry
>
>
> --
> To unsubscribe, e-mail:
<mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
<mailto:lucene-user-help@jakarta.apache.org>
>
>


--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>


Mime
View raw message