Message-ID: <050101c2179d$7c0e2000$0201a8c0@netframe.com>
From: "Terry Steichen"
To: "Lucene Users Group"
Subject: Re: Peculiar Behavior with Field queries
Date: Wed, 19 Jun 2002 10:27:46 -0400
Peter,
Enclosed is an xml file which reflects the structure of the documents I
index. Note that it has a 'headline' field. In my WPDocument class (used
by the indexer), I parse this xml file into its components and insert them
as Fields into the Document class. Specifically, I put the contents of the
'headline' xml field into a Field called "headline" and also into a Field
called "l_headline". The former is stored, indexed and tokenized. The
latter is stored, indexed and *not* tokenized.
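To make the two-field setup concrete, here is a minimal sketch in plain Java (no Lucene dependency) of which index terms each field would end up with. The class `FieldTermsSketch` and its methods are hypothetical names used only for illustration, and `analyze` is a crude stand-in for an analyzer such as StandardAnalyzer (it lowercases and splits on non-alphanumeric characters, with no stop-word handling). The sketch assumes that a tokenized field is indexed as its analyzed tokens, while an untokenized field is indexed as one term holding the exact original string.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class FieldTermsSketch {

    // Crude stand-in for an analyzer (assumption): lowercase the text,
    // then split it on runs of non-alphanumeric characters.
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Terms that would be indexed for a field value under this model:
    // analyzed tokens if tokenized, otherwise the whole value as one term.
    public static List<String> indexTerms(String value, boolean tokenized) {
        return tokenized ? analyze(value) : Collections.singletonList(value);
    }

    public static void main(String[] args) {
        String headline = "The Knockout Paunch";
        System.out.println("headline   -> " + indexTerms(headline, true));
        System.out.println("l_headline -> " + indexTerms(headline, false));
    }
}
```

Under this model, a query against "l_headline" would only match if the lookup term equals the whole original string, including case and spacing. If the query text were itself analyzed before lookup, its lowercased tokens would not equal that single term; that is one possible explanation for the behavior described here, not something confirmed in this thread.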
Upon retrieval, I am able to readily display both the "headline" and
"l_headline" fields. But I am able to search *only* on the headline field.
(BTW, I realize that I must include the entire, literal headline to match
"l_headline".)
As long as I'm mentioning problems/observations, I find that I am able to
search on all fields (other than the 'l_headline' field) using the "*"
wildcard - but *only* when the preceding letter is lower case. For example,
I have another field called "category" and one such value is "NAT". I can
match this with "category:NAT", "category:nat", or "category:n*". But I
cannot match with "category:N*".
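The case-sensitivity pattern above can be reproduced with a small model in plain Java (no Lucene dependency). It rests on two assumptions that this thread does not confirm: that terms are lowercased when indexed, and that plain query terms are lowercased before lookup while wildcard prefixes are compared as typed. The class `PrefixCaseSketch` and its methods are hypothetical names for illustration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class PrefixCaseSketch {

    // Exact-term lookup; optionally lowercase the query term first,
    // mimicking a query parser that analyzes plain terms (assumption).
    public static boolean matchesTerm(List<String> indexedTerms, String queryTerm,
                                      boolean analyzeQuery) {
        String q = analyzeQuery ? queryTerm.toLowerCase(Locale.ROOT) : queryTerm;
        return indexedTerms.contains(q);
    }

    // Prefix ("abc*") lookup; the prefix is lowercased only when requested.
    public static boolean matchesPrefix(List<String> indexedTerms, String prefix,
                                        boolean lowercasePrefix) {
        String p = lowercasePrefix ? prefix.toLowerCase(Locale.ROOT) : prefix;
        for (String term : indexedTerms) {
            if (term.startsWith(p)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed index content: the value "NAT" lowercased at index time.
        List<String> category = Arrays.asList("nat");
        System.out.println("category:NAT -> " + matchesTerm(category, "NAT", true));
        System.out.println("category:nat -> " + matchesTerm(category, "nat", true));
        System.out.println("category:n*  -> " + matchesPrefix(category, "n", false));
        System.out.println("category:N*  -> " + matchesPrefix(category, "N", false));
    }
}
```

In this model the first three lookups succeed and "category:N*" fails, which matches the observations; lowercasing the prefix before comparison (the `lowercasePrefix` flag) would make the last one succeed as well.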
Also, while the "*" wildcard works fine (at the end and/or in the middle of
a term), the '?' wildcard doesn't work at all.
Regards,
Terry
PS: I am using the StandardAnalyzer and QueryParser that come with Lucene
1.2rc5.
------------ Example XML file that I index --------------------
The Knockout Paunch
Peter Piper
FAT
20020616
Post
This Father's Day, let us praise Dad by celebrating
that ever-expanding, much-maligned monument to the good life that he always
carries close to his heart -- his paunch, his shelf, his spare tire, his
front porch, his Buddha, his bay window, his beer gut, his
potbelly.
This Father's Day, let us praise Dad by
celebrating that ever-expanding, much-maligned monument to the good life
that he always carries close to his heart -- his paunch, his shelf, his
spare tire, his front porch, his Buddha, his bay window, his beer gut, his
potbelly. The potbelly is the essence of distilled Dadness. It's as
much a part of the architecture of middle-aged masculinity as creaky knees
or hairy ears or the bald spot that keeps growing, wiping out wilderness
faster than the Sahara.
---Stuff snipped for brevity --
What does the perfect potbelly say?
"It says, 'God, that guy's got
a great beer gut,' " Decaire declares. "I saw a guy with a great gut in the
store today. He had on a Hawaiian shirt and white shorts. The Hawaiian shirt
just gave great form to his gut, the way a good bra gives form to breasts.
It was just perfect. It was holding itself up -- nothing was hanging over
the belt. I said, 'Great gut.' He said, 'Thanks.'
"It was
beautiful."
A51288-2002Jun14
----- Original Message -----
From: "Peter Carlson"
To: "Lucene Users List"
Sent: Wednesday, June 19, 2002 9:47 AM
Subject: Re: Peculiar Behavior with Field queries
> Terry,
>
> Please provide the exact example of the text so we can look at it and
> evaluate what's going on.
>
> -Peter
>
>
> On 6/19/02 5:20 AM, "Terry Steichen" wrote:
>
> > Peter,
> >
> > I added a new field called 'l_headline' (for literal headline) which I set
> > so it was searchable and included in the index and not tokenized. But the
> > query (using a phrase that is an exact match for the headline, but which may
> > include stop words) still fails. Even when I apply this to an article whose
> > headline contains no stop words (so 'headline:"phrase"' returns the
> > article), the 'l_headline' fails to produce anything.
> >
> > I can do a 'doc.get("l_headline")' and it shows the proper phrase has been
> > included.
> >
> > Any ideas why this won't let me do a literal match? Seems like it should
> > work fine.
> >
> > Regards,
> >
> > Terry