lucene-java-user mailing list archives

From Peter Carlson <carl...@bookandhammer.com>
Subject Re: Peculiar Behavior with Field queries
Date Wed, 19 Jun 2002 15:06:55 GMT
So just to be clear, the search string you are using is exactly

L_headline:"The Knockout Paunch"

Note the spelling "Paunch" (not "Punch") and the exact capitalization, since
the match is case sensitive.

If this doesn't work, please print the Query object you create, that is,
the output of Query.toString([defaultField]).


Also, the wildcard issue is an FAQ. The wildcard query does not tokenize
the query term and therefore does not lowercase the "N". Since you used the
standard tokenizer, all indexed terms are lowercase.
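
To make the mismatch concrete, here is a minimal sketch in plain Java. It is
not Lucene code: indexTerms and prefixMatch are hypothetical stand-ins for
index-time analysis (which lowercases terms) and for a trailing-"*" wildcard
query (which compares the prefix exactly as typed).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WildcardCaseSketch {

    // Hypothetical stand-in for index-time analysis: every term is lowercased.
    static List<String> indexTerms(String... values) {
        List<String> terms = new ArrayList<>();
        for (String value : values) {
            terms.add(value.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    // Hypothetical stand-in for a wildcard query ending in "*":
    // the prefix is used as typed, with no lowercasing.
    static List<String> prefixMatch(List<String> terms, String pattern) {
        String prefix = pattern.substring(0, pattern.length() - 1); // drop the trailing '*'
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (term.startsWith(prefix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> category = indexTerms("NAT", "FAT");

        System.out.println(prefixMatch(category, "N*")); // [] : "N" never matches "nat"
        System.out.println(prefixMatch(category, "n*")); // [nat]

        // One possible workaround: lowercase the wildcard term yourself first.
        System.out.println(prefixMatch(category, "N*".toLowerCase(Locale.ROOT))); // [nat]
    }
}
```

Lowercasing the wildcard term in your own code before building the query is
one way around this, assuming every field you search was indexed with a
lowercasing analyzer.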


--Peter



On 6/19/02 7:27 AM, "Terry Steichen" <terry@net-frame.com> wrote:

> Peter,
> 
> Enclosed is an xml file which reflects the structure of the documents I
> index.  Note that it has a 'headline' field.  In my WPDocument class (used
> by the indexer), I parse this xml file into its components and insert them
> as Fields into the Document class.  Specifically, I put the contents of the
> 'headline' xml field into a Field called "headline" and also into a Field
> called "l_headline".  The former is stored, indexed and tokenized.  The
> latter is stored, indexed and *not* tokenized.
> 
> Upon retrieval, I am able to readily display both the "headline" and
> "l_headline" fields.  But I am able to search *only* on the headline field.
> (BTW, I realize that I must include the entire, literal headline to match
> "l_headline".)
> 
> As long as I'm mentioning problems/observations, I find that I am able to
> search on all fields (other than the 'l_headline' field) using the "*"
> wildcard - but *only* when the preceding letter is lower case.  For example,
> I have another field called "category" and one such value is "NAT".  I can
> match this with "category:NAT", "category:nat", or "category:n*".  But I
> cannot match with "category:N*".
> 
> Also, while the "*" wildcard works fine (at the end and/or in the middle of
> a term), the '?' wildcard doesn't work at all.
> 
> Regards,
> 
> Terry
> 
> PS: I am using the StandardAnalyzer and QueryParser that come with Lucene
> 1.2rc5.
> 
> ------------ Example XML file that I index --------------------
> <?xml version="1.0" encoding="iso-8859-1"?>
> 
> <article>
> <headline>The Knockout Paunch</headline>
> <author>Peter Piper</author>
> <category>FAT</category>
> <pub_date create_date="20020616" event_date="20020616" timestamp="22:23 PM">20020616</pub_date>
> <placement edition="EE" section="EZ" page="F01 " slug="POTBELLIES16"/>
> <origin sourcenumber="6">Post</origin>
> <webexec created="Mon Jun 17 23:15:33 EDT 2002" module="v_wp13"/>
> <summary><![CDATA[<p>This Father's Day, let us praise Dad by celebrating
> that ever-expanding, much-maligned monument to the good life that he always
> carries close to his heart -- his paunch, his shelf, his spare tire, his
> front porch, his Buddha, his bay window, his beer gut, his
> potbelly.</p>]]></summary>
> <body paras="74"><![CDATA[ <p>This Father's Day, let us praise Dad by
> celebrating that ever-expanding, much-maligned monument to the good life
> that he always carries close to his heart -- his paunch, his shelf, his
> spare tire, his front porch, his Buddha, his bay window, his beer gut, his
> potbelly.</p> <p>The potbelly is the essence of distilled Dadness. It's as
> much a part of the architecture of middle-aged masculinity as creaky knees
> or hairy ears or the bald spot that keeps growing, wiping out wilderness
> faster than the Sahara.</p>
> 
> ---Stuff snipped for brevity --
> 
> <p>What does the perfect potbelly say?</p> <p>"It says, 'God, that guy's got
> a great beer gut,' " Decaire declares. "I saw a guy with a great gut in the
> store today. He had on a Hawaiian shirt and white shorts. The Hawaiian shirt
> just gave great form to his gut, the way a good bra gives form to breasts.
> It was just perfect. It was holding itself up -- nothing was hanging over
> the belt. I said, 'Great gut.' He said, 'Thanks.'</p> <p>"It was
> beautiful."</p>]]></body>
> <doc_name>A51288-2002Jun14</doc_name>
> <references>
>   <ref_articles>
>     <ref_article/>
>   </ref_articles>
>   <urls>
>     <url/>
>   </urls>
>   <graphics>
>     <graphic/>
>   </graphics>
> </references>
> </article>
> 
> 
> ----- Original Message -----
> From: "Peter Carlson" <carlson@bookandhammer.com>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Wednesday, June 19, 2002 9:47 AM
> Subject: Re: Peculiar Behavior with Field queries
> 
> 
>> Terry,
>> 
>> Please provide the exact example of the text so we can look at it and
>> evaluate what's going on.
>> 
>> -Peter
>> 
>> 
>> On 6/19/02 5:20 AM, "Terry Steichen" <terry@net-frame.com> wrote:
>> 
>>> Peter,
>>> 
>>> I added a new field called 'l_headline' (for literal headline) which I set
>>> so it was searchable and included in the index and not tokenized.  But the
>>> query (using a phrase that is an exact match for the headline, but which may
>>> include stop words) still fails.  Even when I apply this to an article whose
>>> headline contains no stop words (so the 'headline:"phrase"' query returns the
>>> article), the 'l_headline' query fails to produce anything.
>>> 
>>> I can do a 'doc.get("l_headline")' and it shows the proper phrase has been
>>> included.
>>> 
>>> Any ideas why this won't let me do a literal match?  Seems like it should
>>> work fine.
>>> 
>>> Regards,
>>> 
>>> Terry
>> 
>> 
>> --
>> To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
>> For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
>> 
>> 
> 
> 
> 
> 



