lucene-java-user mailing list archives

From Peter Carlson <carl...@bookandhammer.com>
Subject Re: Peculiar Behavior with Field queries
Date Thu, 20 Jun 2002 04:51:10 GMT
Hi Terry,

This is strange. I'll have to check it out.
BTW, the best way to debug the QueryParser and learn it is to look at the
generated QueryParser.java file. It is in the bin directory if you build
Lucene from scratch.
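
One thing that may be worth ruling out in the meantime: if the parser runs the
quoted phrase through the analyzer, the query ends up as several lowercase
tokens, while an untokenized field holds the whole headline as one single term.
Below is a rough sketch of that idea in plain Java. It does not use any Lucene
classes; `analyze`, `indexUntokenized`, `phraseMatches` and `termMatches` are
made-up names, and the whitespace-plus-lowercase analyzer is only an assumed
stand-in for what StandardAnalyzer does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Plain-Java model (no Lucene classes) of an analyzed phrase query being
// compared against a field that was indexed without tokenization.
class LiteralFieldSketch {

    // Hypothetical stand-in for an analyzer: split on whitespace, lowercase.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // An untokenized field contributes exactly one term: the whole value.
    static List<String> indexUntokenized(String value) {
        return Collections.singletonList(value);
    }

    // Phrase match: the query tokens must appear as consecutive index terms.
    static boolean phraseMatches(List<String> indexTerms, List<String> queryTokens) {
        return Collections.indexOfSubList(indexTerms, queryTokens) >= 0;
    }

    // Single-term match: the raw query string must equal one indexed term.
    static boolean termMatches(List<String> indexTerms, String rawTerm) {
        return indexTerms.contains(rawTerm);
    }

    public static void main(String[] args) {
        String headline = "The Knockout Paunch";
        List<String> query = analyze(headline);

        // Untokenized field, original case: three tokens vs. one term.
        System.out.println(phraseMatches(indexUntokenized(headline), query));
        // Untokenized field, lowercased at index time: still one term.
        System.out.println(phraseMatches(indexUntokenized(headline.toLowerCase(Locale.ROOT)), query));
        // Tokenized field: the phrase lines up with the indexed tokens.
        System.out.println(phraseMatches(analyze(headline), query));
        // Exact single-term lookup against the untokenized field.
        System.out.println(termMatches(indexUntokenized(headline), headline));
    }
}
```

If that turns out to be what is happening, it might be worth building the query
for l_headline by hand, e.g. new TermQuery(new Term("l_headline", "The Knockout
Paunch")), instead of going through the QueryParser. I have not verified this
against your index.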

--Peter


On 6/19/02 8:24 AM, "Terry Steichen" <terry@net-frame.com> wrote:

> Peter,
> 
> 1) I was using precisely that spelling of the search string, misspellings
> and case matched.
> 
> 2) I dumped the Query.toString() and it showed that the entered term ("The
> Knockout Paunch") was converted to lower case (l_headline:"the knockout
> paunch").  So, I just tried modifying WPDocument so that when indexing, the
> contents of 'l_headline' would be processed/saved as lowercase.  Didn't
> change anything - still didn't match.
> 
> 3) Why doesn't the '?' wildcard work?
> 
> 4) Also, (related to 3 above) how does Lucene choose which type of query to
> employ?  I've tried examining the contents of QueryParser.jj, but don't
> really understand its structure.
> 
> Regards,
> 
> Terry
> 
> ----- Original Message -----
> From: "Peter Carlson" <carlson@bookandhammer.com>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Wednesday, June 19, 2002 11:06 AM
> Subject: Re: Peculiar Behavior with Field queries
> 
> 
>> So just to be clear, the search string you are using is exactly
>> 
>> L_headline:"The Knockout Paunch"
>> 
>> Note the misspelling of Punch and the case sensitive specifics.
>> 
>> If this doesn't work, please output the results of the Query object you
>> create. That is Query.toString([defaultField]).
>> 
>> 
>> Also, for the wildcard issue, this is an FAQ. The wildcard query does not
>> tokenize the query term and therefore it does not lowercase the "N". Since
>> you used the standard tokenizer, all indexed terms are lower case.
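
To make the point above concrete, here is a small plain-Java sketch of the
mismatch. It is not Lucene code: `indexTerm` and `wildcardMatches` are made-up
names, and the regex-based matcher is only an assumed model of how a wildcard
pattern gets compared against indexed terms. It also does not explain the '?'
problem, since '?' works fine in this simplified model.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Plain-Java model of a wildcard pattern that is NOT lowercased being
// compared against terms that WERE lowercased at index time.
class WildcardCaseSketch {

    // Assumed indexing step: the analyzer lowercases the field value.
    static String indexTerm(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    // Translate '*' and '?' into a regex and match it, case-sensitively,
    // against the whole indexed term.
    static boolean wildcardMatches(String indexedTerm, String pattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.matches(regex.toString(), indexedTerm);
    }

    public static void main(String[] args) {
        String term = indexTerm("NAT"); // stored as "nat"
        System.out.println(wildcardMatches(term, "n*"));
        System.out.println(wildcardMatches(term, "N*"));
        // Possible workaround: lowercase the pattern before searching.
        System.out.println(wildcardMatches(term, "N*".toLowerCase(Locale.ROOT)));
    }
}
```

Under that assumption, lowercasing wildcard terms in the application before
they reach the parser would be one way around it.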
>> 
>> 
>> --Peter
>> 
>> 
>> 
>> On 6/19/02 7:27 AM, "Terry Steichen" <terry@net-frame.com> wrote:
>> 
>>> Peter,
>>> 
>>> Enclosed is an xml file which reflects the structure of the documents I
>>> index.  Note that it has a 'headline' field.  In my WPDocument class (used
>>> by the indexer), I parse this xml file into its components and insert them
>>> as Fields into the Document class.  Specifically, I put the contents of the
>>> 'headline' xml field into a Field called "headline" and also into a Field
>>> called "l_headline".  The former is stored, indexed and tokenized.  The
>>> latter is stored, indexed and *not* tokenized.
>>> 
>>> Upon retrieval, I am able to readily display both the "headline" and
>>> "l_headline" fields.  But I am able to search *only* on the headline field.
>>> (BTW, I realize that I must include the entire, literal headline to match
>>> "l_headline".)
>>> 
>>> As long as I'm mentioning problems/observations, I find that I am able to
>>> search on all fields (other than the 'l_headline' field) using the "*"
>>> wildcard - but *only* when the preceding letter is lower case.  For example,
>>> I have another field called "category" and one such value is "NAT".  I can
>>> match this with "category:NAT", "category:nat", or "category:n*".  But I
>>> cannot match with "category:N*".
>>> 
>>> Also, while the "*" wildcard works fine (at the end and/or in the middle of
>>> a term), the '?' wildcard doesn't work at all.
>>> 
>>> Regards,
>>> 
>>> Terry
>>> 
>>> PS: I am using the StandardAnalyzer and QueryParser that come with Lucene
>>> 1.2rc5.
>>> 
>>> ------------ Example XML file that I index --------------------
>>> <?xml version="1.0" encoding="iso-8859-1"?>
>>> 
>>> <article>
>>> <headline>The Knockout Paunch</headline>
>>> <author>Peter Piper</author>
>>> <category>FAT</category>
>>> <pub_date create_date="20020616" event_date="20020616" timestamp="22:23 PM">20020616</pub_date>
>>> <placement edition="EE" section="EZ" page="F01 " slug="POTBELLIES16"/>
>>> <origin sourcenumber="6">Post</origin>
>>> <webexec created="Mon Jun 17 23:15:33 EDT 2002" module="v_wp13"/>
>>> <summary><![CDATA[<p>This Father's Day, let us praise Dad by celebrating
>>> that ever-expanding, much-maligned monument to the good life that he always
>>> carries close to his heart -- his paunch, his shelf, his spare tire, his
>>> front porch, his Buddha, his bay window, his beer gut, his
>>> potbelly.</p>]]></summary>
>>> <body paras="74"><![CDATA[ <p>This Father's Day, let us praise Dad by
>>> celebrating that ever-expanding, much-maligned monument to the good life
>>> that he always carries close to his heart -- his paunch, his shelf, his
>>> spare tire, his front porch, his Buddha, his bay window, his beer gut, his
>>> potbelly.</p> <p>The potbelly is the essence of distilled Dadness. It's as
>>> much a part of the architecture of middle-aged masculinity as creaky knees
>>> or hairy ears or the bald spot that keeps growing, wiping out wilderness
>>> faster than the Sahara.</p>
>>> 
>>> ---Stuff snipped for brevity --
>>> 
>>> <p>What does the perfect potbelly say?</p> <p>"It says, 'God, that guy's got
>>> a great beer gut,' " Decaire declares. "I saw a guy with a great gut in the
>>> store today. He had on a Hawaiian shirt and white shorts. The Hawaiian shirt
>>> just gave great form to his gut, the way a good bra gives form to breasts.
>>> It was just perfect. It was holding itself up -- nothing was hanging over
>>> the belt. I said, 'Great gut.' He said, 'Thanks.'</p> <p>"It was
>>> beautiful."</p>]]></body>
>>> <doc_name>A51288-2002Jun14</doc_name>
>>> <references>
>>>   <ref_articles>
>>>     <ref_article/>
>>>   </ref_articles>
>>>   <urls>
>>>     <url/>
>>>   </urls>
>>>   <graphics>
>>>     <graphic/>
>>>   </graphics>
>>> </references>
>>> </article>
>>> 
>>> 
>>> ----- Original Message -----
>>> From: "Peter Carlson" <carlson@bookandhammer.com>
>>> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
>>> Sent: Wednesday, June 19, 2002 9:47 AM
>>> Subject: Re: Peculiar Behavior with Field queries
>>> 
>>> 
>>>> Terry,
>>>> 
>>>> Please provide the exact example of the text so we can look at it and
>>>> evaluate what's going on.
>>>> 
>>>> -Peter
>>>> 
>>>> 
>>>> On 6/19/02 5:20 AM, "Terry Steichen" <terry@net-frame.com> wrote:
>>>> 
>>>>> Peter,
>>>>> 
>>>>> I added a new field called 'l_headline' (for literal headline) which I set
>>>>> so it was searchable and included in the index and not tokenized.  But the
>>>>> query (using a phrase that is an exact match for the headline, but which may
>>>>> include stop words) still fails.  Even when I apply this to an article whose
>>>>> headline contains no stop words (so the 'headline:"phrase"' returns the
>>>>> article), the 'l_headline' fails to produce anything.
>>>>> 
>>>>> I can do a 'doc.get("l_headline")' and it shows the proper phrase has been
>>>>> included.
>>>>> 
>>>>> Any ideas why this won't let me do a literal match?  Seems like it should
>>>>> work fine.
>>>>> 
>>>>> Regards,
>>>>> 
>>>>> Terry
>>>> 
>>>> 
>>>> --
>>>> To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
>>>> For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
>>>> 
>>>> 