lucene-java-user mailing list archives

From "Zhang, Lisheng" <Lisheng.Zh...@broadvision.com>
Subject RE: Keyword query confusion
Date Wed, 29 Sep 2004 16:05:46 GMT
Hi,

Erik and others mentioned that is_pub:1 won't
work because of the analyzer, but I remember that in
my tests StandardAnalyzer does not strip numbers,
while SimpleAnalyzer does.
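To make the difference concrete, here is a toy model in plain Java (not Lucene code) of the two behaviors described above. `letterTokens` mimics SimpleAnalyzer, which keeps runs of letters and lowercases them, so digits act only as separators. `alphanumTokens` is an assumed, crude stand-in for how StandardAnalyzer treats simple input such as "1": it keeps runs of letters or digits. The real StandardTokenizer grammar is far richer. QueryParser passes only the text after `is_pub:` to the analyzer, so the analyzer sees just "1".

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, NOT Lucene code: approximates how two analyzers split text.
class TokenizerSketch {

    // SimpleAnalyzer-like: maximal runs of letters, lowercased.
    // Digits are treated as separators, so "1" yields no tokens.
    static List<String> letterTokens(String text) {
        return runs(text, false);
    }

    // Crude, assumed stand-in for StandardAnalyzer on simple input:
    // runs of letters OR digits are kept, so "1" survives.
    static List<String> alphanumTokens(String text) {
        return runs(text, true);
    }

    private static List<String> runs(String text, boolean keepDigits) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            boolean inToken = Character.isLetter(c)
                    || (keepDigits && Character.isDigit(c));
            if (inToken) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString()); // close the current run
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString()); // flush the final run
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(letterTokens("1"));    // []
        System.out.println(alphanumTokens("1"));  // [1]
        System.out.println(letterTokens("true")); // [true]
    }
}
```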

According to a previous mail, StandardAnalyzer is
being used here, so how could the number "1" be
parsed away?

I used Lucene 1.4 rc3.

Thanks very much for your help,

Lisheng

-----Original Message-----
From: Erik Hatcher [mailto:erik@hatcher.net]
Sent: Saturday, September 25, 2004 1:59 AM
To: Lucene Users List
Subject: Re: Keyword query confusion


On Sep 24, 2004, at 12:26 PM, Fred Toth wrote:
> I'm trying to understand what's going on with the query parser
> and keyword fields.

It's a confusing situation, for sure.

> I've got a large subset of my documents which are "publications".
> So as to be able to query these, I've got this in the indexer:
>
> doc.add(Field.Keyword("is_pub", "1"));
>
> However, if I run a query:
>
> 	is_pub:1
>
> I get no hits. If I find a document by other means and dump the
> fields, the "is_pub" keyword is there, with value of "1".

As already stated, it is the analyzer eating the "1".  Every field is 
analyzed by QueryParser, but during indexing Field.Keyword fields are 
not analyzed (they are indexed as a single, untokenized term).
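The mismatch can be sketched with a toy model in plain Java (not Lucene's API; the class and method names are hypothetical). It assumes the index stores a Keyword-style value as one untokenized term, while the query text is run through a letter-only analyzer like SimpleAnalyzer before any term is looked up.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model, NOT Lucene code: why an unanalyzed (Keyword) value can be
// missed by a query whose text IS analyzed.
class KeywordMismatchSketch {
    // Term dictionary holding "field:term" strings.
    private final Set<String> terms = new HashSet<>();

    // Index side: a Keyword-style value is stored as one untokenized term.
    void addKeyword(String field, String value) {
        terms.add(field + ":" + value);
    }

    // Query side (assumption): letter-only analysis, like SimpleAnalyzer.
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String tok : text.split("[^\\p{L}]+")) {
            if (!tok.isEmpty()) {
                out.add(tok.toLowerCase());
            }
        }
        return out;
    }

    // True if any term that survives analysis exists in the index.
    boolean matches(String field, String queryText) {
        for (String tok : analyze(queryText)) {
            if (terms.contains(field + ":" + tok)) {
                return true;
            }
        }
        return false; // nothing survived analysis, or nothing matched
    }

    public static void main(String[] args) {
        KeywordMismatchSketch index = new KeywordMismatchSketch();
        index.addKeyword("is_pub", "1");    // first experiment
        index.addKeyword("is_pub", "true"); // second experiment
        System.out.println(index.matches("is_pub", "1"));    // false
        System.out.println(index.matches("is_pub", "true")); // true
    }
}
```

If no term survives analysis there is nothing to look up, so the query cannot match, which mirrors why `is_pub:true` hits while `is_pub:1` does not under a digit-dropping analyzer.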

Search the archives for discussion on a KeywordAnalyzer and how to use 
it with PerFieldAnalyzerWrapper.  Also, the info here is valuable:

	http://wiki.apache.org/jakarta-lucene/AnalysisParalysis

Visualizing what an analyzer does and using Query.toString are both 
techniques that clearly show what is happening.

> Now, I've learned that if I change the field to contain the value "true"
> instead of the string "1", this query:
>
> 	is_pub:true
>
> works just fine.
>
> So, I'm pretty sure I'm running afoul of the analyzer, right? The doc says
> specifically that I should add keyword query clauses programmatically,
> and I'm guessing that's what's wrong.

It really depends on your needs.  I personally wouldn't want end-users 
to have to know to type "is_pub:true" into a search box.  Designing the most 
appropriate search interface for your situation is highly recommended.  
In this case, that could be a checkbox for "Is published?" that translates 
into a TermQuery behind the scenes (likely combined with other pieces, 
perhaps a QueryParser-parsed piece, using a BooleanQuery).  TermQuery text 
is not analyzed, so you'd be safe there.
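A minimal sketch of that interface, in plain Java rather than Lucene's API. It assumes a letter-only analyzer for the search box and models a query as a list of required `field:term` clauses, a stand-in for a BooleanQuery whose clauses are all required. The class and method names are hypothetical; in real Lucene code the checkbox clause would be a `TermQuery` built from `new Term("is_pub", "1")`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model, NOT Lucene's API. A query is a list of required "field:term"
// clauses, standing in for a BooleanQuery whose clauses are all required.
class CheckboxQuerySketch {

    // Assumed analyzer for the free-text box: letter runs, lowercased.
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String tok : text.split("[^\\p{L}]+")) {
            if (!tok.isEmpty()) {
                out.add(tok.toLowerCase());
            }
        }
        return out;
    }

    // Free text goes through the analyzer; the checkbox adds an exact,
    // unanalyzed term, as a TermQuery would.
    static List<String> buildQuery(String userText, boolean publishedOnly) {
        List<String> required = new ArrayList<>();
        for (String tok : analyze(userText)) {
            required.add("contents:" + tok);
        }
        if (publishedOnly) {
            required.add("is_pub:1"); // never passed through analyze()
        }
        return required;
    }

    // A document matches when it contains every required term.
    static boolean matches(Set<String> docTerms, List<String> required) {
        return docTerms.containsAll(required);
    }

    public static void main(String[] args) {
        Set<String> doc = new HashSet<>(Arrays.asList(
                "is_pub:1", "contents:keyword", "contents:query"));
        List<String> q = buildQuery("Keyword query", true);
        System.out.println(q);               // [contents:keyword, contents:query, is_pub:1]
        System.out.println(matches(doc, q)); // true
    }
}
```

The design point is that the "1" never meets an analyzer, so the choice of analyzer for the free-text box cannot affect the checkbox filter.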

> But can someone explain this? It sure is useful to be able to test this
> sort of thing with the query parser. What is going on with the standard
> analyzer that makes "true" work and "1" not work?

Numbers get axed by the analyzer; that is what happens.

> Is there a way around this other than by writing code to create the
> query? This also applies to other types of query, like "pub_date:2004".

A PerFieldAnalyzerWrapper using WhitespaceAnalyzer for the "is_pub" 
field would do the trick in this case.
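The idea behind PerFieldAnalyzerWrapper is dispatch by field name with a fallback default; in Lucene the wrapper is itself an Analyzer, so it can be handed to QueryParser in place of a single analyzer. The toy model below is plain Java, not Lucene's class. It assumes a letter-only default and whitespace tokenization for `is_pub` and `pub_date`, which also covers the `pub_date:2004` case. Its `addAnalyzer` name mirrors the wrapper's method, but the signature here is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Toy model of per-field analyzer dispatch, NOT Lucene's PerFieldAnalyzerWrapper.
class PerFieldSketch {
    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();

    PerFieldSketch(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Register a field-specific analyzer.
    void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        perField.put(field, analyzer);
    }

    // Use the field's analyzer, or the default if none is registered.
    List<String> tokens(String field, String text) {
        return perField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    // SimpleAnalyzer-like: letter runs, lowercased; digits are dropped.
    static List<String> letterOnly(String text) {
        List<String> out = new ArrayList<>();
        for (String tok : text.split("[^\\p{L}]+")) {
            if (!tok.isEmpty()) {
                out.add(tok.toLowerCase());
            }
        }
        return out;
    }

    // WhitespaceAnalyzer-like: split on whitespace, no lowercasing.
    static List<String> whitespace(String text) {
        List<String> out = new ArrayList<>();
        for (String tok : text.trim().split("\\s+")) {
            if (!tok.isEmpty()) {
                out.add(tok);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        PerFieldSketch wrapper = new PerFieldSketch(PerFieldSketch::letterOnly);
        wrapper.addAnalyzer("is_pub", PerFieldSketch::whitespace);
        wrapper.addAnalyzer("pub_date", PerFieldSketch::whitespace);
        System.out.println(wrapper.tokens("is_pub", "1"));      // [1]
        System.out.println(wrapper.tokens("pub_date", "2004")); // [2004]
        System.out.println(wrapper.tokens("title", "1"));       // []
    }
}
```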

Again, users typing "pub_date:2004" seems awkward to me - make a year 
drop-down box if they need to select a year.

> Hoping for enlightenment...

Now that's a tall order... or is it?!  It's surrounding us all - we 
simply have to breathe it in.

	Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org