lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Murdoch, Paul" <PAUL.B.MURD...@saic.com>
Subject RE: Phrase search on NOT_ANALYZED content
Date Thu, 04 Mar 2010 13:35:29 GMT
I'm using NOT_ANALYZED because I have a list of text items to index
where some of the items are single words and some of the items are two
words or more with punctuation.  My problem is that sometimes one of the
words in a item with two or more words matches one of the single text
items.  That sounds confusing so here is an example of the list:

info:name
info:name, paul
info:name, unknown
info:unknown

So here I have four items in the list. If I index this ANALYZED and do a
search for "info:unknown", I will get two hits where I only wanted one.
If I do a search for "info:name" I will get three hits where I only
wanted one.  I index using the StandardAnalyzer.  I also index
everything as ANALYZED into a "content" field for broad searches.  That
functions as expected.  This is a specialized search where the user can
search on field name (in this case "info").  The specialized search is
not restricted to exact matches so the KeywordAnalyzer is out.  I'm
trying to allow searching on terms, phrases, and ranges as well as
allowing fuzzy, wildcard, slop, and boost modifiers.  I'm making use of
the TermQuery, PhraseQuery, FuzzyQuery, WildcardQuery, TermRangeQuery
and BooleanQuery.  Using these classes produces better results than
simply feeding a Query object to the QueryParser.  Everything works as
expected except for phrases because of the NOT_ANALYZED flag.  Is there
a KeywordQuery class?  I think that would solve my problem.

Thanks,   

Paul 

-----Original Message-----
From: java-user-return-45278-PAUL.B.MURDOCH=saic.com@lucene.apache.org
[mailto:java-user-return-45278-PAUL.B.MURDOCH=saic.com@lucene.apache.org
] On Behalf Of Erick Erickson
Sent: Wednesday, March 03, 2010 4:30 PM
To: java-user@lucene.apache.org
Subject: Re: Phrase search on NOT_ANALYZED content

NOT_ANALYZED is probably not what you want.
NOT_ANALYZED stores the entire input as
a *single* token, so you can never match on
anything except the entire input.

What did you hope to accomplish by indexint
NOT_ANALYZED? That's actually a pretty
specialized thing to do, perhaps there's a better
way to accomplish your goal.

Best
Erick

On Wed, Mar 3, 2010 at 4:11 PM, Murdoch, Paul
<PAUL.B.MURDOCH@saic.com>wrote:

> If I have indexed some content that contains some words and a single
> whitespace between each word as NOT_ANALYZED, is it possible to
perform
> a phrase search on that a portion of that content?  I'm indexing and
> searching with the StandardAnalyzer 2.9.  Using the KeywordAnalyzer
> works, but I have to type in the entire content as it was indexed.  I
> would like to search for a phrase in the content, but not all of the
> content, preferably with the StandardAnalyzer.
>
>
>
> Thanks,
>
>
>
> Paul
>
>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message