lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: exact query match?
Date Thu, 18 Mar 2010 00:58:23 GMT
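You might get some joy from WhitespaceAnalyzer, which splits only on
whitespace and so keeps tokens like "11". But beware of case and
punctuation: it neither lowercases nor strips punctuation. You could
pre-process your text at both index and query time to remove
non-alphanumerics.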
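
A minimal sketch of that pre-processing idea, in plain Java with no Lucene
calls. The class and method names (ExactPhraseSketch, normalize,
containsPhrase) are made up for illustration. It assumes the same
normalization is applied to documents and queries, and it only models phrase
matching on in-memory token lists; it is not how Lucene evaluates a phrase
query.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class ExactPhraseSketch {

    // Hypothetical helper: split on whitespace, lowercase each token,
    // and strip every character that is not a letter or a decimal digit.
    static List<String> normalize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            String cleaned = raw.toLowerCase(Locale.ROOT)
                                .replaceAll("[^\\p{L}\\p{Nd}]", "");
            if (!cleaned.isEmpty()) {
                tokens.add(cleaned); // digits such as "11" survive
            }
        }
        return tokens;
    }

    // True if the normalized phrase occurs as a contiguous token run in the doc.
    static boolean containsPhrase(String doc, String phrase) {
        List<String> p = normalize(phrase);
        return !p.isEmpty()
                && Collections.indexOfSubList(normalize(doc), p) >= 0;
    }

    public static void main(String[] args) {
        String[] docs = {
            "x 11 windowing system",
            "x windowing system",
            "the x 11 windowing system"
        };
        String[] queries = {"x 11 windowing system", "the x 11"};
        for (String q : queries) {
            for (int i = 0; i < docs.length; i++) {
                System.out.println("query=\"" + q + "\" doc" + (i + 1)
                        + " -> " + containsPhrase(docs[i], q));
            }
        }
    }
}
```

One caveat with this choice: stripping punctuation inside a token joins its
parts, so "X-11" becomes the single token "x11" and will not match the two
tokens "x 11". Replacing punctuation with a space before splitting would be
the alternative.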

Or you could create your own analyzer; see SynonymAnalyzer in Lucene in
Action, and there's another example here: http://mext.at/?p=26.

The idea is to string together some number of Filters, starting with a
Tokenizer that "does the right thing", and create your own Analyzer.
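
The shape of such a chain can be modelled in plain Java. This is only an
analogy: whitespaceTokenize stands in for a Tokenizer and the UnaryOperator
constants stand in for Filters. None of the names below are Lucene classes,
and a real Analyzer would wrap Lucene's own Tokenizer and TokenFilter types
instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

public class AnalyzerChainSketch {

    // Stand-in for a Tokenizer: split raw text on runs of whitespace.
    static List<String> whitespaceTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Stand-ins for Filters: each rewrites one token.
    // A filter that returns the empty string drops the token.
    static final UnaryOperator<String> LOWERCASE =
            t -> t.toLowerCase(Locale.ROOT);
    static final UnaryOperator<String> STRIP_PUNCTUATION =
            t -> t.replaceAll("[^\\p{L}\\p{Nd}]", "");

    // Stand-in for an Analyzer: tokenize, then apply the filters in order.
    @SafeVarargs
    static List<String> analyze(String text, UnaryOperator<String>... filters) {
        List<String> out = new ArrayList<>();
        for (String token : whitespaceTokenize(text)) {
            for (UnaryOperator<String> f : filters) {
                token = f.apply(token);
            }
            if (!token.isEmpty()) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(
                analyze("The X 11, Windowing System!", LOWERCASE, STRIP_PUNCTUATION));
    }
}
```

Whatever chain is chosen, the same one has to be used at index time and at
query time, otherwise the tokens in the index and in the query will not line
up.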

But as far as I know, there's nothing out of the box that does what you
want.

Best
Erick

On Wed, Mar 17, 2010 at 4:25 PM, Joachim De Beule <joachim@arti.vub.ac.be> wrote:

> Hi All,
>
> I have a corpus of documents which I want to search for phrases. I only
> want to get those documents that exactly contain a phrase. For example, if:
> doc1 = "x 11 windowing system"
> doc2 = "x windowing system"
> doc3 = "the x 11 windowing system"
>
> then I want the query "x 11 windowing system" to return only doc1 and doc3
> and the query "the x 11" to return only doc3.
>
> I have tried to use SimpleAnalyzer together with using the query as a single
> phrase, but this still also gives doc2 for the first example query because
> this analyzer discards the number 11. There does not seem to be an alternative
> analyzer for this however, and I don't know how to write one myself.
>
> Is there a standard way of doing this?
>
> Thanks!
>
> Joachim.
