lucene-java-user mailing list archives

From Doron Cohen <DOR...@il.ibm.com>
Subject Re: np-pandock search problem (again, with more detail)
Date Thu, 07 Jun 2007 20:21:22 GMT
Michael D. Curtin wrote:

> > Np-pandock-L1
> > Np-pandock-L2
>
> I'm not positive, but I think StandardAnalyzer splits this input at the
> hyphens.  That is, it gives the terms "Np", "pandock", "1", "2", "L",
> "L1", and "L2", but NOT "Np-pandoc", etc.

I think it splits at hyphens unless one of the
hyphen-separated parts contains digits, so:
  np-pandock-a7
becomes
  np
  pandock-a7
This is for the indexing part.
For the query part, a prefix query is not run through the analyzer
(it is only lowercased), so you could be even more surprised that
this query:
  Np-pandock-L2
would find that document (it would become a phrase
query "np pandock-l2"), but this query:
  Np-pandock-L2*
would not find any document, because it would become
the prefix query
  np-pandock-l2*
and at indexing time no such token (np-pandock-l2) was ever created.
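
To make the mismatch concrete, here is a minimal sketch in plain Java. It is not Lucene code and does not call any Lucene API: the class HyphenTokenSketch and its methods tokenize and prefixMatches are hypothetical names, and the tokenizing rule is only a pairwise simplification of the behavior described above (join two hyphen-separated parts when either contains a digit, otherwise split). It assumes input with single hyphens and no whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified model of the behavior described in this message.
// ASSUMPTION: this is NOT StandardAnalyzer's real grammar, only an approximation.
public class HyphenTokenSketch {

    // True if the part contains at least one digit.
    public static boolean hasDigit(String part) {
        return part.chars().anyMatch(Character::isDigit);
    }

    // Lowercase the text, split at hyphens, and re-join a pair of adjacent
    // parts with a hyphen when either part of the pair contains a digit.
    public static List<String> tokenize(String text) {
        String[] parts = text.toLowerCase(Locale.ROOT).split("-");
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < parts.length) {
            boolean joinWithNext = i + 1 < parts.length
                    && (hasDigit(parts[i]) || hasDigit(parts[i + 1]));
            if (joinWithNext) {
                tokens.add(parts[i] + "-" + parts[i + 1]);
                i += 2;
            } else {
                tokens.add(parts[i]);
                i += 1;
            }
        }
        return tokens;
    }

    // Model of a prefix query: the query text is only lowercased (no
    // tokenizing), the trailing '*' is dropped, and the result is compared
    // against each indexed token as a plain string prefix.
    public static boolean prefixMatches(List<String> indexedTokens, String queryText) {
        String prefix = queryText.toLowerCase(Locale.ROOT);
        if (prefix.endsWith("*")) {
            prefix = prefix.substring(0, prefix.length() - 1);
        }
        for (String token : indexedTokens) {
            if (token.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> indexed = tokenize("Np-pandock-L2");
        System.out.println(indexed);                                  // [np, pandock-l2]
        System.out.println(prefixMatches(indexed, "Np-pandock-L2*")); // false
        System.out.println(prefixMatches(indexed, "pandock-l*"));     // true
    }
}
```

In this model the prefix np-pandock-l2 matches nothing because the index only holds np and pandock-l2, while a prefix that starts inside a single indexed token (pandock-l*) does match.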

Using Luke you should be able to see the tokens in the
index as well as how the query is parsed (under
query details).

Doron



