lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: default AND operator
Date Sun, 17 Sep 2006 17:14:15 GMT
Well, I'm puzzled as well. In the simple examples I just ran, the AND
operator behaves just fine, but that was using StandardAnalyzer. So it's
almost certain we're not talking about the same thing <G>...

So, I guess I have a couple of suggestions:

1> Try your query without the stemmingAnalyzer. Try StandardAnalyzer (or
even WhitespaceAnalyzer) and kind of build up to the stemmer. That'll at
least narrow the problem space.

2> You might post more details about the stemmingAnalyzer you're using. It's
possible that there's some innocuous-seeming line in the creation of the
stemmingAnalyzer you're feeding into the query parser that's producing this
behavior. Parenthetically, I'm not entirely sure you're not going to get
into a heap o' trouble using a StandardAnalyzer to create the index then
using a stemmingAnalyzer to query it. But, as you say, that's secondary to
the default AND question. I should also add that I don't know enough about
stemming analyzers to put in a thimble, so this is just a theoretical
concern.

3> Create a small, self-contained program that demonstrates this issue and
post it here. Or, even better, a junit test <G>.
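
As a rough illustration of the concern in 2>, here is a toy sketch in plain
Java. It does not use Lucene at all: plainAnalyze and stemAnalyze are
hypothetical stand-ins for a non-stemming and a stemming analyzer, and the
"index" is just a set of terms. It only shows why terms indexed one way and
queried another way can fail to match.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalyzerMismatchDemo {

    // Toy non-stemming analysis: lower-case, split on non-letters.
    // This is an assumption for illustration, not Lucene's StandardAnalyzer.
    static List<String> plainAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Toy stemmer: strips a trailing "es" or "s". Real stemmers are far
    // more involved; this one is hypothetical and only mimics
    // "antiques" -> "antiqu".
    static List<String> stemAnalyze(String text) {
        List<String> out = new ArrayList<>();
        for (String t : plainAnalyze(text)) {
            if (t.endsWith("es") && t.length() > 4) {
                t = t.substring(0, t.length() - 2);
            } else if (t.endsWith("s") && t.length() > 3) {
                t = t.substring(0, t.length() - 1);
            }
            out.add(t);
        }
        return out;
    }

    // AND semantics: every query term must be present in the index.
    static boolean matchesAll(Set<String> indexedTerms, List<String> queryTerms) {
        return indexedTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        // Index built WITHOUT stemming, so it holds "antiques".
        Set<String> index = new HashSet<>(plainAnalyze("French Antiques for sale"));

        // Same analysis at query time: both terms are found.
        System.out.println(matchesAll(index, plainAnalyze("french AND antiques".replace("AND", ""))));

        // Stemming at query time: "antiqu" is not in the index.
        System.out.println(matchesAll(index, stemAnalyze("french antiques")));
    }
}
```

Whether this is what is happening in your case is a separate question, since
you report the opposite behavior. Swapping analyzers one at a time, as in 1>,
is the quickest way to tell.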

I think we've exhausted the generic issues you might be having and could get
a much faster resolution with a complete example to look at. "The guys" have
been generous with many posters in looking at actual code......

Best
Erick.

P.S. Please post whatever the resolution is, I'm pretty curious what you
find.

On 9/17/06, no spam <mrs.nospam@gmail.com> wrote:
>
> I am new to Lucene so I'll admit I am confused by a few things.  I'm using
> an index which was built with the StandardAnalyzer.  I have verified this
> by using an IndexReader to read the docs back out ... Antiques is not Antiq
> in the index.   So according to this note in the Lucene docs I would assume
> a Query parsed without a stemming analyzer would have matched:
>
> "Note: The analyzer used to create the index will be used on the terms and
> phrases in the query string. So it is important to choose an analyzer that
> will not interfere with the terms used in the query string."
>
> But it's quite the opposite, only a query parsed with the stemming
> analyzer is matching my queries.  So these are a few confusing issues which
> to me seem a *bit* beside the point ... perhaps I'm wrong.
>
> HOWEVER .. I'm still confused as to why the AND operator isn't matching my
> "french AND antiques" query regardless of the index.
>
> I will look into Luke ... thanks for your replies ... Mark
>
> On 9/17/06, Erick Erickson <erickerickson@gmail.com> wrote:
> >
> > Are you really, really sure that your *analyzer* isn't automatically
> > lower-casing your *query* and turning "french AND antiques" into "french
> > and antiques", then, as Chris says, treating "and" as a stop word?
> >
> > The fact that your parser transforms "antiques" into "antiqu" leads me to
> > suspect that there's a lot more going on in the parser analyzer than you
> > might expect....
> >
> > And, in case you haven't already found it, are you sure what your index
> > contains? I've found luke (google luke lucene) to be very valuable for
> > these kinds of questions, particularly your issue about stemming etc.
> >
> > Best
> > Erick
> >
> >
>
>
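
The lower-casing hypothesis from the earlier reply quoted above can be
sketched in plain Java. This is a toy model, not Lucene's QueryParser: the
describe method, the stop-word list, and the "+term" output notation are all
assumptions made for illustration. It treats only upper-case AND as an
operator and drops stop words, so a query that is lower-cased before parsing
loses its operator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class LowercasedAndDemo {

    // Hypothetical stop-word list; a real analyzer's list may differ.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("and", "or", "the", "a"));

    // Returns the surviving terms, prefixed with "+" (required) if an
    // upper-case AND operator was seen, or bare (optional) otherwise.
    static String describe(String query, boolean lowercaseFirst) {
        String q = lowercaseFirst ? query.toLowerCase(Locale.ROOT) : query;
        boolean sawAnd = false;
        List<String> terms = new ArrayList<>();
        for (String tok : q.trim().split("\\s+")) {
            if (tok.isEmpty()) {
                continue;
            }
            if (tok.equals("AND")) {      // operator only when upper-case
                sawAnd = true;
                continue;
            }
            String term = tok.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(term)) {
                terms.add(term);          // "and" is silently dropped here
            }
        }
        StringBuilder sb = new StringBuilder();
        for (String term : terms) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(sawAnd ? "+" : "").append(term);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Operator preserved: both terms required.
        System.out.println(describe("french AND antiques", false));
        // Lower-cased first: "and" becomes a stop word, terms are optional.
        System.out.println(describe("french AND antiques", true));
    }
}
```

If the real parser behaves like the second call, "french AND antiques" would
act like a plain "french antiques" query, which is worth ruling out before
looking elsewhere.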
