lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Re : Re: Re : Re: Question concerning Analyzers
Date Thu, 08 Feb 2007 17:51:59 GMT
Well, you've proved that your problem is that the analyzer you're using when
querying isn't matching what you use during indexing. I think that what
you've done will lead you into significant problems down the road as your
tokenizer then has to "correct" for what the index analyzer did though.

What would probably be MUCH less work in the long run is to align the
analyzer you use at query time with the analyzer you use at index time. You
can use a PerFieldAnalyzerWrapper to handle different fields in different
ways. Forget your custom tokenizer for the time being, just try using the
same analyzer during searching that you used during indexing. You can use
the
*QueryParser<file:///C:/lucene-2.0.0/docs/api/org/apache/lucene/queryParser/QueryParser.html#QueryParser%28java.lang.String,%20org.apache.lucene.analysis.Analyzer%29>
*(String <http://java.sun.com/j2se/1.4/docs/api/java/lang/String.html> f,
Analyzer<file:///C:/lucene-2.0.0/docs/api/org/apache/lucene/analysis/Analyzer.html>
 a)

form of the QueryParser, where the Analyzer is the same one you used when
indexing. There are some circumstances where you want to use different
analyzers when querying and when indexing, but don't go there unless you
need to <G>....

If that doesn't do what you want, I'd really recommend is that you make your
own custom Analyzer, built on, say, WhitespaceTokenizer, LowerCaseFilter.
This is usually the way I've approached this kind of problem. And use *that*
one at index and query time.

There's an example in Lucene In Action, see the SynonymAnalyzer example.
That example is MUCH more complex than you'll need <G>...

Best
Erick

On 2/8/07, Xavier To <to.xavier@courrier.uqam.ca> wrote:
>
> Hey !
>
> I tried using WhitespaceAnalyzer during the search and it works. I
> refactored the tokenizing process so it uses TokenStream instead of
> StringTokenizer and it works fine for one thing : the query "this is a test"
> becomes "thisisatest". I fixed it by adding a space after each token except
> for the last one, but is there a clean way to do it ? I'm using
> WhitespaceTokenizer.
>
> Thanks a bunch !
>
> Xavier Tô
> Bacc. en Informatique et Génie Logiciel
> to.xavier@courrier.uqam.ca
> (450)434-8905
>
> ----- Message d'origine -----
> De: Erick Erickson <erickerickson@gmail.com>
> Date: Mercredi, Février 7, 2007 4:28 pm
> Objet: Re: Re : Re: Question concerning Analyzers
>
> > Then the analyzer you're using when parsing the query is stripping
> > them. It
> > must be different than the one you use when indexing somehow. At least
> > that's the only explanation I can imagine....
> >
> > Perhaps, somehow, you are using a default analyzer when you parse a
> > query?Or you aren't specifying the field when you query and thus a
> > default is
> > used? Or you are using a PerFieldAnalyzerWrapper and dropping
> > through to the
> > default? or ????
> >
> > Just for yucks, I'd try using WhitespaceAnalyzer on a query with
> > somethingyou *know* exists in the index for a particular field and
> > work my way up to
> > whatever your real problem is in small steps (since you can't post
> > code<G>)......
> >
> > Best
> > Erick
> >
> > On 2/7/07, Xavier To <to.xavier@courrier.uqam.ca> wrote:
> > >
> > > Thanks Erik and Erick,
> > >
> > > I guess my question was rather unclear, but you guys answered it
> > all the
> > > same : it is impossible for an analyzer to index something and
> > having the
> > > same analyzer ignore the thing indexed during a search.
> > >
> > > If it makes everything clearer, during indexation, numbers  are
> > indexed,> whether or not they are accompanied by letters ( 2003 and
> > 4wd are both
> > > indexed). That's fine, since we want this.  The problem occurs
> > when I try to
> > > search for them : They are ignored. I know they are indexed
> > because I ran
> > > through the index using Luke.
> > >
> > > Any thoughts regarding this problem ?
> > >
> > > Xavier Tô
> > > Bacc. en Informatique et Génie Logiciel
> > > to.xavier@courrier.uqam.ca
> > > (450)434-8905
> > >
> > > ----- Message d'origine -----
> > > De: Erik Hatcher <erik@ehatchersolutions.com>
> > > Date: Mercredi, Février 7, 2007 3:15 pm
> > > Objet: Re: Question concerning Analyzers
> > >
> > > > There is no requirement that you use the same analyzer to
> > search as
> > > >
> > > > you used to index.  So, yes, you could certainly index things and
> > > > ignore them during a search.
> > > >
> > > >       Erik
> > > >
> > > >
> > > > On Feb 7, 2007, at 2:10 PM, Xavier To wrote:
> > > >
> > > > > Hi, me again
> > > > >
> > > > > I'm still stuck with my search engine, but something popped
> > in my
> > > >
> > > > > head : Can an analyzer index something but ignore it during a
> > > > > search ? I'm asking this because now that I've been searching
> > for> >
> > > > > an answer, I've come to think that I should redo the whole
> > search> >
> > > > > engine, but I don't want to reproduce the same error as we have
> > > > > now. It would be stupid to accidentaly redo the same mistake. I
> > > > > still haven't received news from my seniors about me posting
> > code> >
> > > > > and all...
> > > > >
> > > > > Xavier Tô
> > > > > Bacc. en Informatique et Génie Logiciel
> > > > > to.xavier@courrier.uqam.ca
> > > > > (450)434-8905
> > > > >
> > > > >
> > > > >
> > > > > --------------------------------------------------------------
> > ----
> > > > ---
> > > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > > For additional commands, e-mail: java-user-
> > help@lucene.apache.org> >
> > > >
> > > > ----------------------------------------------------------------
> > ----
> > > > -
> > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > >
> > > >
> > >
> > >
> > > ------------------------------------------------------------------
> > ---
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message