lucene-java-user mailing list archives

From Greg Huber <gregh3...@gmail.com>
Subject Re: Strange results returned from suggester
Date Sun, 29 Jan 2017 17:33:28 GMT
Uwe,

Perfect, exactly what I was looking for.  No duplication and no ongoing
maintenance (since it uses the defaults) :-)

return CustomAnalyzer.builder()
    .withTokenizer(StandardTokenizerFactory.class)
    .addTokenFilter(StandardFilterFactory.class)
    .addTokenFilter(LowerCaseFilterFactory.class)
    .addTokenFilter(SuggestStopFilterFactory.class)
    .build();

Thanks Greg.
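
For anyone reading this thread later, the rule Mike describes further down
("will" is kept while it is still being typed, but "will " is dropped) can be
sketched in plain Java. This is only an illustration of the idea and does not
use Lucene's classes: the class name SuggestStopSketch, the three-word stop
set, and the whitespace-only tokenizer are all assumptions made for the demo.
The real SuggestStopFilter works on token offsets inside an analysis chain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Plain-Java illustration only; this is NOT Lucene's SuggestStopFilter. */
class SuggestStopSketch {

    // Hypothetical, tiny stop word set for the demo.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "the", "will"));

    /** Lower-cases the input and splits it on whitespace (a simplification). */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        for (String part : input.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    /** Ordinary stop filtering: every stop word is dropped, wherever it is. */
    static List<String> plainStopFilter(String input) {
        List<String> kept = new ArrayList<>();
        for (String token : tokenize(input)) {
            if (!STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    /**
     * Suggest-style filtering: the last token survives even if it is a stop
     * word, as long as the input does not end with a separator (the user may
     * still be typing a longer word such as "william").
     */
    static List<String> suggestStopFilter(String input) {
        List<String> tokens = tokenize(input);
        boolean stillTyping = !input.isEmpty()
                && !Character.isWhitespace(input.charAt(input.length() - 1));
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            boolean isLast = (i == tokens.size() - 1);
            String token = tokens.get(i);
            if (!STOP_WORDS.contains(token) || (isLast && stillTyping)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(plainStopFilter("will"));    // prints []
        System.out.println(suggestStopFilter("will"));  // prints [will]
        System.out.println(suggestStopFilter("will ")); // prints []
    }
}
```

With the ordinary filter the query for "will" ends up with no terms at all,
which, per Mike's explanation below, is why it behaved so differently from
"wil" or "willi".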

On 29 January 2017 at 12:17, Uwe Schindler <uwe@thetaphi.de> wrote:

> Hi,
>
> CustomAnalyzer is a very generic thing. It has a builder that you can use
> to configure your analyzer. You can define which Tokenizer and which
> StopFilter to use (and pass stop words as you like), and add stemming. No,
> it does not subclass StopwordAnalyzerBase, but that is also not needed,
> because it has a generic configuration interface.
>
> So I don't understand your problem. Lucene APIs take the abstract Analyzer
> class, and CustomAnalyzer provides it just like StandardAnalyzer does.
> CustomAnalyzer is basically the same as Solr's schema.xml and
> Elasticsearch's analyzer index config.
>
> The first example in the Javadocs is more or less StandardAnalyzer, just
> adapt it and pass the factory:
> http://lucene.apache.org/core/6_4_0/analyzers-common/org/apache/lucene/analysis/custom/CustomAnalyzer.html
>
> Uwe
>
> -----
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
> > -----Original Message-----
> > From: Greg Huber [mailto:gregh3269@gmail.com]
> > Sent: Sunday, January 29, 2017 12:48 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: Strange results returned from suggester
> >
> > Uwe,
> >
> > >...or use CustomAnalyzer then you don't need to
> > > subclass. Just declare the components.
> >
> > If I need the StandardAnalyzer code (marked final) and this extends
> > StopwordAnalyzerBase, how would I do this?
> >
> > Cheers Greg
> >
> > On 29 January 2017 at 11:32, Uwe Schindler <uwe@thetaphi.de> wrote:
> >
> > > ...or use CustomAnalyzer then you don't need to subclass. Just declare
> > > the components.
> > >
> > > Uwe
> > >
> > > -----
> > > Uwe Schindler
> > > Achterdiek 19, D-28357 Bremen
> > > http://www.thetaphi.de
> > > eMail: uwe@thetaphi.de
> > >
> > > > -----Original Message-----
> > > > From: Michael McCandless [mailto:lucene@mikemccandless.com]
> > > > Sent: Sunday, January 29, 2017 12:28 PM
> > > > To: Greg Huber <gregh3269@gmail.com>; Lucene Users <java-user@lucene.apache.org>
> > > > Subject: Re: Strange results returned from suggester
> > > >
> > > > That's right, just make your own analyzer, forked from
> > > > StandardAnalyzer, and change out the StopFilter.  The analyzer is a
> > > > tiny class and this (creating your own components in an analyzer) is
> > > > normal practice...
> > > >
> > > > Mike McCandless
> > > >
> > > > http://blog.mikemccandless.com
> > > >
> > > >
> > > > On Sat, Jan 28, 2017 at 6:09 AM, Greg Huber <gregh3269@gmail.com> wrote:
> > > > > Michael,
> > > > >
> > > > > Thanks for the update, so I just duplicate StandardAnalyzer and replace:
> > > > >
> > > > >
> > > > > //tok = new StopFilter(tok, stopwords);
> > > > >   tok = new SuggestStopFilter(tok, stopwords);
> > > > >
> > > > > in createComponents(..)
> > > > >
> > > > > Is there a way I can just override the method as in
> > > > > AnalyzingInfixSuggester rather than duplicating classes?
> > > > >
> > > > >
> > > > > Cheers Greg
> > > > >
> > > > > On 28 January 2017 at 10:31, Michael McCandless <lucene@mikemccandless.com> wrote:
> > > > >>
> > > > >> Hi Greg,
> > > > >>
> > > > >> OK, StandardAnalyzer does indeed use StopFilter, with English stop
> > > > >> words by default, which include "will", so this explains what you are
> > > > >> seeing.
> > > > >>
> > > > >> I suggest making your own analyzer just like StandardAnalyzer, except
> > > > >> instead of StopFilter use the SuggestStopFilter class.
> > > > >>
> > > > >> That class was created for exactly the situation you're in, so that
> > > > >> "will" would not be filtered out as a stop word, but "will " is
> > > > >> (because it ends with a token separator).
> > > > >>
> > > > >> Either that or pass an empty stop word set to StandardAnalyzer, but
> > > > >> then you have no stop word filtering.
> > > > >>
> > > > >> This short blog post explains SuggestStopFilter:
> > > > >>
> > > > >> http://blog.mikemccandless.com/2013/08/suggeststopfilter-carefully-removes.html
> > > > >>
> > > > >> Mike McCandless
> > > > >>
> > > > >> http://blog.mikemccandless.com
> > > > >>
> > > > >>
> > > > >> On Sat, Jan 28, 2017 at 3:39 AM, Greg Huber <gregh3269@gmail.com> wrote:
> > > > >> > Michael,
> > > > >> >
> > > > >> > I am using the standard analyzer with no stop words, and the
> > > > >> > suggester is built from an existing Lucene index.
> > > > >> >
> > > > >> > org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester
> > > > >> >
> > > > >> > I am overriding addContextToQuery to make it an AND rather than an
> > > > >> > OR:
> > > > >> >
> > > > >> > public void addContextToQuery(Builder query, BytesRef context, Occur clause)
> > > > >> > {
> > > > >> >     query.add(new TermQuery(new Term(CONTEXTS_FIELD_NAME, context)),
> > > > >> >             BooleanClause.Occur.MUST);
> > > > >> > }
> > > > >> >
> > > > >> > Cheers Greg
> > > > >> >
> > > > >> > On 27 January 2017 at 18:20, Michael McCandless <lucene@mikemccandless.com> wrote:
> > > > >> >>
> > > > >> >> Which suggester are you using?
> > > > >> >>
> > > > >> >> Maybe you are using a suggester with an analyzer, and your analysis
> > > > >> >> chain includes a StopFilter and "will" is a stop word?
> > > > >> >>
> > > > >> >> Mike McCandless
> > > > >> >>
> > > > >> >> http://blog.mikemccandless.com
> > > > >> >>
> > > > >> >>
> > > > >> >> On Fri, Jan 27, 2017 at 10:42 AM, Greg Huber <gregh3269@gmail.com> wrote:
> > > > >> >> > Hello,
> > > > >> >> >
> > > > >> >> > Is there any way to see why items are returned from the suggester,
> > > > >> >> > similar to the search?
> > > > >> >> >
> > > > >> >> > I have a really strange case where if I enter 'will' (without the
> > > > >> >> > quotes) it seems to return all the search results.
> > > > >> >> >
> > > > >> >> > example:
> > > > >> >> >
> > > > >> >> > there should be two entries beginning with will*, i.e. william and
> > > > >> >> > Willoughby
> > > > >> >> >
> > > > >> >> > wil >  two entries with correct highlight
> > > > >> >> > will > all entries with NO highlight
> > > > >> >> > willi > single entry
> > > > >> >> > willo > single entry
> > > > >> >> >
> > > > >> >> > I have checked and I do not have will on all the entries!
> > > > >> >> >
> > > > >> >> > Cheers Greg
> > > > >> >
> > > > >> >
> > > > >
> > > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > For additional commands, e-mail: java-user-help@lucene.apache.org
