incubator-blur-dev mailing list archives

From Garrett Barton <garrett.bar...@gmail.com>
Subject Re: Custom EdgeNGram Analyzer For Blur Text Field
Date Thu, 15 May 2014 15:21:04 GMT
I did not know you could just specify an analyzer, undocumented feature!

As for the columnDefinition route, you probably extended the
CustomColumnDefinition, which does not let you run normal queries: everything
has to be passed in as foo.bar:"someawesomeformatinquotes".  The reason is that
Aaron has implemented some really cool things, like geo-indexing with its
own custom syntax, within a default Lucene query.  If you go back and extend
TextColumnDefinition, your analyzer should come out right when you query it
normally.

Personally, I like the analyzer option for simple analyzers like yours; the
custom CD is overkill. Glad you got it working!

~Garrett
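
As a rough illustration of what the analyzer quoted below indexes: a front
edge n-gram filter with a minimum of 3 and a maximum of 20 emits every prefix
of a token from 3 characters up to 20 (or up to the token length, if shorter).
The following is a minimal plain-Java sketch of that idea, not Lucene code.
The class and method names are made up for illustration, and it ignores
details such as token positions and offsets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene or Blur.
final class EdgeNGramSketch {

  // Returns the prefixes of token whose length lies in [minGram, maxGram].
  // A token shorter than minGram yields no grams at all.
  static List<String> frontEdgeNGrams(String token, int minGram, int maxGram) {
    List<String> grams = new ArrayList<>();
    int longest = Math.min(maxGram, token.length());
    for (int len = minGram; len <= longest; len++) {
      grams.add(token.substring(0, len));
    }
    return grams;
  }

  public static void main(String[] args) {
    System.out.println(frontEdgeNGrams("blur", 3, 20)); // [blu, blur]
  }
}
```

One consequence worth keeping in mind: with a minimum of 3, a token shorter
than three characters produces no grams in this sketch.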


On Thu, May 15, 2014 at 12:33 AM, Dibyendu Bhattacharya <
dibyendu.bhattachary@gmail.com> wrote:

> Hi,
>
> Thanks to Aaron and Garrett for your emails. I was able to configure a custom
> EdgeNGramAnalyzer for a text field using the "analyzerClass" property. I would
> kindly request that it be documented somewhere in Blur 0.2.2.
>
> I also tried the custom type definition suggested by Garrett, which uses
> the EdgeNGramAnalyzer. I defined this type in blur-site.xml and
> defined a ColumnDefinition that uses it. But when querying I ran into an
> issue: it returned a message saying the field is of a custom type
> and needs to be enclosed in "".
>
> Anyway, I found that the "analyzerClass" option is much easier to configure,
> and it worked fine for me.
>
> Regards,
> Dibyendu
>
>
>
> On Wed, May 14, 2014 at 7:33 AM, Aaron McCurry <amccurry@gmail.com> wrote:
>
> > Garrett is correct that you can create a custom type.  However, you are
> > also correct that you can specify the "analyzerClass" property, but only
> > if the analyzer has one of two kinds of constructor: the default
> > constructor (no args) or one that takes the LuceneVersion enum.
> > Otherwise it will throw an exception.  This also assumes that you are
> > running a fairly recent version of Blur; if it's 0.2.2 (which I think you
> > are) then you are likely good to use that option.
> >
> > Here's the code:
> >
> >
> >
> https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=blob;f=blur-query/src/main/java/org/apache/blur/analysis/type/TextFieldTypeDefinition.java;h=049207bdb4f94cf03a4b0c74891eba129d13fbbb;hb=3967e154e7b064ad40b36d1d5832b7c7bcac44cd#l69
> >
> > Perhaps the reason it's not being taken is because the field has already
> > been defined for the given table?
> >
> > If none of those possibilities are the problem I'm not sure what the
> > problem could be.  Let us know how it goes.
> >
> > Aaron
> >
> >
> >
> >
> > On Tue, May 13, 2014 at 12:12 PM, Garrett Barton
> > <garrett.barton@gmail.com> wrote:
> >
> > > I think you have to create a custom TypeDefinition that calls your
> > > analyzer underneath the covers. You can extend the
> > > TextFieldTypeDefinition, if I remember right, and just override the
> > > analyzer it calls.
> > >
> > > ~Garrett
> > >
> > >
> > > On Tue, May 13, 2014 at 11:54 AM, Dibyendu Bhattacharya <
> > > dibyendu.bhattachary@gmail.com> wrote:
> > >
> > >> Hi,
> > >>
> > >> I was trying to configure a custom analyzer (EdgeNGram) for a text field.
> > >>
> > >> Below is the very simple edge n-gram analyzer code, which works fine.
> > >>
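> > >> import java.io.Reader;
> > >>
> > >> import org.apache.lucene.analysis.Analyzer;
> > >> import org.apache.lucene.analysis.TokenStream;
> > >> import org.apache.lucene.analysis.core.LowerCaseFilter;
> > >> import org.apache.lucene.analysis.core.StopAnalyzer;
> > >> import org.apache.lucene.analysis.core.StopFilter;
> > >> import org.apache.lucene.analysis.ngram.EdgeNGramTokenFilter;
> > >> import org.apache.lucene.analysis.standard.StandardFilter;
> > >> import org.apache.lucene.analysis.standard.StandardTokenizer;
> > >> import org.apache.lucene.util.Version;
> > >>
> > >> public class EdgeNGramAnalyzer extends Analyzer {
> > >>   @Override
> > >>   protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
> > >>     // Tokenize, normalize, lowercase, and drop English stop words.
> > >>     final StandardTokenizer src = new StandardTokenizer(Version.LUCENE_43, reader);
> > >>     TokenStream tok = new StandardFilter(Version.LUCENE_43, src);
> > >>     tok = new LowerCaseFilter(Version.LUCENE_43, tok);
> > >>     tok = new StopFilter(Version.LUCENE_43, tok, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
> > >>     // Emit front edge n-grams of 3 to 20 characters for each token.
> > >>     tok = new EdgeNGramTokenFilter(tok, EdgeNGramTokenFilter.Side.FRONT, 3, 20);
> > >>     return new TokenStreamComponents(src, tok);
> > >>   }
> > >> }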
> > >>
> > >>
> > >> I configured this analyzer for a ColumnDefinition using the following
> > >> steps via the Thrift client:
> > >>
> > >>         ColumnDefinition customAnalyzerDefn = new ColumnDefinition();
> > >>         customAnalyzerDefn.setFamily(FAMILY_NAME);
> > >>         customAnalyzerDefn.setColumnName(COLUMN_NAME);
> > >>         customAnalyzerDefn.setFieldType("text");
> > >>
> > >>         Map<String,String> analyzer = new HashMap<String,String>();
> > >>         analyzer.put("analyzerClass", "x.y.z.EdgeNGramAnalyzer");
> > >>         customAnalyzerDefn.setProperties(analyzer);
> > >>
> > >>         client.addColumnDefinition(TABLE_NAME, customAnalyzerDefn);
> > >>
> > >>
> > >> I copied the jar containing the analyzer class into the Blur lib folder.
> > >>
> > >> But I do not see this analyzer getting called. Blur always uses the
> > >> default StandardAnalyzer for the text field. Kindly let me know if I am
> > >> missing something, or if there is an issue where the "analyzerClass"
> > >> property is not getting set. I found that Blur uses this key to set the
> > >> analyzer in TextFieldTypeDefinition.
> > >>
> > >> Regards,
> > >> Dibyendu
> > >>
> > >
> > >
> >
>
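
The constructor rule Aaron describes in the thread can be sketched with plain
reflection. This is only an illustration under stated assumptions, not Blur's
actual code: Version, Analyzer, and the three analyzer classes are
hypothetical stand-ins for the Lucene types, and the order in which the two
constructors are tried is a guess.

```java
// Hypothetical sketch; none of these types come from Blur or Lucene.
final class AnalyzerLoaderSketch {

  /** Stand-in for Lucene's Version enum (assumption). */
  public enum Version { LUCENE_43 }

  /** Stand-in for org.apache.lucene.analysis.Analyzer (assumption). */
  public abstract static class Analyzer { }

  /** Loadable: public no-arg constructor. */
  public static class NoArgAnalyzer extends Analyzer {
    public NoArgAnalyzer() { }
  }

  /** Loadable: public constructor taking the version enum. */
  public static class VersionedAnalyzer extends Analyzer {
    public final Version version;
    public VersionedAnalyzer(Version version) { this.version = version; }
  }

  /** Not loadable: its only constructor takes some other argument. */
  public static class UnsupportedAnalyzer extends Analyzer {
    public UnsupportedAnalyzer(String name) { }
  }

  // Tries the (Version) constructor first, then the no-arg constructor,
  // and throws if the class offers neither.
  static Analyzer load(String className, Version version) {
    try {
      Class<? extends Analyzer> clazz =
          Class.forName(className).asSubclass(Analyzer.class);
      try {
        return clazz.getConstructor(Version.class).newInstance(version);
      } catch (NoSuchMethodException e) {
        // No (Version) constructor; fall back to the no-arg one.
      }
      try {
        return clazz.getConstructor().newInstance();
      } catch (NoSuchMethodException e) {
        throw new IllegalArgumentException(
            className + " needs a no-arg or (Version) constructor");
      }
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Could not instantiate " + className, e);
    }
  }

  public static void main(String[] args) {
    System.out.println(load(NoArgAnalyzer.class.getName(), Version.LUCENE_43)
        .getClass().getSimpleName());
    System.out.println(load(VersionedAnalyzer.class.getName(), Version.LUCENE_43)
        .getClass().getSimpleName());
    try {
      load(UnsupportedAnalyzer.class.getName(), Version.LUCENE_43);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Under this rule, an analyzer that needs other constructor arguments could
presumably be wrapped in a small subclass that exposes a no-arg constructor
and supplies those arguments itself.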
