incubator-blur-dev mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: Custom EdgeNGram Analyzer For Blur Text Field
Date Thu, 15 May 2014 11:20:18 GMT


Sent from my iPad

On May 15, 2014, at 12:33 AM, Dibyendu Bhattacharya <dibyendu.bhattachary@gmail.com>
wrote:

> Hi,
> 
> Thanks to Aaron and Garrett for your emails. I was able to configure a custom
> EdgeNGramAnalyzer for a text field using the "analyzerClass" property. I would
> request that you kindly document it somewhere in Blur 0.2.2.

Last week I posted the docs for 0.2.2 on the website. The docs are also embedded in the source
and binary artifacts.

http://incubator.apache.org/blur/docs/0.2.2/data-model.html#text_type

I will add the text about the constructor types to the docs.
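
The constructor rule mentioned in this thread (a no-arg constructor, or one that takes the Lucene version enum, otherwise an exception) can be sketched with plain reflection. This is not Blur's actual code: `StandInVersion`, the three sample classes, and both helper methods are hypothetical names, and the order in which the constructors are tried is an assumption.

```java
import java.lang.reflect.Constructor;

class AnalyzerConstructorSketch {

  // Hypothetical stand-in for Lucene's Version enum so the sketch runs without Lucene.
  enum StandInVersion { LUCENE_43 }

  // Hypothetical analyzer-like classes, one per constructor shape.
  public static class NoArgAnalyzer {
    public NoArgAnalyzer() {}
  }

  public static class VersionAnalyzer {
    final StandInVersion version;
    public VersionAnalyzer(StandInVersion version) { this.version = version; }
  }

  public static class UnsupportedAnalyzer {
    public UnsupportedAnalyzer(String unused) {}
  }

  // Returns the no-arg constructor if present, else the version constructor, else throws.
  static Constructor<?> findSupportedConstructor(Class<?> clazz) {
    try {
      return clazz.getConstructor();
    } catch (NoSuchMethodException ignored) {
      // No public no-arg constructor; try the version constructor next.
    }
    try {
      return clazz.getConstructor(StandInVersion.class);
    } catch (NoSuchMethodException ignored) {
      // Neither supported constructor shape exists.
    }
    throw new IllegalArgumentException(
        clazz.getName() + " needs a no-arg constructor or one taking the version enum");
  }

  // Instantiates the class through whichever supported constructor was found.
  static Object instantiate(Class<?> clazz) {
    Constructor<?> ctor = findSupportedConstructor(clazz);
    try {
      if (ctor.getParameterCount() == 0) {
        return ctor.newInstance();
      }
      return ctor.newInstance(StandInVersion.LUCENE_43);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Could not instantiate " + clazz.getName(), e);
    }
  }

  public static void main(String[] args) {
    System.out.println(instantiate(NoArgAnalyzer.class).getClass().getSimpleName());
    System.out.println(instantiate(VersionAnalyzer.class).getClass().getSimpleName());
  }
}
```

In this sketch a class such as `UnsupportedAnalyzer`, whose only constructor takes a `String`, is rejected with an `IllegalArgumentException`.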

> 
> I also tried the custom type definition suggested by Garrett, which uses
> the EdgeNGramAnalyzer. I defined this type in blur-site.xml and
> defined a ColumnDefinition that uses this custom type. But while querying I
> ran into an issue: it returned a message like "field is of custom type
> and needs to be enclosed in "".

Ok. Let us know if you need any help with the type code in the future. 
> 
> Anyway, I found that the "analyzerClass" option is much easier to configure, and
> it worked fine for me.

Glad it works for you. 

Aaron


> 
> Regards,
> Dibyendu
> 
> 
> 
> On Wed, May 14, 2014 at 7:33 AM, Aaron McCurry <amccurry@gmail.com> wrote:
> 
>> Garrett is correct that you can create a custom type.  However, you are
>> also correct that you can specify the "analyzerClass" property, but only if
>> the analyzer class has one of two kinds of constructor: the default
>> constructor (no args) or one that takes the LuceneVersion enum.  Otherwise
>> it will throw an exception.  This also assumes that you are running a
>> fairly recent version of Blur.  If it's 0.2.2 (which I think you are), then
>> you are likely good to use that option.
>> 
>> Here's the code:
>> 
>> 
>> https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=blob;f=blur-query/src/main/java/org/apache/blur/analysis/type/TextFieldTypeDefinition.java;h=049207bdb4f94cf03a4b0c74891eba129d13fbbb;hb=3967e154e7b064ad40b36d1d5832b7c7bcac44cd#l69
>> 
>> Perhaps the reason it's not being taken is because the field has already
>> been defined for the given table?
>> 
>> If none of those possibilities are the problem I'm not sure what the
>> problem could be.  Let us know how it goes.
>> 
>> Aaron
>> 
>> 
>> 
>> 
>> On Tue, May 13, 2014 at 12:12 PM, Garrett Barton
>> <garrett.barton@gmail.com> wrote:
>> 
>>> I think you have to create a custom TypeDefinition that calls your
>>> analyzer underneath the covers. You can extend the TextFieldTypeDefinition
>>> if I remember right and just override the analyzer it calls.
>>> 
>>> ~Garrett
>>> 
>>> 
>>> On Tue, May 13, 2014 at 11:54 AM, Dibyendu Bhattacharya <
>>> dibyendu.bhattachary@gmail.com> wrote:
>>> 
>>>> Hi ,
>>>> 
>>>> I was trying to configure a custom analyzer (EdgeNGram) for a text field.
>>>> 
>>>> Below is the very simple edge n-gram analyzer code, which works fine.
>>>> 
>>>> public class EdgeNGramAnalyzer extends Analyzer {
>>>>   @Override
>>>>   protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
>>>>     final StandardTokenizer src = new StandardTokenizer(Version.LUCENE_43, reader);
>>>>     TokenStream tok = new StandardFilter(Version.LUCENE_43, src);
>>>>     tok = new LowerCaseFilter(Version.LUCENE_43, tok);
>>>>     tok = new StopFilter(Version.LUCENE_43, tok, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
>>>>     tok = new EdgeNGramTokenFilter(tok, EdgeNGramTokenFilter.Side.FRONT, 3, 20);
>>>>     return new TokenStreamComponents(src, tok) {
>>>>       @Override
>>>>       protected void setReader(final Reader reader) throws IOException {
>>>>         super.setReader(reader);
>>>>       }
>>>>     };
>>>>   }
>>>> }
>>>> 
>>>> 
>>>> I configured this analyzer for a ColumnDefinition using the following steps
>>>> via the Thrift client:
>>>> 
>>>>        ColumnDefinition customAnalyzerDefn = new ColumnDefinition();
>>>>        customAnalyzerDefn.setFamily(FAMILY_NAME);
>>>>        customAnalyzerDefn.setColumnName(COLUMN_NAME);
>>>>        customAnalyzerDefn.setFieldType("text");
>>>> 
>>>>        Map<String,String> analyzer = new HashMap<String,String>();
>>>>        analyzer.put("analyzerClass", "x.y.z.EdgeNGramAnalyzer");
>>>>        customAnalyzerDefn.setProperties(analyzer);
>>>> 
>>>>        client.addColumnDefinition(TABLE_NAME, customAnalyzerDefn);
>>>> 
>>>> 
>>>> I copied the jar containing the analyzer class into the Blur lib folder.
>>>> 
>>>> But I do not see this analyzer getting called. Blur always uses the
>>>> default StandardAnalyzer for the text field. Kindly let me know if I am
>>>> missing something, or if there is an issue where the "analyzerClass" property
>>>> is not getting set. I found that Blur uses this key to set the analyzer
>>>> in TextFieldTypeDefinition.
>>>> 
>>>> Regards,
>>>> Dibyendu
>>>> 
>>> 
>>> 
>> 
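
For readers unfamiliar with the filter settings in the original message (front side, minimum gram size 3, maximum 20), the following is a minimal plain-Java sketch of what front-edge n-grams look like for a single token. It does not use Lucene; `EdgeNGramSketch` and `frontEdgeNGrams` are hypothetical names, and the sketch works on Java `char` positions rather than reproducing the filter's exact behaviour.

```java
import java.util.ArrayList;
import java.util.List;

class EdgeNGramSketch {

  // Collects the prefixes of the token whose length is between minGram and maxGram.
  static List<String> frontEdgeNGrams(String token, int minGram, int maxGram) {
    List<String> grams = new ArrayList<>();
    // A token shorter than maxGram simply stops at its own length.
    int longest = Math.min(maxGram, token.length());
    for (int len = minGram; len <= longest; len++) {
      grams.add(token.substring(0, len));
    }
    return grams;
  }

  public static void main(String[] args) {
    // prints [ana, anal, analy, analyz, analyze, analyzer]
    System.out.println(frontEdgeNGrams("analyzer", 3, 20));
  }
}
```

In this sketch a token shorter than the minimum gram size yields no grams at all, so a two-letter term would not be matched by prefix.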
