incubator-blur-dev mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: Custom EdgeNGram Analyzer For Blur Text Field
Date Wed, 14 May 2014 02:03:18 GMT
Garrett is correct that you can create a custom type.  However, you are
also correct that you can specify the "analyzerClass" property, provided
the analyzer class has one of two constructors: the default constructor
(no args) or one that takes the Lucene Version enum.  Otherwise it will
throw an exception.  This also assumes that you are running a fairly
recent version of Blur; if it's 0.2.2 (which I think you are), then you
are likely good to use that option.

Here's the code:

https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=blob;f=blur-query/src/main/java/org/apache/blur/analysis/type/TextFieldTypeDefinition.java;h=049207bdb4f94cf03a4b0c74891eba129d13fbbb;hb=3967e154e7b064ad40b36d1d5832b7c7bcac44cd#l69
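
To make the constructor rule above concrete, here is a minimal,
self-contained sketch of that kind of reflective lookup.  It is not the
Blur source: the class names, the stand-in Version enum, and the order in
which the two constructors are tried are all assumptions made for
illustration, and it deliberately avoids Lucene so it compiles on its own.

```java
import java.lang.reflect.Constructor;

public class AnalyzerLoaderSketch {

  // Hypothetical stand-in for Lucene's Version enum (illustration only).
  public enum Version { LUCENE_43 }

  // Stand-in analyzer with a default (no-arg) constructor.
  public static class NoArgAnalyzer {
    public NoArgAnalyzer() {}
  }

  // Stand-in analyzer whose only constructor takes the version enum.
  public static class VersionedAnalyzer {
    public final Version version;
    public VersionedAnalyzer(Version version) { this.version = version; }
  }

  // Stand-in analyzer with neither supported constructor shape.
  public static class UnsupportedAnalyzer {
    public UnsupportedAnalyzer(String config) {}
  }

  // Tries the no-arg constructor first, then the (Version) constructor.
  // The order is an assumption; the point is that any other constructor
  // shape ends in an exception.
  public static Object instantiate(Class<?> clazz, Version version) {
    try {
      try {
        Constructor<?> noArg = clazz.getConstructor();
        return noArg.newInstance();
      } catch (NoSuchMethodException e) {
        Constructor<?> versioned = clazz.getConstructor(Version.class);
        return versioned.newInstance(version);
      }
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException(
          "No usable constructor on " + clazz.getName(), e);
    }
  }
}
```

With these stand-ins, instantiate(NoArgAnalyzer.class, ...) and
instantiate(VersionedAnalyzer.class, ...) both return an instance, while
instantiate(UnsupportedAnalyzer.class, ...) throws an
IllegalArgumentException.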

Perhaps the reason it's not being picked up is that the field has already
been defined for the given table?

If none of those possibilities are the problem I'm not sure what the
problem could be.  Let us know how it goes.

Aaron




On Tue, May 13, 2014 at 12:12 PM, Garrett Barton
<garrett.barton@gmail.com> wrote:

> I think you have to create a custom TypeDefinition that calls your
> analyzer underneath the covers. You can extend the TextFieldTypeDefinition
> if I remember right and just override the analyzer it calls.
>
> ~Garrett
>
>
> On Tue, May 13, 2014 at 11:54 AM, Dibyendu Bhattacharya <
> dibyendu.bhattachary@gmail.com> wrote:
>
>> Hi,
>>
>> I was trying to configure a custom analyzer (EdgeNGram) for a text field.
>>
>> Below is the very simple Edge N-Gram analyzer code, which works fine.
>>
>> public class EdgeNGramAnalyzer extends Analyzer {
>>   @Override
>>   protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
>>     final StandardTokenizer src = new StandardTokenizer(Version.LUCENE_43, reader);
>>     TokenStream tok = new StandardFilter(Version.LUCENE_43, src);
>>     tok = new LowerCaseFilter(Version.LUCENE_43, tok);
>>     tok = new StopFilter(Version.LUCENE_43, tok, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
>>     tok = new EdgeNGramTokenFilter(tok, EdgeNGramTokenFilter.Side.FRONT, 3, 20);
>>     return new TokenStreamComponents(src, tok) {
>>       @Override
>>       protected void setReader(final Reader reader) throws IOException {
>>         super.setReader(reader);
>>       }
>>     };
>>   }
>> }
>>
>>
>> I configured this analyzer for a ColumnDefinition using the following
>> steps via the Thrift client:
>>
>>         ColumnDefinition customAnalyzerDefn = new ColumnDefinition();
>>         customAnalyzerDefn.setFamily(FAMILY_NAME);
>>         customAnalyzerDefn.setColumnName(COLUMN_NAME);
>>         customAnalyzerDefn.setFieldType("text");
>>
>>         Map<String,String> analyzer = new HashMap<String,String>();
>>         analyzer.put("analyzerClass", "x.y.z.EdgeNGramAnalyzer");
>>         customAnalyzerDefn.setProperties(analyzer);
>>
>>         client.addColumnDefinition(TABLE_NAME, customAnalyzerDefn);
>>
>>
>> I copied the jar containing the analyzer class into the Blur lib folder.
>>
>> But I do not see this analyzer getting called. Blur always uses the
>> default StandardAnalyzer for the text field. Kindly let me know if I am
>> missing something, or whether there is an issue with the "analyzerClass"
>> property not getting set. I found that Blur uses this key to set the
>> analyzer in TextFieldTypeDefinition.
>>
>> Regards,
>> Dibyendu
>>
>
>
