lucene-java-user mailing list archives

From Alan Woodward <alan.woodw...@romseysoftware.co.uk>
Subject Re: Which token filter can combine 2 terms into 1?
Date Fri, 21 Dec 2012 09:00:50 GMT
Have a look at ShingleFilter:  http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/analysis/shingle/ShingleFilter.html
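
To illustrate what shingling does to the example input, here is a minimal plain-Java sketch that does not use Lucene at all. The class `ShingleSketch` and its `shingles` method are hypothetical names made up for this illustration. It pairs every token with its right-hand neighbour, which is roughly what ShingleFilter does with a shingle size of 2 when unigrams are also emitted. In ShingleFilter itself the joining string is configurable (`setTokenSeparator`) and unigram output can be switched off (`setOutputUnigrams`). The output order below is illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShingleSketch {

    // Emits every token, each followed by the bigram that starts at it.
    // This is a stand-in for the idea of shingling, not Lucene code.
    static List<String> shingles(List<String> tokens, String separator) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            out.add(tokens.get(i));
            if (i + 1 < tokens.size()) {
                out.add(tokens.get(i) + separator + tokens.get(i + 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // An empty separator glues the two terms together, e.g. "t2" + "t2a".
        List<String> result = shingles(Arrays.asList("t1", "t2", "t2a", "t3"), "");
        System.out.println(result);
    }
}
```

Note that shingling produces every adjacent pair (t1t2, t2t2a, t2at3), not only t2t2a, so the unwanted pairs would still have to be filtered out afterwards.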

On 21 Dec 2012, at 08:42, Xi Shen wrote:

> I have to process the input with the white space and word delimiter
> first. I tried many combinations, and it seems inevitable that the term
> will be split into two :(
> 
> I think developing my own filter is the only solution... but I cannot
> find a guide that explains what I need to do to implement a
> TokenFilter.
> 
> 
> On Fri, Dec 21, 2012 at 4:03 PM, Danil ŢORIN <torindan@gmail.com> wrote:
> 
>> The easiest way would be to pre-process your input and join those 2
>> tokens before splitting on white space.
>> 
>> But given the limited context I may be missing some details... still
>> worth a shot.
>> 
>> On Fri, Dec 21, 2012 at 9:50 AM, Xi Shen <davidshen84@gmail.com> wrote:
>> 
>>> Hi,
>>> 
>>> I am looking for a token filter that can combine 2 terms into 1? E.g.
>>> 
>>> the input has been tokenized by white space:
>>> 
>>> t1 t2 t2a t3
>>> 
>>> I want a filter that outputs:
>>> 
>>> t1 t2t2a t3
>>> 
>>> I know it is a very special case, and I am thinking about developing a
>>> filter of my own. But I cannot figure out which API I should use to look
>>> for terms in a Token Stream.
>>> 
>>> --
>>> Regards,
>>> David Shen
>>> 
>>> http://about.me/davidshen
>>> https://twitter.com/#!/davidshen84
>>> 
>> 
> 
> 
> 
> -- 
> Regards,
> David Shen
> 
> http://about.me/davidshen
> https://twitter.com/#!/davidshen84
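
For the custom-filter route discussed in the quoted messages, the core of the problem is a one-token look-ahead: hold the current term, peek at the next one, and either emit the two joined or emit the held term alone. The sketch below shows only that logic in plain Java over an `Iterator<String>`. It is not a Lucene `TokenFilter`. The class `PairMergeSketch`, the method `mergePairs`, and the join rule used in `main` (the second term starts with the first) are all assumptions made for illustration, since the thread never states which pairs should be joined. In a real filter the same loop would sit inside `incrementToken()`, reading terms through `CharTermAttribute` and buffering the look-ahead token with `captureState()` / `restoreState()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiPredicate;

public class PairMergeSketch {

    // Walks the token sequence with one token of look-ahead and joins a pair
    // whenever the caller-supplied rule says so. The rule is hypothetical.
    static List<String> mergePairs(Iterator<String> input,
                                   BiPredicate<String, String> shouldJoin) {
        List<String> out = new ArrayList<>();
        String pending = input.hasNext() ? input.next() : null;
        while (pending != null) {
            String next = input.hasNext() ? input.next() : null;
            if (next != null && shouldJoin.test(pending, next)) {
                // Emit the two terms as one and move past both of them.
                out.add(pending + next);
                pending = input.hasNext() ? input.next() : null;
            } else {
                // Emit the held term alone; the peeked term becomes the held one.
                out.add(pending);
                pending = next;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed rule for this example only: join when the second term
        // starts with the first, as "t2a" starts with "t2".
        List<String> tokens = Arrays.asList("t1", "t2", "t2a", "t3");
        System.out.println(mergePairs(tokens.iterator(), (a, b) -> b.startsWith(a)));
        // prints [t1, t2t2a, t3]
    }
}
```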

