lucene-java-user mailing list archives

From Xi Shen <davidshe...@gmail.com>
Subject Re: Which token filter can combine 2 terms into 1?
Date Fri, 21 Dec 2012 08:42:38 GMT
I have to use the white space and word delimiter to process the input
first. I tried many combinations, and it seems inevitable that the term
will be split into two :(

I think developing my own filter is the only solution... but I just cannot
find a guide to help me understand what I need to do to implement a
TokenFilter.
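
For reference, the core of such a filter is a one-token lookahead: read a term, peek at the next one, and either emit the joined term or hold the peeked term back for the next call. The sketch below models only that logic in plain Java over an Iterator<String>; it is not Lucene code, and the class name JoiningTokenIterator and the first/second parameters are hypothetical names chosen for illustration. In Lucene, the same logic would presumably live in a TokenFilter subclass that overrides incrementToken(), reads the term text through CharTermAttribute, and keeps the held-back token with captureState()/restoreState().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/**
 * Plain-Java model of the lookahead a joining token filter needs.
 * Hypothetical illustration only; it does not use any Lucene classes.
 */
final class JoiningTokenIterator implements Iterator<String> {
    private final Iterator<String> input; // upstream tokens, e.g. a whitespace split
    private final String first;           // first term of the pair to join (assumed known)
    private final String second;          // second term of the pair to join (assumed known)
    private String pending;               // token read ahead but not yet emitted

    JoiningTokenIterator(Iterator<String> input, String first, String second) {
        this.input = input;
        this.first = first;
        this.second = second;
    }

    @Override
    public boolean hasNext() {
        return pending != null || input.hasNext();
    }

    @Override
    public String next() {
        // Emit the held-back token first, otherwise pull from upstream.
        String current;
        if (pending != null) {
            current = pending;
            pending = null;
        } else {
            current = input.next();
        }
        // Only look ahead when the current token could start the pair.
        if (current.equals(first) && input.hasNext()) {
            String lookahead = input.next();
            if (lookahead.equals(second)) {
                return current + lookahead; // join the two terms into one
            }
            pending = lookahead; // not a match: keep it for the next call
        }
        return current;
    }

    /** Convenience helper: run the iterator over a whole token list. */
    static List<String> joinAll(List<String> tokens, String first, String second) {
        List<String> out = new ArrayList<>();
        Iterator<String> it = new JoiningTokenIterator(tokens.iterator(), first, second);
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }
}
```

Calling JoiningTokenIterator.joinAll(Arrays.asList("t1", "t2", "t2a", "t3"), "t2", "t2a") walks the example stream from the original question and joins the middle pair. A real filter would also have to decide what offsets and position increment the joined term should carry, which this sketch ignores.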


On Fri, Dec 21, 2012 at 4:03 PM, Danil ŢORIN <torindan@gmail.com> wrote:

> Easiest way would be to pre-process your input and join those 2 tokens
> before splitting them by white space.
>
> But from the given context I might be missing some details... still worth a shot.
>
> On Fri, Dec 21, 2012 at 9:50 AM, Xi Shen <davidshen84@gmail.com> wrote:
>
> > Hi,
> >
> > I am looking for a token filter that can combine 2 terms into 1. E.g.
> >
> > the input has been tokenized by white space:
> >
> > t1 t2 t2a t3
> >
> > I want a filter that outputs:
> >
> > t1 t2t2a t3
> >
> > I know it is a very special case, and I am thinking about developing a
> > filter of my own. But I cannot figure out which API I should use to look
> > for terms in a Token Stream.
> >
> > --
> > Regards,
> > David Shen
> >
> > http://about.me/davidshen
> > https://twitter.com/#!/davidshen84
> >
>



-- 
Regards,
David Shen

http://about.me/davidshen
https://twitter.com/#!/davidshen84
