lucene-general mailing list archives

From Xi Shen <davidshe...@gmail.com>
Subject Re: Which token filter can combine 2 terms into 1?
Date Fri, 21 Dec 2012 07:46:48 GMT
Hi Steve,

This is a language-dependent case. Basically, I will use a whitespace
tokenizer to process the input. But some of the inputs should be one term
instead of being split into 2 terms. I am thinking of developing a special
filter to fix these terms.
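
A minimal sketch of the merging logic I have in mind, written against a
plain Iterator<String> rather than the Lucene API so that it runs on its
own. The class name PairMergeSketch, the method mergePairs, and the pair
table are all hypothetical names made up for illustration. In an actual
TokenFilter the same one-token lookahead would presumably live in
incrementToken(), with captureState()/restoreState() holding the token that
was read ahead, but I have not verified that part.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: merges known adjacent token pairs into one term.
// This is NOT a Lucene TokenFilter; it only illustrates the lookahead logic.
public class PairMergeSketch {

    // pairs maps a first token to the second token it should be joined with.
    static List<String> mergePairs(Iterator<String> input, Map<String, String> pairs) {
        List<String> out = new ArrayList<>();
        String pending = null; // token read ahead but not yet emitted

        while (pending != null || input.hasNext()) {
            // Take the buffered token first, otherwise pull a fresh one.
            String current = (pending != null) ? pending : input.next();
            pending = null;

            if (input.hasNext()) {
                String next = input.next();
                if (next.equals(pairs.get(current))) {
                    // Known pair: emit a single combined term.
                    out.add(current + next);
                } else {
                    // Not a pair: emit current and keep next for the next round.
                    out.add(current);
                    pending = next;
                }
            } else {
                out.add(current);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("t1", "t2", "t2a", "t3");
        // prints [t1, t2t2a, t3]
        System.out.println(mergePairs(tokens.iterator(), Map.of("t2", "t2a")));
    }
}
```

The pair table is the language-dependent part; everything else is just
buffering one token so it can be either merged or emitted unchanged.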


On Fri, Dec 21, 2012 at 3:34 PM, Steve Rowe <sarowe@gmail.com> wrote:

> Hi David,
>
> Not very many people read this mailing list - I suggest you switch to the
> java-user list - see <http://lucene.apache.org/core/discussion.html>.
>
> ShingleFilter and CommonGramsFilter combine terms, though the conditions
> under which they do so don't appear to be the same as what you want.
>
> Why are only the second two terms combined?
>
> Steve
>
> On Dec 21, 2012, at 2:27 AM, Xi Shen <davidshen84@gmail.com> wrote:
>
> > Hi,
> >
> > I am looking for a token filter that can combine 2 terms into 1. E.g.
> >
> > the input has been tokenized by white space:
> >
> > t1 t2 t2a t3
> >
> > I want a filter that outputs:
> >
> > t1 t2t2a t3
> >
> > I know it is a very special case, and I am thinking about developing a
> > filter of my own. But I cannot figure out which API I should use to look
> > for terms in a Token Stream.
> >
> >
> > --
> > Regards,
> > David Shen
> >
> > http://about.me/davidshen
> > https://twitter.com/#!/davidshen84
>
>


-- 
Regards,
David Shen

http://about.me/davidshen
https://twitter.com/#!/davidshen84
