lucene-java-user mailing list archives

From Xi Shen <davidshe...@gmail.com>
Subject Re: Which token filter can combine 2 terms into 1?
Date Fri, 21 Dec 2012 09:16:08 GMT
Unfortunately, no. I am not combining every two terms into one; I am
combining one specific pair.

E.g. the Token Stream: t1 t2 t2a t3
should be rewritten into t1 t2t2a t3

But the token stream: t1 t2 t3 t2a
should not be rewritten; it is already correct, because t2a does not
immediately follow t2.
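
The rule above can be sketched in plain Java, with no Lucene dependency, as a one-token lookahead over a list of terms. The class name PairMerger and the method merge are hypothetical names chosen for illustration; they are not part of Lucene. A custom TokenFilter would apply the same check inside incrementToken(), reading the term through CharTermAttribute and buffering the lookahead token with captureState()/restoreState() so it can be replayed when the pair does not match. That wiring is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Lucene class: merges one specific adjacent pair.
final class PairMerger {

    // Returns a new list where each occurrence of `first` immediately
    // followed by `second` is replaced by the single term first + second.
    static List<String> merge(List<String> tokens, String first, String second) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            String current = tokens.get(i);
            // Look ahead exactly one token; only adjacent pairs qualify.
            boolean pairFollows = current.equals(first)
                    && i + 1 < tokens.size()
                    && tokens.get(i + 1).equals(second);
            if (pairFollows) {
                out.add(first + second);
                i += 2; // consume both tokens of the pair
            } else {
                out.add(current);
                i += 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Adjacent pair: merged.
        System.out.println(merge(List.of("t1", "t2", "t2a", "t3"), "t2", "t2a"));
        // prints [t1, t2t2a, t3]

        // Pair separated by another token: left unchanged.
        System.out.println(merge(List.of("t1", "t2", "t3", "t2a"), "t2", "t2a"));
        // prints [t1, t2, t3, t2a]
    }
}
```

A real filter would also have to decide what offsets and position increment the merged token gets; this sketch leaves that out.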


On Fri, Dec 21, 2012 at 5:00 PM, Alan Woodward <
alan.woodward@romseysoftware.co.uk> wrote:

> Have a look at ShingleFilter:
> http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/analysis/shingle/ShingleFilter.html
>
> On 21 Dec 2012, at 08:42, Xi Shen wrote:
>
> > I have to use the whitespace tokenizer and the word delimiter filter to
> > process the input first. I tried many combinations, and it seems
> > inevitable that the term will be split into two :(
> >
> > I think developing my own filter is the only solution, but I cannot
> > find a guide that explains what I need to do to implement a
> > TokenFilter.
> >
> >
> > On Fri, Dec 21, 2012 at 4:03 PM, Danil ŢORIN <torindan@gmail.com> wrote:
> >
> >> Easiest way would be to pre-process your input and join those 2 tokens
> >> before splitting them by white space.
> >>
> >> But from the given context I might be missing some details. Still worth a shot.
> >>
> >> On Fri, Dec 21, 2012 at 9:50 AM, Xi Shen <davidshen84@gmail.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> I am looking for a token filter that can combine 2 terms into 1. E.g.
> >>>
> >>> the input has been tokenized by white space:
> >>>
> >>> t1 t2 t2a t3
> >>>
> >>> I want a filter that outputs:
> >>>
> >>> t1 t2t2a t3
> >>>
> >>> I know it is a very special case, and I am thinking about developing a
> >>> filter of my own. But I cannot figure out which API I should use to
> >>> look at the terms in a TokenStream.
> >>>
> >>> --
> >>> Regards,
> >>> David Shen
> >>>
> >>> http://about.me/davidshen
> >>> https://twitter.com/#!/davidshen84
> >>>
> >>
> >
> >
> >
> > --
> > Regards,
> > David Shen
> >
> > http://about.me/davidshen
> > https://twitter.com/#!/davidshen84
>
>


-- 
Regards,
David Shen

http://about.me/davidshen
https://twitter.com/#!/davidshen84
