lucene-java-user mailing list archives

From Ravikumar Govindarajan <ravikumar.govindara...@gmail.com>
Subject Re: URL/Email tokenizer
Date Tue, 17 Feb 2015 11:42:06 GMT
Thanks, Ian.

What I am currently doing is duplicating the data into two different fields
and using my own PerFieldAnalyzerWrapper, just as you pointed out.

Is there a good way to do this in a single pass, the way Bi-Grams or
Common-Grams do?
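
To make the goal concrete, here is a minimal plain-Java sketch of the output I am after from one pass over the text. It deliberately uses no Lucene classes: the class name SinglePassSplit, the method split, and the simplistic e-mail regex are all assumptions made up for illustration, not an existing API. A real solution would presumably do the equivalent inside the analysis chain.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: it only illustrates the desired result.
final class SinglePassSplit {

    // Deliberately simplistic e-mail pattern (an assumption for this sketch).
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)+");

    // Reads the text once and fills two token lists, keyed by field name.
    static Map<String, List<String>> split(String text) {
        List<String> body = new ArrayList<>();
        List<String> bodyAddr = new ArrayList<>();

        for (String chunk : text.trim().split("\\s+")) {
            // A chunk that is a whole e-mail address goes to body-addr untokenized.
            if (EMAIL.matcher(chunk).matches()) {
                bodyAddr.add(chunk);
            }
            // Every chunk, address or not, is broken on non-alphanumerics for body.
            for (String part : chunk.split("[^A-Za-z0-9]+")) {
                if (!part.isEmpty()) {
                    body.add(part);
                }
            }
        }

        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put("body", body);
        fields.put("body-addr", bodyAddr);
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("I have mailed abc@xyz.com"));
    }
}
```

The point of the sketch is only that both token lists come out of the same loop over the input, rather than the data being analyzed twice.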

--
Ravi

On Tue, Feb 17, 2015 at 3:08 PM, Ian Lea <ian.lea@gmail.com> wrote:

> Sounds like a job for
> org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper.
>
>
> --
> Ian.
>
>
> On Tue, Feb 17, 2015 at 8:51 AM, Ravikumar Govindarajan
> <ravikumar.govindarajan@gmail.com> wrote:
> > We have a requirement that e-mail addresses be added in tokenized
> > form to one field and in untokenized form to another field.
> >
> > Ex:
> >
> > "I have mailed abc@xyz.com". It should tokenize as below:
> >
> > body = {"I", "have", "mailed", "abc", "xyz", "com"};
> >
> > I also have a body-addr field. The tokenizer needs to extract e-mail
> > addresses from the body field and add them as below:
> >
> > body-addr = {"abc@xyz.com"}
> >
> > How can I achieve this via a tokenizer chain?
> >
> > --
> > Ravi
>
