lucene-java-user mailing list archives

From Sachin Kulkarni <kulk...@hawk.iit.edu>
Subject Re: Can lucene index tokenized files?
Date Mon, 15 Sep 2014 03:34:27 GMT
Hi Uwe,

Thank you.
I do not have the tokens serialized, so that reduces one step.
I am reading the javadocs and will try it the way you mentioned.

Regards,
Sachin

On Sun, Sep 14, 2014 at 5:11 PM, Uwe Schindler <uwe@thetaphi.de> wrote:

> Hi,
>
> If you have the serialized tokens in a file, you can write a custom
> TokenStream that deserializes them and feeds them to the IndexWriter as a Field
> instance in a Document instance. Please read the javadocs on how to write your
> own TokenStream implementation and pass it using "new TextField(name,
> yourTokenStream)".
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
> > -----Original Message-----
> > From: Sachin Kulkarni [mailto:kulksac@hawk.iit.edu]
> > Sent: Sunday, September 14, 2014 10:06 PM
> > To: java-user@lucene.apache.org
> > Subject: Can lucene index tokenized files?
> >
> > Hi,
> >
> > I have a dataset whose files consist of tokens: the original data
> > has already been tokenized, stemmed, and stop-worded.
> >
> > Is it possible to skip the Lucene analyzers and index this dataset in
> > Lucene?
> >
> > So far the datasets I have dealt with were raw, and I used Lucene's
> > tokenization and stemming schemes.
> >
> > Thank you.
> >
> > Regards,
> > Sachin
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
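Uwe's suggestion above can be sketched roughly as follows. This is a minimal, hypothetical `PreTokenizedStream` (the class name and the in-memory `String[]` input are illustrative assumptions; a real implementation would deserialize tokens from the dataset's file format) that replays an existing token list so Lucene indexes the terms without running any analyzer:

```java
import java.io.IOException;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/**
 * A TokenStream that replays a pre-tokenized term list as-is.
 * No tokenization, stemming, or stop-wording is applied here --
 * the tokens are assumed to be fully processed already.
 */
public final class PreTokenizedStream extends TokenStream {
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final String[] tokens;
    private int pos = 0;

    public PreTokenizedStream(String[] tokens) {
        this.tokens = tokens;
    }

    @Override
    public boolean incrementToken() {
        if (pos >= tokens.length) {
            return false;              // stream exhausted
        }
        clearAttributes();             // required before setting attributes
        termAtt.setEmpty().append(tokens[pos++]);
        return true;
    }

    @Override
    public void reset() throws IOException {
        super.reset();
        pos = 0;                       // allow the stream to be consumed again
    }
}
```

It would then be passed to the IndexWriter exactly as Uwe describes, e.g. `doc.add(new TextField("body", new PreTokenizedStream(tokens)))`, so the field is indexed from the stream instead of analyzed text. Note that the default position increment of 1 applies to every token, which is usually what you want for pre-stop-worded data unless you need to preserve gaps for phrase queries.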
