lucene-java-user mailing list archives

From "Uwe Schindler" <>
Subject RE: Can lucene index tokenized files?
Date Sun, 14 Sep 2014 22:11:53 GMT

If you have the serialized tokens in a file, you can write a custom TokenStream that deserializes
them and feeds them to IndexWriter through a Field instance in a Document instance. Please read
the javadocs on how to write your own TokenStream implementation and pass it using "new TextField(name,
tokenStream)".
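A minimal sketch of such a TokenStream (the class and field names here are illustrative, not from the original mail; it assumes Lucene 4.x or later, where TextField has a constructor taking a TokenStream):

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/**
 * Replays an already-tokenized list of terms as a Lucene TokenStream,
 * bypassing any Analyzer. How you deserialize the tokens from your file
 * format is up to you; here they simply arrive as a List of Strings.
 */
public final class PreTokenizedStream extends TokenStream {

  private final List<String> tokens;
  private Iterator<String> it;
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

  public PreTokenizedStream(List<String> tokens) {
    this.tokens = tokens;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    it = tokens.iterator();
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!it.hasNext()) {
      return false; // no more tokens; end of stream
    }
    clearAttributes();
    termAtt.setEmpty().append(it.next()); // emit the next pre-computed token
    return true;
  }
}
```

You would then add it to a document with something like `doc.add(new TextField("body", new PreTokenizedStream(tokens)))` and pass the document to IndexWriter as usual; the analyzer configured on IndexWriterConfig is not consulted for fields built from a TokenStream.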


Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen

> -----Original Message-----
> From: Sachin Kulkarni []
> Sent: Sunday, September 14, 2014 10:06 PM
> To:
> Subject: Can lucene index tokenized files?
> Hi,
> I have a dataset whose files are already in token form: the original data
> has been tokenized, stemmed, and stopword-filtered.
> Is it possible to skip the lucene analyzers and index this dataset in Lucene?
> So far the dataset I have dealt with was raw and used Lucene's tokenization
> and stemming schemes.
> Thank you.
> Regards,
> Sachin
