lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Can lucene index tokenized files?
Date Sun, 14 Sep 2014 22:11:53 GMT
Hi,

If you have the serialized tokens in a file, you can write a custom TokenStream that deserializes
them and feeds them to IndexWriter as a Field instance in a Document instance. Please read
the javadocs on how to write your own TokenStream implementation, and pass it using "new
TextField(name, yourTokenStream)".
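A minimal sketch of such a TokenStream, assuming the tokens have already been read from the file into a List<String> (the class name PreTokenizedStream and the deserialization step are placeholders; adapt them to your file format):

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/** Replays an already-tokenized term list, bypassing any Analyzer. */
public final class PreTokenizedStream extends TokenStream {
    private final List<String> tokens;
    private Iterator<String> it;
    // Attribute the indexer reads to get each term's text.
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

    public PreTokenizedStream(List<String> tokens) {
        this.tokens = tokens;
    }

    @Override
    public void reset() throws IOException {
        super.reset();
        it = tokens.iterator();
    }

    @Override
    public boolean incrementToken() {
        if (it == null || !it.hasNext()) {
            return false; // stream exhausted
        }
        clearAttributes();
        termAtt.append(it.next());
        return true;
    }
}
```

You would then index it as described above, e.g.:

```java
Document doc = new Document();
doc.add(new TextField("body", new PreTokenizedStream(List.of("lucen", "index", "token"))));
indexWriter.addDocument(doc);
```

Note that IndexWriter drives reset()/incrementToken()/end()/close() itself when it consumes the field, so you only hand over the stream.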

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Sachin Kulkarni [mailto:kulksac@hawk.iit.edu]
> Sent: Sunday, September 14, 2014 10:06 PM
> To: java-user@lucene.apache.org
> Subject: Can lucene index tokenized files?
> 
> Hi,
> 
> I have a dataset which has files in the form of tokens where the original data
> has been tokenized, stemmed, stopworded.
> 
> Is it possible to skip the lucene analyzers and index this dataset in Lucene?
> 
> So far the dataset I have dealt with was raw and used Lucene's tokenization
> and stemming schemes.
> 
> Thank you.
> 
> Regards,
> Sachin


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

