lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From starz10de <>
Subject Re: storing the contents of a document in the lucene index
Date Wed, 23 Jul 2008 07:42:50 GMT

Hi Erik,

 I don't remove the stop words, as I index parallel corpora which is used
for learning the translations between pair of languages. so every word is
important. I even develop my own analyzer for Arabic which is just remove
punctuations and special symbols and it return only Arabic text.

I guess in the   the whole text is already stored

doc.add(Field.Text("contents", IN)); 

where IN is 

IN = new BufferedReader(new InputStreamReader(new FileInputStream(f))

if this is not the case yould you please how to store the whole text inside
the index ? 

I am new to lucene and I don't know how to use this "Field.Store.YES" to
store whole text.


Best regards

starz10de wrote:
>   Could any one tell me please how to print the content of the document
> after reading the index.
> for example if i like to print the  index terms then i do :
> IndexReader ir =;
> TermEnum termEnum = ir.terms(); 
> while ( {
> 			TermDocs dok = ir.termDocs();
> 			while ( {
> System.out.println(termEnum.term().text().trim());
> 				}
> I can print the text files before indexing them, but because of encoding
> issues i like to print them from the index.
> As i know the content of the document(whole text) is also stored in the
> index, my question how to print this content.
> so at the end i will print the path of the current document , index terms
> and the content of the document
> thanks in advance

View this message in context:
Sent from the Lucene - Java Users mailing list archive at

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message