lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From starz10de <farag_ah...@yahoo.com>
Subject Re: storing the contents of a document in the lucene index
Date Wed, 23 Jul 2008 07:42:50 GMT

Hi Erik,

 I don't remove the stop words, as I index parallel corpora which is used
for learning the translations between pair of languages. so every word is
important. I even develop my own analyzer for Arabic which is just remove
punctuations and special symbols and it return only Arabic text.

I guess in the   FileDocument.java   the whole text is already stored

doc.add(Field.Text("contents", IN)); 

where IN is 

IN = new BufferedReader(new InputStreamReader(new FileInputStream(f))

if this is not the case yould you please how to store the whole text inside
the index ? 

I am new to lucene and I don't know how to use this "Field.Store.YES" to
store whole text.

 

Best regards
Farag



starz10de wrote:
> 
>   Could any one tell me please how to print the content of the document
> after reading the index.
> for example if i like to print the  index terms then i do :
> 
> IndexReader ir = IndexReader.open(index);
> TermEnum termEnum = ir.terms(); 
> while (termEnum.next()) {
> 			TermDocs dok = ir.termDocs();
> 			dok.seek(termEnum);
> 			while (dok.next()) {
> System.out.println(termEnum.term().text().trim());
> 				}
> 
> I can print the text files before indexing them, but because of encoding
> issues i like to print them from the index.
> As i know the content of the document(whole text) is also stored in the
> index, my question how to print this content.
> 
> so at the end i will print the path of the current document , index terms
> and the content of the document
> 
> 
> thanks in advance
> 

-- 
View this message in context: http://www.nabble.com/storing-the-contents-of-a-document-in-the--lucene-index-tp18595855p18605547.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message