lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From starz10de <farag_ah...@yahoo.com>
Subject Re: storing the contents of a document in the lucene index
Date Thu, 24 Jul 2008 13:25:39 GMT

Dear Erick ,

 Thnaks for your answer, I tryed other way ,  where I read the text files
before i index them. I will try also your solution here.

best regards


Erick Erickson wrote:
> 
> OK, I'm finally catching on. You have to change the demo code to
> get the contents into something besides an input stream, so you
> can use one of the alternate forms of the Field constructor. For
> instance, you could read it all into a string and use the form:
> 
> doc.add(new Field("content", <string with all the file contents in it>,
>                Field.Store.YES, Field.Index.TOKENIZED))
> 
> 
> Or, you can do something like this, which produces identical results
> to the above
> 
> while (more text to read) {
>      String line = read a line of text from the file
>      doc.add(new Field("content", line, Field.Store.YES,
> Field.Index.TOKENIZED))
> }
> 
> You can add to the same field as often as you want and it just appends the
> content of calls 2 to N to the same field.
> 
> 
> Best
> Erick
> 
> 
> On Wed, Jul 23, 2008 at 3:42 AM, starz10de <farag_ahmed@yahoo.com> wrote:
> 
>>
>> Hi Erik,
>>
>>  I don't remove the stop words, as I index parallel corpora which is used
>> for learning the translations between pair of languages. so every word is
>> important. I even develop my own analyzer for Arabic which is just remove
>> punctuations and special symbols and it return only Arabic text.
>>
>> I guess in the   FileDocument.java   the whole text is already stored
>>
>> doc.add(Field.Text("contents", IN));
>>
>> where IN is
>>
>> IN = new BufferedReader(new InputStreamReader(new FileInputStream(f))
>>
>> if this is not the case yould you please how to store the whole text
>> inside
>> the index ?
>>
>> I am new to lucene and I don't know how to use this "Field.Store.YES" to
>> store whole text.
>>
>>
>>
>> Best regards
>> Farag
>>
>>
>>
>> starz10de wrote:
>> >
>> >   Could any one tell me please how to print the content of the document
>> > after reading the index.
>> > for example if i like to print the  index terms then i do :
>> >
>> > IndexReader ir = IndexReader.open(index);
>> > TermEnum termEnum = ir.terms();
>> > while (termEnum.next()) {
>> >                       TermDocs dok = ir.termDocs();
>> >                       dok.seek(termEnum);
>> >                       while (dok.next()) {
>> > System.out.println(termEnum.term().text().trim());
>> >                               }
>> >
>> > I can print the text files before indexing them, but because of
>> encoding
>> > issues i like to print them from the index.
>> > As i know the content of the document(whole text) is also stored in the
>> > index, my question how to print this content.
>> >
>> > so at the end i will print the path of the current document , index
>> terms
>> > and the content of the document
>> >
>> >
>> > thanks in advance
>> >
>>
>> --
>> View this message in context:
>> http://www.nabble.com/storing-the-contents-of-a-document-in-the--lucene-index-tp18595855p18605547.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
> 
> 

-- 
View this message in context: http://www.nabble.com/storing-the-contents-of-a-document-in-the--lucene-index-tp18595855p18631887.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message