lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: storing the contents of a document in the lucene index
Date Wed, 23 Jul 2008 00:54:34 GMT
<<<As i know the content of the document(whole text) is also stored in the
index, my question how to print this content.>>>

This not strictly true. For instance, stop words aren't even indexed.
Reconstructing a document from the index is very expensive
(see Luke for examples of how this is done).

You can get the text back verbatim if you store it in your index. See
Field.Store.YES (or Field.Store.COMPRESS). Storage is orthogonal
to indexing, so you can index the tokens in a field but not store them,
store them but not index them, or do both. Not storing and not indexing
is, I guess, theoretically possible but I sure can't see why you'd try it
<G>.

But if you store the field, you can get it back very easily with
Document.get("field").
Storing the fields will make your index larger, but shouldn't have a great
effect on your search times I don't think.

Best
Erick



On Tue, Jul 22, 2008 at 2:53 PM, starz10de <farag_ahmed@yahoo.com> wrote:

>
>  Could any one tell me please how to print the content of the document
> after
> reading the index.
> for example if i like to print the  index terms then i do :
>
> IndexReader ir = IndexReader.open(index);
> TermEnum termEnum = ir.terms();
> while (termEnum.next()) {
>                        TermDocs dok = ir.termDocs();
>                        dok.seek(termEnum);
>                        while (dok.next()) {
> System.out.println(termEnum.term().text().trim());
>                                }
>
> I can print the text files before indexing them, but because of encoding
> issues i like to print them from the index.
> As i know the content of the document(whole text) is also stored in the
> index, my question how to print this content.
>
> so at the end i will print the path of the current document , index terms
> and the content of the document
>
>
> thanks in advance
> --
> View this message in context:
> http://www.nabble.com/storing-the-contents-of-a-document-in-the--lucene-index-tp18595855p18595855.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message