lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Carsten Schnober <>
Subject Field value vs TokenStream
Date Wed, 18 Apr 2012 15:00:03 GMT
Dear list,
I'm studying the Lucene index file formats and I wonder: after having
initialized a field with Field(String name, String value, Field.Store
store, Field.Index index), where is the value String stored?

I understand that the chosen analyzer does its processing on that value,
including tokenization, and returns a TokenStream from which the Indexer
retrieves the attributes that it stores in the index.
When I use a binary editor to inspect the term infos (tis) file in the
index directory, I can see every single token (term).
For experimenting purposes, I implemented an analyzer that converts the
value input to the field and noticed the following: the TokenStream
still correctly generates the terms that end up to be stored in the tis
file, but the initial input value is still displayed as the field value
when I retrieve a document from the index and output it with
Document.toString(). I tried to analyse the Field's tokenStream, but
tokenStreamValue() returns null; is that normal when retrieving a
document from an existing index?

Can someone let me know what happens to a Field's value string and at
which point in the pipeline it is replaced by the (term) attributes
generated by the TokenStream?

Thank you very much!

Carsten Schnober
Institut für Deutsche Sprache |
Projekt KorAP -- Korpusanalyseplattform der nächsten Generation | Tel.: +49-(0)621-1581-238

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message