lucene-java-user mailing list archives

From Carsten Schnober <schno...@ids-mannheim.de>
Subject Field value vs TokenStream
Date Wed, 18 Apr 2012 15:00:03 GMT

Dear list,
I'm studying the Lucene index file formats and I wonder: after
initializing a field with Field(String name, String value, Field.Store
store, Field.Index index), where is the value string stored?

I understand that the chosen analyzer does its processing on that value,
including tokenization, and returns a TokenStream from which the Indexer
retrieves the attributes that it stores in the index.
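
To make that understanding concrete, here is a minimal, self-contained sketch of the pipeline as described above. It does not use Lucene classes: ToyAnalysis, analyze and collectTerms are hypothetical names, and the tokenization rule (split on non-alphanumeric runs, lowercase) is only an assumed stand-in for whatever a real analyzer does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

class ToyAnalysis {
    // Hypothetical stand-in for an analyzer: split the field value on runs of
    // non-letter, non-digit characters and lowercase each piece.
    static List<String> analyze(String value) {
        List<String> tokens = new ArrayList<>();
        for (String piece : value.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Hypothetical stand-in for the indexer: consume the tokens and keep the
    // unique terms in sorted order, roughly like a term dictionary.
    static TreeSet<String> collectTerms(List<String> tokens) {
        return new TreeSet<>(tokens);
    }

    public static void main(String[] args) {
        List<String> tokens = analyze("Dear list, dear Lucene");
        System.out.println(tokens);               // [dear, list, dear, lucene]
        System.out.println(collectTerms(tokens)); // [dear, list, lucene]
    }
}
```

The point of the sketch is only that the indexer sees what the analyzer emits, not the raw string.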
When I use a binary editor to inspect the term infos (tis) file in the
index directory, I can see every single token (term).
For experimenting purposes, I implemented an analyzer that transforms the
value passed to the field and noticed the following: the TokenStream
still correctly generates the terms that end up stored in the tis
file, but the original input value is still displayed as the field value
when I retrieve a document from the index and print it with
Document.toString(). I tried to inspect the Field's TokenStream, but
tokenStreamValue() returns null; is that normal when retrieving a
document from an existing index?
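
The observation can be reproduced in a toy model. The sketch below assumes that the stored value and the analyzed terms are written along two separate paths; that assumption is merely consistent with what I see, not a statement about Lucene's internals. ToyIndex and all of its members are hypothetical names, and the uppercasing analyzer stands in for my experimental one.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;
import java.util.function.Function;

class ToyIndex {
    // Assumed path 1: the original value, kept verbatim per document.
    final Map<Integer, String> storedValues = new HashMap<>();
    // Assumed path 2: the terms produced by the analyzer.
    final TreeSet<String> termDictionary = new TreeSet<>();

    private final Function<String, String[]> analyzer;
    private int nextDocId = 0;

    ToyIndex(Function<String, String[]> analyzer) {
        this.analyzer = analyzer;
    }

    // Adds one single-field document and returns its id.
    int add(String value) {
        int docId = nextDocId++;
        storedValues.put(docId, value);           // untouched by the analyzer
        for (String term : analyzer.apply(value)) {
            termDictionary.add(term);             // only the analyzer's output
        }
        return docId;
    }

    // Retrieval returns the stored string only; no token stream is kept.
    String retrieve(int docId) {
        return storedValues.get(docId);
    }

    public static void main(String[] args) {
        // Analyzer that "converts" the input: uppercase, then split on whitespace.
        ToyIndex index = new ToyIndex(v -> v.toUpperCase(Locale.ROOT).split("\\s+"));
        int id = index.add("field value");
        System.out.println(index.termDictionary); // [FIELD, VALUE]
        System.out.println(index.retrieve(id));   // field value
    }
}
```

In this model the converted terms and the original string coexist, and a retrieved document has nothing to offer but the stored string, which matches the behaviour described above.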

Can someone let me know what happens to a Field's value string and at
which point in the pipeline it is replaced by the (term) attributes
generated by the TokenStream?

Thank you very much!
Best,
Carsten


-- 
Carsten Schnober
Institut für Deutsche Sprache | http://www.ids-mannheim.de
Projekt KorAP -- Korpusanalyseplattform der nächsten Generation
http://korap.ids-mannheim.de/ | Tel.: +49-(0)621-1581-238
