lucene-java-user mailing list archives

From Michael Sokolov
Subject Re: storing pre-analyzed fields
Date Wed, 11 Jul 2012 10:38:15 GMT
Uwe  - thank you very much for the thorough explanation!


On 7/11/2012 1:14 AM, Uwe Schindler wrote:
> Hi Mike,
> The order does not matter in any version of Lucene. You also don't need to
> subclass AbstractField (though you can use e.g. NumericField as an
> example); it is enough to use new Field(name, TokenStream). If you also want
> to store this field, simply add a stored-only field with the *same* name (in
> addition to the TokenStream one).
> In Lucene 4.0 we are moving toward separating the "Document" objects used
> for indexing from those returned by IndexReader/Searcher, because they are
> two different things and the latter only return stored fields. But this
> does not affect anything here.
> In all Lucene versions, stored field values and indexed values are
> completely decoupled and do not relate to each other at all. Adding a Field
> as stored+indexed is just a convenience; you can also add it twice (once as
> stored, once as indexed; I prefer to always do this) in any order. The
> resulting index will be identical (don't compare files; there will be
> differences in headers!).
> Order matters in one respect: Fields with the same name and the same type
> rely on it. Two stored fields with the same name are returned by
> IndexReader/-Searcher in the same order, and two indexed fields with the
> same name produce the same token order for e.g. PhraseQuery or SpanQuery
> only if the Field order is fixed. But you can interleave the Field
> instances of the two types as you like.
> Uwe
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> eMail:
>> -----Original Message-----
>> From: Michael Sokolov []
>> Sent: Wednesday, July 11, 2012 2:54 AM
>> To:
>> Subject: storing pre-analyzed fields
>> I have a question about the API for storing and indexing lucene documents
>> (in 3.x).
>> If I want to index a document by providing a TokenStream, I can do that by
>> calling document.add(field) where field is something I write deriving from
>> AbstractField that returns the TokenStream for tokenStreamValue(), and
>> nothing for stringValue() or readerValue().
>> Now if I also want to store a value for that field, do I just add a
>> different field with different options (eg stored=true, and the field a
>> normal Field)?
>> Do these two things conflict in any way?  Do I have to be careful about the
>> order in which I do them?  Or is it just a mildly weird API with no lurking
>> ill effects? :)
>> Also: I have been seeing various e-mails about changes to this API so I
>> assume it's all different in 4.0; if you want to take this opportunity to
>> explain that, please go ahead, but for now I am working with the 3.x API.
>> Thanks
>> -Mike Sokolov
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
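
The approach described in this thread can be sketched roughly as follows for
the 3.x API. This is only an illustration, assuming Lucene 3.6 is on the
classpath. The class name PreAnalyzedFieldSketch, the field name "body", and
the use of WhitespaceTokenizer as a stand-in for a real pre-analysis step are
all hypothetical choices that do not come from the thread. It adds one
stored-only Field and one TokenStream-backed Field under the same name.

```java
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.util.Version;

public class PreAnalyzedFieldSketch {

    // Hypothetical field name, chosen only for this illustration.
    static final String FIELD_NAME = "body";

    // Builds a Document in which the same field name is used twice:
    // once stored-only, once indexed-only from a caller-supplied TokenStream.
    static Document buildDocument(String text, TokenStream tokens) {
        Document doc = new Document();
        // Stored-only: the original string is kept for retrieval, not indexed.
        doc.add(new Field(FIELD_NAME, text, Field.Store.YES, Field.Index.NO));
        // Indexed-only: terms come from the TokenStream; nothing is stored.
        doc.add(new Field(FIELD_NAME, tokens));
        return doc;
    }

    public static void main(String[] args) {
        String text = "storing pre-analyzed fields";
        // Stand-in for whatever analysis produced the tokens ahead of time.
        TokenStream tokens =
            new WhitespaceTokenizer(Version.LUCENE_36, new StringReader(text));
        Document doc = buildDocument(text, tokens);
        System.out.println("fields added: " + doc.getFields().size());
    }
}
```

Per the explanation in the thread, swapping the two doc.add calls should not
change the resulting index. The Document would then be passed to
IndexWriter.addDocument as usual.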