lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: storing pre-analyzed fields
Date Wed, 11 Jul 2012 05:14:42 GMT
Hi Mike,

The order does not matter in any version of Lucene. You also don't need to
subclass AbstractField (although NumericField is an example of how to do
that); new Field(name, TokenStream) is enough. If you also want to store the
field, simply add a stored-only field with the *same* name, in addition to
the TokenStream one.

In Lucene 4.0 we are moving toward separating the "Document" objects used
for indexing from those returned by IndexReader/Searcher, because they are
two different things: the latter contain only stored fields. But this does
not affect anything here.

In all Lucene versions, stored field values and indexed values are
completely decoupled and do not relate to each other at all. Adding a single
Field that is both stored and indexed is just a convenience; you can also
add it twice (once as stored, once as indexed; I prefer to always do this),
in any order. The resulting index will be identical (don't compare files;
there will be differences in headers!).

Order matters in one respect only: Fields with the same name and the same
type depend on it. Two stored fields with the same name are returned by
IndexReader/-Searcher in the order they were added, and two indexed fields
with the same name give predictable token positions for e.g. PhraseQuery or
SpanQuery only if they are added in a defined order. But you can interleave
the Field instances of the two types as you like.
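
To make the ordering rule concrete, here is a small toy model in plain Java.
It is only a sketch and does not use Lucene: FieldOrderModel, Entry, stored(),
indexed(), storedValues() and tokenPositions() are hypothetical names invented
for this illustration, and details such as position gaps between field
instances are ignored. It assumes a document is just an ordered list of
stored-only and indexed-only entries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types are Lucene classes.
final class FieldOrderModel {

    // One field instance as added to a document: stored-only or indexed-only.
    static final class Entry {
        final String name;
        final boolean stored; // true = stored-only, false = indexed-only
        final String value;   // stored text, or whitespace-separated tokens

        Entry(String name, boolean stored, String value) {
            this.name = name;
            this.stored = stored;
            this.value = value;
        }
    }

    static Entry stored(String name, String value) {
        return new Entry(name, true, value);
    }

    static Entry indexed(String name, String tokens) {
        return new Entry(name, false, tokens);
    }

    // Stored values per field name, in the order the stored entries were added.
    static Map<String, List<String>> storedValues(List<Entry> doc) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Entry e : doc) {
            if (e.stored) {
                out.computeIfAbsent(e.name, k -> new ArrayList<>()).add(e.value);
            }
        }
        return out;
    }

    // Token sequence per field name: tokens of later indexed entries
    // are appended after the tokens of earlier ones.
    static Map<String, List<String>> tokenPositions(List<Entry> doc) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Entry e : doc) {
            if (!e.stored) {
                List<String> tokens = out.computeIfAbsent(e.name, k -> new ArrayList<>());
                tokens.addAll(Arrays.asList(e.value.trim().split("\\s+")));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Interleaving stored and indexed entries differently changes nothing.
        List<Entry> a = Arrays.asList(stored("body", "Hello World"), indexed("body", "hello world"));
        List<Entry> b = Arrays.asList(indexed("body", "hello world"), stored("body", "Hello World"));
        System.out.println(storedValues(a).equals(storedValues(b)));     // true
        System.out.println(tokenPositions(a).equals(tokenPositions(b))); // true

        // Swapping two indexed entries with the same name changes token order.
        List<Entry> c = Arrays.asList(indexed("body", "quick brown"), indexed("body", "fox"));
        List<Entry> d = Arrays.asList(indexed("body", "fox"), indexed("body", "quick brown"));
        System.out.println(tokenPositions(c)); // {body=[quick, brown, fox]}
        System.out.println(tokenPositions(d)); // {body=[fox, quick, brown]}
    }
}
```

In this model a phrase like "brown fox" is adjacent only in the first
ordering, which is the sense in which PhraseQuery or SpanQuery depend on the
order of same-named indexed fields.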

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Michael Sokolov [mailto:sokolov@ifactory.com]
> Sent: Wednesday, July 11, 2012 2:54 AM
> To: java-user@lucene.apache.org
> Subject: storing pre-analyzed fields
> 
> I have a question about the API for storing and indexing lucene documents
> (in 3.x).
>
> If I want to index a document by providing a TokenStream, I can do that by
> calling document.add (field) where field is something I write deriving from
> AbstractField that returns the TokenStream for tokenStreamValue(), and
> nothing for stringValue() or readerValue().
>
> Now if I also want to store a value for that field, do I just add a
> different field with different options (eg stored=true, and the field a
> normal Field)?
>
> Do these two things conflict in any way?  Do I have to be careful about the
> order in which I do them?  Or is it just a mildly weird API with no lurking
> ill effects? :)
>
> Also: I have been seeing various e-mails about changes to this API so I
> assume it's all different in 4.0; if you want to take this opportunity to
> explain that, please go ahead, but for now I am working with the 3.x API.
> 
> Thanks
> 
> -Mike Sokolov
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org



