lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField
Date Mon, 29 Aug 2011 16:14:38 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092948#comment-13092948 ]

Michael McCandless commented on LUCENE-3312:
--------------------------------------------

bq. Due to the fact that FieldInfo is maintained per field name, if an IndexableField and
StorableField are added to a Document separately but with the same name, a single FieldInfo
will be created noting the field is both indexed and stored. This isn't a problem, however
a lot of code used to leverage this fact to get metadata about indexed Fields using searcher.document(docId).
They would retrieve all the stored fields and then see which were also indexed (and associated
metadata). This seems like a bit of a hack, piggybacking stored fields to find out about their
indexing attributes. So I guess it cannot continue to go forward? When you pull the StorableFields,
you should only be able to access the stored value metadata?

Right, this has been a long-standing problem w/ the Document class you
load at search time, ie the fields "pretend" to carry over the
details from indexing.  But it's buggy now, eg boost is not carried
over, and the indexed bit is "global" (comes from field info) while
the "tokenized" bit used to be per-doc, before LUCENE-2308.

So I consider this (these indexing details are no longer available
when you pull the document) a big benefit of cutting over to
StorableField.  Ie, it's trappy today since it's buggy, so we'd be
removing that trap.

bq. By creating this separation, we will need some notion of a Document in index.* which provides
Iterable access to both the IndexableFields and StorableFields. As such, Document itself is
becoming more userland. However by letting it store Indexable and StorableFields separately,
the functionality it provides (getBinaryValue for example) becomes quite verbose because it
must provide implementations for both kinds of fields. Given that Field is a userland implementation
of both Indexable and StorableField, should Document work solely with Fields? or should we
allow people to register both kinds of fields separately and just have a verbose set of functionality?

Good question... I think the userland "Field" (oal.document) should
implement both IndexableField and StorableField?  And then
oal.document.Document holds Field instances?

Maybe we can name the new class oal.index.Indexable?  It's a trivial
class, just exposing .indexableFieldsIterator and
.storableFieldsIterator?
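
To make the proposal concrete, here is a rough sketch of the shape it could
take.  Everything in it is an assumption for illustration: the interfaces are
simplified stand-ins (not the real IndexableField/StorableField), the method
names indexableFields/storableFields are placeholders for the iterators
mentioned above, and the indexed/stored flags on Field are made up.

```java
import java.util.ArrayList;
import java.util.List;

public class FieldSplitSketch {

    // Hypothetical, simplified stand-in for the indexing-side view of a field.
    public interface IndexableField {
        String name();
        String stringValue();
        float boost();
    }

    // Hypothetical, simplified stand-in for the stored-side view of a field.
    // Note that it exposes no indexing details such as boost.
    public interface StorableField {
        String name();
        String stringValue();
    }

    // The trivial "Indexable" idea: all the indexer needs is two iterables.
    public interface Indexable {
        Iterable<? extends IndexableField> indexableFields();
        Iterable<? extends StorableField> storableFields();
    }

    // Userland Field implementing both views.
    public static final class Field implements IndexableField, StorableField {
        private final String name;
        private final String value;
        private final boolean indexed;
        private final boolean stored;

        public Field(String name, String value, boolean indexed, boolean stored) {
            this.name = name;
            this.value = value;
            this.indexed = indexed;
            this.stored = stored;
        }

        @Override public String name() { return name; }
        @Override public String stringValue() { return value; }
        @Override public float boost() { return 1.0f; }
        public boolean indexed() { return indexed; }
        public boolean stored() { return stored; }
    }

    // Userland Document holding Field instances and exposing both views.
    public static final class Document implements Indexable {
        private final List<Field> fields = new ArrayList<>();

        public void add(Field field) { fields.add(field); }

        @Override
        public Iterable<? extends IndexableField> indexableFields() {
            List<Field> out = new ArrayList<>();
            for (Field f : fields) {
                if (f.indexed()) out.add(f);
            }
            return out;
        }

        @Override
        public Iterable<? extends StorableField> storableFields() {
            List<Field> out = new ArrayList<>();
            for (Field f : fields) {
                if (f.stored()) out.add(f);
            }
            return out;
        }
    }

    public static void main(String[] args) {
        Document doc = new Document();
        doc.add(new Field("title", "hello", true, true));
        doc.add(new Field("body", "some text", true, false));
        doc.add(new Field("id", "42", false, true));

        for (IndexableField f : doc.indexableFields()) {
            System.out.println("indexed: " + f.name());
        }
        for (StorableField f : doc.storableFields()) {
            System.out.println("stored: " + f.name());
        }
    }
}
```

In this sketch, code that walks storableFields() only sees the StorableField
type, so boost and friends are simply not reachable from the stored side,
which is the trap removal described above.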


> Break out StorableField from IndexableField
> -------------------------------------------
>
>                 Key: LUCENE-3312
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3312
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Michael McCandless
>             Fix For: Field Type branch
>
>
> In the field type branch we have strongly decoupled
> Document/Field/FieldType impl from the indexer, by having only a
> narrow API (IndexableField) passed to IndexWriter.  This frees apps up
> to use their own "documents" instead of the "user-space" impls we provide
> in oal.document.
> Similarly, with LUCENE-3309, we've done the same thing on the
> doc/field retrieval side (from IndexReader), with the
> StoredFieldsVisitor.
> But, maybe we should break out StorableField from IndexableField,
> such that when you index a doc you provide two Iterables -- one for the
> IndexableFields and one for the StorableFields.  Either can be null.
> One downside is possible perf hit for fields that are both indexed &
> stored (ie, we visit them twice, lookup their name in a hash twice,
> etc.).  But the upside is a cleaner separation of concerns in API....

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira