lucene-dev mailing list archives

From Chuck Williams <ch...@manawiz.com>
Subject Re: Global field semantics
Date Mon, 10 Jul 2006 15:46:12 GMT
Chris Hostetter wrote on 07/10/2006 02:06 AM:
> As near as I can tell, the large issue can be summarized with the following
> sentiment:
>
> 	Performance gains could be realized if Field
> 	properties were made fixed and homogeneous for
> 	all Documents in an index.
>   

This is certainly a large issue, as David says he has achieved a 5x
performance gain.

My interest in global field semantics originally sprang from
functionality considerations, not performance considerations.  I've got
many features that require reasoning about field semantics.  I
previously mentioned a very simple one:  validating fields in the query
parser.  More interesting examples are:

  1.  Multiple inheritance on the fields of documents that record the
sources of each inherited value to support efficient incremental maintenance
  2.  "Record-valued fields" that store facets with values (e.g., time
and user information for who set that value).  These cannot easily be
broken into multiple fields because the fields in question are multi-valued.
  3.  "Join fields" that reference id's of objects stored in separate
indices (supporting queries that reference the fields in the joined index)

Managing these kinds of rich semantic features in query parsing and
indexing is greatly facilitated by a global field model.  I've built
this into my app, and then started thinking about benefits in Lucene
generally from such a model.
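
To make the simplest case concrete, here is a minimal sketch of validating
query fields against a set of known field names. The class
QueryFieldValidator and its methods are hypothetical, nothing here is Lucene
API, and the field-prefix matching is deliberately naive compared with what a
real query parser would do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: checks the field names used in a
// query string against the set of fields the application knows about.
final class QueryFieldValidator {
    // Matches "name:" prefixes such as title:lucene. Deliberately naive: it
    // does not understand quoting or escaping the way a real parser would.
    private static final Pattern FIELD_PREFIX = Pattern.compile("(\\w+):");

    private final Set<String> knownFields;

    QueryFieldValidator(Set<String> knownFields) {
        this.knownFields = new HashSet<>(knownFields);
    }

    /** Returns the distinct field names in the query that are not known. */
    List<String> unknownFields(String query) {
        List<String> unknown = new ArrayList<>();
        Matcher m = FIELD_PREFIX.matcher(query);
        while (m.find()) {
            String field = m.group(1);
            if (!knownFields.contains(field) && !unknown.contains(field)) {
                unknown.add(field);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        QueryFieldValidator v = new QueryFieldValidator(
                new HashSet<>(Arrays.asList("title", "body")));
        // "autor" is a misspelling that the model lets us catch up front.
        System.out.println(v.unknownFields("title:lucene AND autor:chuck"));
    }
}
```

With a global model, the set of known fields would come from the model
itself rather than being passed in by hand.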

>   1) all Fields and their properties must be predeclared before any
>      document is ever added to the index, and any Field not declared is
>      illegal.
>   2) a Field springs into existence the first time a Document is added
>      with a value for it -- but after that all newly added Documents with
>      a value for that field must conform to the Field properties initially
>      used.
>
> (have I missed any general approaches?)
>   

Yes.  Here is (an elaboration of) the "global model with exceptions"
idea we reached:

    3) There is a global field model in Lucene that contains the list of
all known fields and their "default semantics".  The class that contains
this model supports a number of implicit and explicit methods to
construct and query the model.  The model can be evolved.  The model is
used in many places in Lucene, in some cases according to
application-settable properties.  E.g.:
        a) Creating a Field uses the properties of the model so they
need not be specified at each construction.  A global model property
determines whether or not field properties may be overridden, and
whether or not fields may be created that are not in the model (in which
case, they are automatically added to the model).
        b) The query parser has hooks that affect Query generation based
on the model properties of the field (not just for certain special query
types like Term's and RangeQuery's).  The application can easily provide
methods to implement these hooks.  This is essential for features like
2 and 3 above (and beneficial for 1).
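
The hook idea in (b) could be sketched roughly as follows. This is an
illustration only: HookedQueryBuilder, the "kind" labels, and the string
representation of queries are all assumptions made to keep the example
self-contained, and none of it reflects Lucene's actual QueryParser.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical sketch, independent of Lucene's real QueryParser. Queries are
// represented as plain strings so the example stays self-contained.
final class HookedQueryBuilder {
    // Field name -> application-defined kind, standing in for model semantics.
    private final Map<String, String> fieldKinds = new HashMap<>();
    // Kind -> application hook that turns (field, text) into a query.
    private final Map<String, BiFunction<String, String, String>> hooks = new HashMap<>();

    void declareField(String field, String kind) {
        fieldKinds.put(field, kind);
    }

    void registerHook(String kind, BiFunction<String, String, String> hook) {
        hooks.put(kind, hook);
    }

    /** Builds a query for one field clause, dispatching on the field's kind. */
    String build(String field, String text) {
        String kind = fieldKinds.get(field);
        if (kind == null) {
            throw new IllegalArgumentException("Field not in model: " + field);
        }
        BiFunction<String, String, String> hook = hooks.get(kind);
        if (hook == null) {
            // No hook registered for this kind: fall back to a plain term query.
            return "term(" + field + ":" + text + ")";
        }
        return hook.apply(field, text);
    }

    public static void main(String[] args) {
        HookedQueryBuilder qb = new HookedQueryBuilder();
        qb.declareField("title", "text");
        qb.declareField("owner", "join");
        // A join field rewrites its clause into a query against another index.
        qb.registerHook("join", (field, text) -> "join(" + field + " -> " + text + ")");
        System.out.println(qb.build("title", "lucene"));
        System.out.println(qb.build("owner", "name:chuck"));
    }
}
```

The point of the design is that the parser consults the model for every
field clause, so record-valued or join fields can generate whatever Query
the application needs without special-casing inside the parser.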

> How would something like this work?
>
>   docA.add(new Field(f, "bar", Store.YES, Index.UN_TOKENIZED));
>   docA.add(new Field(f, "foo", Store.NO,  Index.TOKENIZED));
>
>   docB.add(new Field(f, "x y", Store.YES, Index.TOKENIZED));
>   docB.add(new Field(f, "z",   Store.NO,  Index.UN_TOKENIZED));
>   

The application could determine whether or not this kind of operation
was supported according to the global enforcement properties of the
model.  If this is needed, the ability to have exceptions at the Field
level would permit it.
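
As a rough sketch of how such enforcement might behave, assuming two
model-wide switches (allowOverrides and allowUndeclared) and a reduced notion
of field semantics with only stored and tokenized flags: GlobalFieldModel and
FieldSemantics are hypothetical names, not Lucene classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these classes exist in Lucene.
final class FieldSemantics {
    final boolean stored;
    final boolean tokenized;

    FieldSemantics(boolean stored, boolean tokenized) {
        this.stored = stored;
        this.tokenized = tokenized;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FieldSemantics)) return false;
        FieldSemantics other = (FieldSemantics) o;
        return stored == other.stored && tokenized == other.tokenized;
    }

    @Override
    public int hashCode() {
        return (stored ? 2 : 0) + (tokenized ? 1 : 0);
    }
}

final class GlobalFieldModel {
    private final Map<String, FieldSemantics> defaults = new HashMap<>();
    private final boolean allowOverrides;   // may a field deviate from its default?
    private final boolean allowUndeclared;  // may unknown fields be added implicitly?

    GlobalFieldModel(boolean allowOverrides, boolean allowUndeclared) {
        this.allowOverrides = allowOverrides;
        this.allowUndeclared = allowUndeclared;
    }

    void declare(String name, FieldSemantics semantics) {
        defaults.put(name, semantics);
    }

    /** Resolves semantics for one field instance; null means "use the default". */
    FieldSemantics resolve(String name, FieldSemantics requested) {
        FieldSemantics known = defaults.get(name);
        if (known == null) {
            if (!allowUndeclared || requested == null) {
                throw new IllegalArgumentException("Field not in model: " + name);
            }
            defaults.put(name, requested); // first use defines the default
            return requested;
        }
        if (requested == null || requested.equals(known)) {
            return known;
        }
        if (!allowOverrides) {
            throw new IllegalArgumentException("Field may not override model: " + name);
        }
        return requested; // per-field exception; the model default is unchanged
    }

    public static void main(String[] args) {
        FieldSemantics storedUntokenized = new FieldSemantics(true, false);
        FieldSemantics unstoredTokenized = new FieldSemantics(false, true);

        GlobalFieldModel lenient = new GlobalFieldModel(true, true);
        lenient.resolve("f", storedUntokenized); // first use defines the default
        lenient.resolve("f", unstoredTokenized); // accepted as an exception

        GlobalFieldModel strict = new GlobalFieldModel(false, true);
        strict.resolve("f", storedUntokenized);
        try {
            strict.resolve("f", unstoredTokenized);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Under the lenient settings the mixed Store and Index usage in the example
above is accepted; under the strict settings the second add for the same
field is rejected.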

Hoss, do you have a use case requiring Store and Index variance like this?

The impact of this flexibility on David's 5x is another question...

Chuck

