lucene-solr-dev mailing list archives

From "Ryan McKinley" <>
Subject Re: loading many documents by ID
Date Fri, 02 Feb 2007 11:15:37 GMT
> 1) regardless of the verb (updatable/modifiable) i'm not sure that it
> makes sense to annotate in the schema the fields that should be copied on
> update, and not label the fields that must be "set" on update (ie: the
> fields that cannot be copied)

I agree.  I started down that path, and it gets pretty ugly.  I
stopped.  I have opted for a syntax that 'updates' all stored fields,
but lets you say explicitly what to do for each field.  If there is a
stored field you want to skip, you can specify that in the command
rather than in the schema.

> another simple approach would be to make "updatability" a property of the
> schema, that can contain a few different values...
>  "strict" - indexed and stored are no longer valid field(type)
>             attributes -- all fields are indexed and stored. all fields
>             are copied on "update" unless the update command includes
>             instructions to replace, append or increment the field value
>   "loose" - indexed/stored still exist, any attempt to "update" an
>             existing document is legal, all stored fields are copied
>             on update unless the update command includes instructions
>             to replace, append or increment the field value.
>    "none" - any attempt to update will fail.

This is an interesting idea, but (if I'm understanding your suggestion
correctly) it seems like too big of a change from the existing schema.

The more I think about the 'error' behavior, the more I am convinced
we just need solid, easily explainable logic for what happens and why.
I think throwing an error if there are no stored fields is reasonable,
and only updating stored fields is simple enough logic.  I don't think
we need to overcomplicate it.

> another approach i don't really have fully fleshed out in my head would be
> to introduce a concept of "fieldsets" ... an update that
> sets/appends/increments a field in a fieldset which does not provide a

I may be working on this, but not sure if it is what you are saying.  I have:

import java.util.Map;

public class IndexDocumentCommand {
  public enum FieldMODE {
    APPEND,    // add the new values to the existing values
    OVERWRITE, // overwrite existing fields
    INCREMENT, // increment existing field.  Must be a number!
    DISTINCT,  // same as APPEND, but make sure there are distinct values
    IGNORE     // ignore the previous value -- don't copy it
  }

  public Iterable<SolrDocument> docs;
  public Map<String,FieldMODE> fieldMode; // What to do for each field.
  public int commitMaxTime = -1;
}

If fieldMode is null or they are all OVERWRITE, the addDoc command
behaves as it always has.  Otherwise, it first extracts the existing
stored values (unless the fieldMode is IGNORE) then applies the new
document's values on top of the old ones.
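
As a rough illustration of the merge step described above, here is a
minimal sketch of how one field's stored values might be combined with
the incoming values for each mode.  The class FieldMergeSketch and its
merge() helper are hypothetical names, not part of Solr.  The message
does not spell out how OVERWRITE and IGNORE differ once a new value is
supplied, so this sketch assumes both keep only the incoming values.
It also assumes INCREMENT takes exactly one numeric value on each side.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class FieldMergeSketch {
  // Mirrors the FieldMODE enum from the message.
  public enum FieldMODE { APPEND, OVERWRITE, INCREMENT, DISTINCT, IGNORE }

  // Hypothetical helper: combine the stored values of one field with the
  // values from the new document, according to the requested mode.
  public static List<Object> merge(FieldMODE mode, List<Object> stored, List<Object> incoming) {
    switch (mode) {
      case APPEND: {
        // Old values first, then the new ones; duplicates are kept.
        List<Object> out = new ArrayList<>(stored);
        out.addAll(incoming);
        return out;
      }
      case DISTINCT: {
        // Same as APPEND, but each value appears once, in first-seen order.
        LinkedHashSet<Object> seen = new LinkedHashSet<>(stored);
        seen.addAll(incoming);
        return new ArrayList<>(seen);
      }
      case INCREMENT: {
        // Assumption: exactly one numeric value on each side.
        if (stored.size() != 1 || incoming.size() != 1
            || !(stored.get(0) instanceof Number)
            || !(incoming.get(0) instanceof Number)) {
          throw new IllegalArgumentException("INCREMENT needs one number on each side");
        }
        long sum = ((Number) stored.get(0)).longValue()
                 + ((Number) incoming.get(0)).longValue();
        List<Object> out = new ArrayList<>();
        out.add(Long.valueOf(sum));
        return out;
      }
      case OVERWRITE:
      case IGNORE:
      default:
        // Assumption: the previous values are dropped in both modes.
        return new ArrayList<>(incoming);
    }
  }
}
```

With stored values ["a", "b"] and incoming values ["b", "c"], APPEND
keeps the duplicate "b" while DISTINCT drops it.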

Currently I am only handling wildcard substitution for "*" - the
default mode.  I have not tried to tackle dynamic fields yet...  it
seems a bit more complicated!
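
To make the "*" default concrete, a lookup along these lines is one
possibility.  FieldModeLookupSketch and resolve() are hypothetical
names for illustration only.  Falling back to OVERWRITE when neither
the field nor "*" is listed is an assumption, chosen to match the
"behaves as it always has" case for a null fieldMode.

```java
import java.util.Map;

public class FieldModeLookupSketch {
  // Mirrors the FieldMODE enum from the message.
  public enum FieldMODE { APPEND, OVERWRITE, INCREMENT, DISTINCT, IGNORE }

  // Hypothetical helper: pick the mode for one field name.
  public static FieldMODE resolve(Map<String, FieldMODE> fieldMode, String fieldName) {
    if (fieldMode == null) {
      return FieldMODE.OVERWRITE; // no modes given: plain add/replace
    }
    FieldMODE explicit = fieldMode.get(fieldName);
    if (explicit != null) {
      return explicit; // an exact field name wins over the wildcard
    }
    FieldMODE wildcard = fieldMode.get("*");
    // Assumption: OVERWRITE when there is no "*" entry either.
    return wildcard != null ? wildcard : FieldMODE.OVERWRITE;
  }
}
```

Dynamic field patterns (for example a suffix match) are not handled
here; only the literal "*" key is treated as a default.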
