lucene-solr-dev mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: loading many documents by ID
Date Sat, 03 Feb 2007 06:11:51 GMT

: I agree.  I started down that path, and it gets pretty ugly.  I
: stopped.  I have opted for a syntax that 'updates' all stored fields,
: but lets you say explicitly what to do for each field.  If there is a
: stored field you want to skip, you can specify that in the command rather
: than in the schema.

the schema creator should still have some say in what kinds of things are
allowed/disallowed though -- the person doing the "update" may not fully
understand the underlying model.

: > another simple approach would be to make "updatability" a property of the
: > schema, that can contain a few different values...

: This is an interesting idea, but (if i'm understanding your suggestion
: correctly) it seems like TOO big a change from the existing schema.

the schema.xml format wouldn't change much .. just a new attribute on the
<schema> tag ... the existing example schema would either be labeled
"loose"  or "none" and we could provide another example of "strict" ... or
we would label it "strict" and remove the references to indexed/stored and
only mention them in comments describing other things you can do if you
don't require the ability to mutate documents.

: I think throwing an error if there are no stored fields is reasonable,
: and only updating stored fields is simple enough logic that I don't think
: we need to overcomplicate it.

throwing an error if there are no stored fields in the schema, or no
stored fields in the existing document, or no stored fields in the mutate
request?

what if the document just doesn't have any stored fields because the first
time it was added, the stored fields weren't known yet?

what if the document does have stored fields, but it also has an indexed
but not stored field, and the person doing the update doesn't realize
that and doesn't send a replacement value for that field?
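
A minimal Java sketch of that last case (Java 16+ for the record; the class, record and field names are hypothetical, not Solr code). It assumes the update handler can only read stored values back out of the index, so rebuilding a document from those plus the client's changes silently loses the indexed-but-unstored field:

```java
import java.util.*;

// Hypothetical illustration only: not Solr's UpdateHandler.
class StoredOnlyRebuild {

    // Minimal stand-in for a schema field definition.
    record FieldDef(String name, boolean indexed, boolean stored) {}

    // What a handler could rebuild for a document: it can only read back
    // stored values, so indexed-but-unstored values never make it across.
    static Map<String, List<String>> rebuild(Map<String, FieldDef> schema,
                                             Map<String, List<String>> indexedDoc,
                                             Map<String, List<String>> update) {
        Map<String, List<String>> result = new TreeMap<>();
        indexedDoc.forEach((name, values) -> {
            FieldDef def = schema.get(name);
            if (def != null && def.stored()) {
                result.put(name, new ArrayList<>(values));
            }
        });
        result.putAll(update); // values sent in the request replace copied ones
        return result;
    }

    // Fields that existed in the original document but not in the rebuilt one.
    static Set<String> lostFields(Map<String, List<String>> indexedDoc,
                                  Map<String, List<String>> rebuilt) {
        Set<String> lost = new TreeSet<>(indexedDoc.keySet());
        lost.removeAll(rebuilt.keySet());
        return lost;
    }

    public static void main(String[] args) {
        Map<String, FieldDef> schema = Map.of(
                "id", new FieldDef("id", true, true),
                "title", new FieldDef("title", true, true),
                "body", new FieldDef("body", true, false));
        // The full document as originally added; a real handler can't see "body".
        Map<String, List<String>> doc = Map.of(
                "id", List.of("7"), "title", List.of("Old"), "body", List.of("full text"));
        Map<String, List<String>> rebuilt =
                rebuild(schema, doc, Map.of("title", List.of("New")));
        System.out.println(rebuilt + " lost: " + lostFields(doc, rebuilt));
    }
}
```

With only a `title` update, the rebuilt document keeps `id` and the new `title`, and `body` ends up in the lost set without any error being raised.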

: > another approach i don't really have fully fleshed out in my head would be
: > to introduce a concept of "fieldsets" ... an update that
: > sets/appends/increments a field in a fieldset which does not provide a
:
: I may be working on this, but I'm not sure if it is what you are saying.  I have:

no, i was thinking of it as a new bit of syntax in the schema ... after
defining all of your <field>s you have some <fieldset>s, and any time you
update a doc and mutate a field (either overwrite, append,
increment, whatever) which is in one or more <fieldset>s, then you also have to
provide a new value for any non-stored field listed in those
fieldsets.

in a simple schema, you'd only need one <fieldset> and it would list every
field (we'd probably even want a simple syntactic alias for that), but in
more complex schemas where you want Solr to provide some sanity checking
on your docs, but you frequently have different "types" of docs in your
schema with different sets of common overlapping fields, the <fieldset>s
are your way of telling Solr when to complain.
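
One way the <fieldset> rule could be checked at update time, as a hedged Java sketch: `FieldsetCheck` and `missingFields` are hypothetical names, and representing each fieldset as a plain set of field names is an assumption about a syntax that doesn't exist yet.

```java
import java.util.*;

// Hypothetical sketch of the proposed <fieldset> rule (not an existing Solr API).
class FieldsetCheck {

    /**
     * Returns the non-stored fields the client must still send (empty = OK).
     *
     * @param fieldsets     groups of fields declared together by the schema owner
     * @param storedFields  fields whose old values Solr can copy forward itself
     * @param requestFields fields the mutate request touches (any mode)
     */
    static Set<String> missingFields(List<Set<String>> fieldsets,
                                     Set<String> storedFields,
                                     Set<String> requestFields) {
        Set<String> missing = new TreeSet<>();
        for (Set<String> fieldset : fieldsets) {
            // A fieldset only applies if the request mutates one of its fields.
            if (Collections.disjoint(fieldset, requestFields)) {
                continue;
            }
            for (String field : fieldset) {
                if (!storedFields.contains(field) && !requestFields.contains(field)) {
                    missing.add(field);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<Set<String>> fieldsets = List.of(
                Set.of("title", "body", "keywords"),
                Set.of("price", "currency"));
        Set<String> stored = Set.of("title", "price", "currency");
        Set<String> missing = missingFields(fieldsets, stored, Set.of("title"));
        if (!missing.isEmpty()) {
            System.out.println("rejecting mutate; also send: " + missing);
        }
    }
}
```

The "list every field" alias would just be a single fieldset containing the whole schema, which makes every unstored field mandatory on any mutate.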

:   public enum FieldMODE {
:     APPEND,    // add the fields to existing fields
:     OVERWRITE, // overwrite existing fields
:     INCREMENT, // increment existing field.  Must be a number!
:     DISTINCT,  // same as APPEND, but make sure there are distinct values
:     IGNORE     // ignore the previous value -- don't copy it
:   }

as i understand it, these are options specified by the client triggering
the "mutate doc" command, right? ... they totally make sense, but they
don't really address what Solr should do if the command doesn't mention a
field which is in the schema.
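
A sketch of how those modes might merge one document, including a default mode for fields the command never mentions (the quoted proposal brings up a default mode further down). The enum constants are copied from the snippet above; everything else, including treating OVERWRITE with no new value as "keep the old one" and IGNORE as "keep only what was sent", is an assumption:

```java
import java.util.*;

// Hypothetical sketch: mode names come from the quoted proposal, but the merge
// semantics (especially OVERWRITE vs IGNORE) are assumptions.
class FieldModeMerge {

    enum FieldMODE { APPEND, OVERWRITE, INCREMENT, DISTINCT, IGNORE }

    // Combine one field's previously stored values with the values sent now.
    static List<String> apply(FieldMODE mode, List<String> old, List<String> sent) {
        return switch (mode) {
            case APPEND -> concat(old, sent);
            // LinkedHashSet drops duplicates but keeps first-seen order.
            case DISTINCT -> new ArrayList<>(new LinkedHashSet<>(concat(old, sent)));
            // Assumed: with nothing sent there is nothing to overwrite with.
            case OVERWRITE -> new ArrayList<>(sent.isEmpty() ? old : sent);
            // Never copy the old value; keep only what the request sent.
            case IGNORE -> new ArrayList<>(sent);
            case INCREMENT -> List.of(Long.toString(increment(old, sent)));
        };
    }

    static List<String> concat(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    // Assumed: a missing old value counts as 0 and a missing delta as 1.
    // Long.parseLong throws NumberFormatException for non-numbers.
    static long increment(List<String> old, List<String> sent) {
        if (old.size() > 1) {
            throw new IllegalArgumentException("INCREMENT needs a single-valued field");
        }
        long base = old.isEmpty() ? 0L : Long.parseLong(old.get(0));
        long delta = sent.isEmpty() ? 1L : Long.parseLong(sent.get(0));
        return base + delta;
    }

    // Apply per-field modes; fields with no explicit mode get defaultMode.
    static Map<String, List<String>> mergeDoc(Map<String, List<String>> storedDoc,
                                              Map<String, List<String>> sent,
                                              Map<String, FieldMODE> modes,
                                              FieldMODE defaultMode) {
        Set<String> names = new TreeSet<>(storedDoc.keySet());
        names.addAll(sent.keySet());
        Map<String, List<String>> out = new TreeMap<>();
        for (String name : names) {
            List<String> merged = apply(modes.getOrDefault(name, defaultMode),
                    storedDoc.getOrDefault(name, List.of()),
                    sent.getOrDefault(name, List.of()));
            if (!merged.isEmpty()) {
                out.put(name, merged); // an empty result removes the field
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> stored = Map.of(
                "id", List.of("42"), "tags", List.of("solr", "lucene"), "views", List.of("10"));
        Map<String, List<String>> sent = Map.of(
                "tags", List.of("lucene", "search"), "views", List.of());
        Map<String, FieldMODE> modes = Map.of(
                "tags", FieldMODE.DISTINCT, "views", FieldMODE.INCREMENT);
        System.out.println(mergeDoc(stored, sent, modes, FieldMODE.OVERWRITE));
        // With IGNORE as the default, the unmentioned "id" field is dropped.
        System.out.println(mergeDoc(stored, sent, modes, FieldMODE.IGNORE));
    }
}
```

The interesting case is the default: with OVERWRITE the unmentioned `id` survives, with IGNORE it is silently dropped. That is the open question of what Solr should do with schema fields the command never mentions.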

the use case i'm thinking about is an existing Solr index with lots of
clients from different parts of a company adding/mutating documents, and
then the schema needs to be changed.  the Schema Owner should have some way of
saying what happens if one of those clients attempts to mutate a document
and doesn't provide a replacement value for an indexed/unstored field --
but there's no easy/fast way for the UpdateHandler to realize that a given
document has indexed values for that field -- hence either some simple
broad rules the schema owner can put in about the schema as a whole, or
sets of fields the schema owner can define (if they try to mutate x, y,
or z, then they had better be providing a, b and c because they are all used
together).

: default mode.  I have not tried to tackle dynamic fields yet...  it
: seems a bit more complicated!

yeah .. that's what i'm worried about with the fieldset idea too.

It's one of the reasons why it might be a good idea to just say:

  * if you want to be able to mutate docs, and you want to be guaranteed it
will always work, then every indexed field must be stored.

  * if you want to be able to mutate docs, and you can't feasibly store
every indexed field, then add this one line to your schema.xml and Solr
will trust that the clients sending mutate requests know what they are
doing.

  * if you don't trust your clients to know what they are doing when
mutating documents, add this one line to your schema and Solr will reject
any attempt to mutate a document (only wholesale document replacement will
be allowed)
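
Those three options could hang off the single <schema> attribute suggested earlier. A minimal Java sketch, assuming the attribute values map onto an enum named after the "strict"/"loose"/"none" labels (the enum, record and method names are hypothetical, Java 16+ for the record): STRICT is enforced once when the schema loads, NONE once per request, and LOOSE never complains.

```java
import java.util.*;

// Hypothetical sketch of a schema-wide "updatability" setting; the enum, the
// record and both checks are illustrative, not an existing Solr API.
class MutatePolicy {

    enum Updatability {
        STRICT, // every indexed field must be stored, so mutates always work
        LOOSE,  // trust clients to resend indexed-but-unstored fields
        NONE    // reject mutates; only wholesale replacement is allowed
    }

    record FieldDef(String name, boolean indexed, boolean stored) {}

    // Checked once when the schema is loaded.
    static List<String> schemaProblems(Updatability mode, List<FieldDef> fields) {
        List<String> problems = new ArrayList<>();
        if (mode == Updatability.STRICT) {
            for (FieldDef f : fields) {
                if (f.indexed() && !f.stored()) {
                    problems.add(f.name() + " is indexed but not stored");
                }
            }
        }
        return problems;
    }

    // Checked per request: full document replacements are always allowed.
    static boolean requestAllowed(Updatability mode, boolean isMutate) {
        return !isMutate || mode != Updatability.NONE;
    }

    public static void main(String[] args) {
        List<FieldDef> fields = List.of(
                new FieldDef("id", true, true),
                new FieldDef("body", true, false));
        System.out.println(schemaProblems(Updatability.STRICT, fields));
        System.out.println(requestAllowed(Updatability.NONE, true));
    }
}
```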



-Hoss

