lucene-solr-dev mailing list archives

From "Mattmann, Chris A (388J)" <chris.a.mattm...@jpl.nasa.gov>
Subject SOLR-1131: disconnect between fields created by poly fields
Date Thu, 10 Dec 2009 23:35:52 GMT
Hi All,

While thinking about SOLR-1131, something important just came to mind. If we
allow poly fields to add fields to the schema (be it via dynamic fields, or
explicit field decls, either way), then we introduce a disconnect between
the existing XML schema and the runtime schema instance. To my knowledge,
there is no write-back/flush-back mechanism that reconciles changes made to
the schema at run-time with what's loaded on startup (correct me if I'm wrong).

Flushing back changes is probably _not_ a good idea for (at least) the
following reasons:

* someone may have put comments into the schema.xml file that help them to
understand the fields/etc. in the file -- the flush back operation would
then have to preserve these, and dealing with formatting, etc., would be a
hassle.

* there may be tools (e.g., schema editors) that exist now or get developed
later, and allowing a SOLR instance to modify a schema file and flush it
back may interfere with someone using an editor on their schema (though
editing the schema at runtime is not a good idea either way, whether you're
using a tool or vi)

That said, I'm wondering what the right compromise is here. My instinct says
that PolyFields should _only_ be able to deal with fields that have already
been recorded in the schema.xml file (yes, this is limiting). That's the
only way to truly keep the schema and its run-time instance in sync, while
preserving all the manual curation activities that have occurred in the
schema and not stepping on any toes. Also, I'm (ack!) now leaning towards
dynamic fields as a flexible method to do this (so long as they are
pre-declared in the schema.xml file explicitly) -- that way you don't have
to create (n+10) * m fields for a 10-dimensional point that's stored as well
as indexed.

So, based on this and on my knowledge of the existing patches, if we
added a check that ensures the dynamic field(s) are already declared in
the schema at startup, then I think all is well.
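
To make the idea concrete, here is a minimal sketch of what such a startup
check could look like. It is not taken from the existing patches and does not
use Solr's actual schema classes: DeclaredSchema, PolyFieldCheck, and the
"base_i_suffix" sub-field naming are all hypothetical, chosen only for
illustration. The one thing borrowed from Solr is the dynamic field pattern
style, i.e. a name with a leading or trailing "*".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for what was parsed from schema.xml at startup.
final class DeclaredSchema {
    private final Set<String> explicitFields;   // e.g. "id"
    private final List<String> dynamicPatterns; // e.g. "*_d" or "attr_*"

    DeclaredSchema(Set<String> explicitFields, List<String> dynamicPatterns) {
        this.explicitFields = explicitFields;
        this.dynamicPatterns = dynamicPatterns;
    }

    // True if the name is an explicit field or matches a declared dynamic field.
    boolean isDeclared(String fieldName) {
        if (explicitFields.contains(fieldName)) {
            return true;
        }
        for (String pattern : dynamicPatterns) {
            if (matches(pattern, fieldName)) {
                return true;
            }
        }
        return false;
    }

    // Glob matching with a single leading or trailing '*'.
    static boolean matches(String pattern, String name) {
        if (pattern.startsWith("*")) {
            return name.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            return name.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(name);
    }
}

final class PolyFieldCheck {
    // Assumed naming convention for sub-fields: base_0_suffix, base_1_suffix, ...
    static List<String> subFieldNames(String base, int dimensions, String suffix) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < dimensions; i++) {
            names.add(base + "_" + i + suffix);
        }
        return names;
    }

    // Fail fast at startup instead of adding fields to the runtime schema.
    static void requireDeclared(DeclaredSchema schema, List<String> names) {
        for (String name : names) {
            if (!schema.isDeclared(name)) {
                throw new IllegalStateException(
                    "Poly field needs '" + name + "', which is not declared in schema.xml");
            }
        }
    }
}
```

With a single "*_d" dynamic field declared, a 3-dimensional point named
"point" would pass the check (point_0_d, point_1_d, point_2_d), while a poly
field asking for an undeclared suffix would fail at startup rather than
silently extending the runtime schema.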

Thoughts?

Cheers,
Chris


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: Chris.Mattmann@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department University of
Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


