lucene-dev mailing list archives

From Steve Rowe <sar...@gmail.com>
Subject Proposal/request for comments: Solr schema annotation
Date Thu, 01 Aug 2013 01:21:59 GMT
In thinking about making the entire Solr schema REST-API-addressable (SOLR-4898), I'd like
to be able to add arbitrary metadata at both the top level of the schema and at each leaf
node, and allow read/write access to that metadata via the REST API.

Some uses I've thought of for such a facility: 

1. The managed schema currently drops XML comments from schema.xml when converting it to the
managed-schema format. It would be much better if these comments were preserved, and
round-trippable when retrieving the schema and its constituents via the REST API.

2. Some comments in the example schemas refer neither to a single leaf node nor to all of them,
but rather to a group. I'd like to be able to group nodes by adding same-named "tags" to
multiple nodes, and also to have an optional top-level "tag description", which could then
be presented along with the tagged nodes in various output formats.

3. Some comments in the example schema are documentation about a feature, e.g. copyFields.
A top-level "documentation" annotation could take a leaf node element name (or perhaps an
XPath expression, though that's probably overkill) and apply to all matching elements.

4. When modifying the schema via REST API, a "last-modified" annotation could be automatically
added.

5. There were a couple of user complaints recently when schema.xml parsing was tightened to
disallow unknown attributes on field declarations (SOLR-4641): people were storing their own
information there.  User-level metadata would support this in a round-trippable way - I'm
thinking we could restrict it to flat string-typed key/value pairs, with no nested structure.

W3C XML Schema has a similar facility: <http://www.w3.org/TR/2004/REC-xmlschema-1-20041028/structures.html#element-annotation>.

Thoughts?

Some concrete examples of what I'm thinking of in schema.xml format (syntax/naming as yet
unsettled):

<schema name="example" version="1.5">
  <annotation>
    <description element="tag" content="plain-numeric-field-types">
      Plain numeric field types store and index the text value verbatim.
    </description>
    <documentation element="copyField">
      copyField commands copy one field to another at the time a document
      is added to the index.  It's used either to index the same field differently,
      or to add multiple fields to the same field for easier/faster searching.
    </documentation>
    <last-modified>2014-03-08T12:14:02Z</last-modified>
    …
  </annotation>
…
  <fieldType name="pint" class="solr.IntField">
    <annotation>
      <tag>plain-numeric-field-types</tag>
    </annotation>
  </fieldType>
  <fieldType name="plong" class="solr.LongField">
    <annotation>
      <tag>plain-numeric-field-types</tag>
    </annotation>
  </fieldType>
  …
  <copyField source="cat" dest="text">
    <annotation>
      <todo>Should this field really be copied to the catchall text field?</todo>
    </annotation>
  </copyField>
  …
  <field name="text" type="text_general">
    <annotation>
      <description>catchall field</description>
      <visibility>public</visibility>
    </annotation>
  </field>
  …
</schema>
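A rough sketch of how a consumer might resolve points 2 and 3 against the example above: collect the top-level tag descriptions and per-element documentation, then group leaf nodes by their `<tag>` values. This is hypothetical Java using only the JDK's DOM parser. The element and attribute names follow the draft syntax above, which is still unsettled, and `SchemaAnnotations` is not an existing Solr class.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch (not Solr code) that resolves the draft annotation
// syntax; element and attribute names may change.
final class SchemaAnnotations {
    // tag -> top-level <description element="tag" content="..."> text
    final Map<String, String> tagDescriptions = new LinkedHashMap<>();
    // leaf element name (e.g. "copyField") -> top-level <documentation> text
    final Map<String, String> docsByElement = new LinkedHashMap<>();
    // tag -> labels of the leaf nodes carrying that tag, in document order
    final Map<String, List<String>> nodesByTag = new LinkedHashMap<>();

    SchemaAnnotations(String schemaXml) {
        for (Element child : children(parse(schemaXml), null)) {
            if (child.getTagName().equals("annotation")) {
                readTopLevel(child);
            } else {
                readLeaf(child);
            }
        }
    }

    private void readTopLevel(Element annotation) {
        for (Element d : children(annotation, "description")) {
            if (d.getAttribute("element").equals("tag")) {
                tagDescriptions.put(d.getAttribute("content"), d.getTextContent().trim());
            }
        }
        for (Element d : children(annotation, "documentation")) {
            docsByElement.put(d.getAttribute("element"), d.getTextContent().trim());
        }
    }

    private void readLeaf(Element leaf) {
        // copyField has no name attribute, so fall back to the element name alone.
        String label = leaf.hasAttribute("name")
                ? leaf.getTagName() + " " + leaf.getAttribute("name")
                : leaf.getTagName();
        for (Element annotation : children(leaf, "annotation")) {
            for (Element tag : children(annotation, "tag")) {
                nodesByTag.computeIfAbsent(tag.getTextContent().trim(), k -> new ArrayList<>())
                        .add(label);
            }
        }
    }

    private static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("unparseable schema", e);
        }
    }

    // Direct element children of parent, filtered by tag name unless name is null.
    private static List<Element> children(Element parent, String name) {
        List<Element> result = new ArrayList<>();
        NodeList kids = parent.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node kid = kids.item(i);
            if (kid.getNodeType() == Node.ELEMENT_NODE
                    && (name == null || name.equals(kid.getNodeName()))) {
                result.add((Element) kid);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<schema name='example' version='1.5'><annotation>"
                + "<description element='tag' content='plain-numeric-field-types'>"
                + "Plain numeric field types store and index the text value verbatim."
                + "</description><documentation element='copyField'>copyField commands copy"
                + " one field to another at the time a document is added to the index."
                + "</documentation></annotation>"
                + "<fieldType name='pint' class='solr.IntField'>"
                + "<annotation><tag>plain-numeric-field-types</tag></annotation></fieldType>"
                + "<fieldType name='plong' class='solr.LongField'>"
                + "<annotation><tag>plain-numeric-field-types</tag></annotation></fieldType>"
                + "<copyField source='cat' dest='text'/></schema>";
        SchemaAnnotations a = new SchemaAnnotations(xml);
        a.nodesByTag.forEach((tag, nodes) ->
                System.out.println(tag + ": " + a.tagDescriptions.get(tag) + " " + nodes));
        System.out.println("copyField: " + a.docsByElement.get("copyField"));
    }
}
```

Running `main` prints the tag, its description and `[fieldType pint, fieldType plong]` on one line, followed by the copyField documentation text. Insertion-ordered maps are used so that output formats can present annotations in the same order as the source schema.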

