lucene-dev mailing list archives

From Walter Underwood <wun...@wunderwood.org>
Subject Re: Proposal/request for comments: Solr schema annotation
Date Thu, 01 Aug 2013 01:42:36 GMT
An annotation field would be much better than the current "anything goes" schema-less schema.xml.

Has anyone built an XML Schema for schema.xml? I know it is extensible, but it would be worth a try.
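
As a rough sketch only (not an existing artifact), an XSD fragment covering just the annotation element proposed below might look like this. The type names annotationEntry and annotationType are hypothetical, the child element names are taken from Steve's examples, and the syntax he shows is explicitly unsettled, so treat this as illustrative:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical type: a flat, string-valued entry with no nested structure. -->
  <xs:complexType name="annotationEntry">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <!-- Both attributes are optional; the examples use them only on top-level entries. -->
        <xs:attribute name="element" type="xs:string"/>
        <xs:attribute name="content" type="xs:string"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

  <!-- Hypothetical type: any number of the entries seen in the examples, in any order. -->
  <xs:complexType name="annotationType">
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element name="description" type="annotationEntry"/>
      <xs:element name="documentation" type="annotationEntry"/>
      <xs:element name="tag" type="xs:string"/>
      <xs:element name="todo" type="xs:string"/>
      <xs:element name="visibility" type="xs:string"/>
      <xs:element name="last-modified" type="xs:dateTime"/>
    </xs:choice>
  </xs:complexType>

  <xs:element name="annotation" type="annotationType"/>
</xs:schema>
```

This only enumerates the element names that appear in the examples. Arbitrary user-defined keys (use 5 below) would need a wildcard or a generic key/value element, which this sketch does not attempt.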

wunder

On Jul 31, 2013, at 6:21 PM, Steve Rowe wrote:

> In thinking about making the entire Solr schema REST-API-addressable (SOLR-4898), I'd
> like to be able to add arbitrary metadata at both the top level of the schema and at each
> leaf node, and allow read/write access to that metadata via the REST API.
> 
> Some uses I've thought of for such a facility: 
> 
> 1. The managed schema now drops XML comments from schema.xml upon conversion to managed-schema
> format, but it would be much better if these were somehow preserved, as well as round-trippable
> when retrieving the schema and its constituents via the REST API.
> 
> 2. Some comments in the example schemas don't refer to just one or to all leaf nodes,
> but rather to a group of them. I'd like to be able to group nodes by adding same-named "tags"
> to multiple nodes, and also have a top-level (optional) "tag description" - this description
> could then be presented with tagged nodes in various output formats.
> 
> 3. Some comments in the example schema are documentation about a feature, e.g. copyFields.
> A top-level "documentation" annotation could take a leaf node element name (or maybe an XPath?
> probably overkill) and apply to all matching elements.
> 
> 4. When modifying the schema via REST API, a "last-modified" annotation could be automatically
> added.
> 
> 5. There were a couple of user complaints recently when schema.xml parsing was tightened
> to disallow unknown attributes on field declarations (SOLR-4641): people were storing their
> own information there.  User-level metadata would support this in a round-trippable way -
> I'm thinking we could restrict it to flat string-typed key/value pairs, with no nested structure.
> 
> W3C XML Schema has a similar facility: <http://www.w3.org/TR/2004/REC-xmlschema-1-20041028/structures.html#element-annotation>.
> 
> Thoughts?
> 
> Some concrete examples of what I'm thinking of in schema.xml format (syntax/naming as
> yet unsettled):
> 
> <schema name="example" version="1.5">
>  <annotation>
>    <description element="tag" content="plain-numeric-field-types">
>      Plain numeric field types store and index the text value verbatim.
>    </description>
>    <documentation element="copyField">
>      copyField commands copy one field to another at the time a document
>      is added to the index.  It's used either to index the same field differently,
>      or to add multiple fields to the same field for easier/faster searching.
>    </documentation>
>    <last-modified>2014-03-08T12:14:02Z</last-modified>
>    …
>  </annotation>
> …
>  <fieldType name="pint" class="solr.IntField">
>    <annotation>
>      <tag>plain-numeric-field-types</tag>
>    </annotation>
>  </fieldType>
>  <fieldType name="plong" class="solr.LongField">
>    <annotation>
>      <tag>plain-numeric-field-types</tag>
>    </annotation>
>  </fieldType>
>  …
>  <copyField source="cat" dest="text">
>    <annotation>
>      <todo>Should this field really be copied to the catchall text field?</todo>
>    </annotation>
>  </copyField>
>  …
>  <field name="text" type="text_general">
>    <annotation>
>      <description>catchall field</description>
>      <visibility>public</visibility>
>    </annotation>
>  </field>
> 
> 
> 

--
Walter Underwood
wunder@wunderwood.org



