lucene-solr-dev mailing list archives

From "Mattmann, Chris A (388J)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: SOLR-1131 - Multiple Fields per Field Type
Date Tue, 08 Dec 2009 02:57:36 GMT
Hi Hoss,

> 
> : <fieldType name="latlon" type="LatLonFieldType" pattern="location__*" />
> : <fieldType name="latlon_home" type="LatLonFieldType" pattern="location_home_*"/>
> : <fieldType name="latlon_work" type="LatLonFieldType" pattern="location_work_*"/>
> :
> : <field name="location" type="latlon"/>
> : <field name="location_home" type="latlon_home"/>
> : <field name="location_work" type="latlon_work"/>
> 
> I'm not really understanding the value of an approach like that.  for
> starters, what Lucene field names would ultimately be created in those
> examples?  

The first field would be named location__location.
The second field would be named location_home_location_home.
The third field would be named location_work_location_work.

> And if i also added...
> 
>  <field name="other_location" type="latlon"/>
>  <dynamicField name="*_dynamic_location" type="latlon"/>
> 
> ...then what field names would be created under the covers?
> 

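In general, it would be the FieldType's pattern with the trailing wildcard
stripped off, followed by the field name. In pseudocode:
FieldType#getPattern().stripOffEndRegexStarStuff() + Field#getName().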
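
To make the rule above concrete, here is a minimal Java sketch of it. The
class PolyFieldNames and both of its methods are hypothetical names made up
for illustration; they are not part of Solr or of the SOLR-1131 patch, and
the sketch assumes the pattern ends in at most one trailing '*'.

```java
// Hypothetical helper, not Solr API: illustrates the naming rule only.
class PolyFieldNames {

    // Drop one trailing '*' from a fieldType pattern,
    // e.g. "location__*" becomes "location__".
    static String stripTrailingStar(String pattern) {
        if (pattern.endsWith("*")) {
            return pattern.substring(0, pattern.length() - 1);
        }
        return pattern;
    }

    // Underlying field name = stripped fieldType pattern + declared field name.
    static String luceneFieldName(String typePattern, String fieldName) {
        return stripTrailingStar(typePattern) + fieldName;
    }

    public static void main(String[] args) {
        // The three declarations from the quoted example.
        System.out.println(luceneFieldName("location__*", "location"));
        System.out.println(luceneFieldName("location_home_*", "location_home"));
        System.out.println(luceneFieldName("location_work_*", "location_work"));
        // The extra <field name="other_location" type="latlon"/> case.
        System.out.println(luceneFieldName("location__*", "other_location"));
    }
}
```

Under this sketch, other_location with the latlon type would come out as
location__other_location. How a dynamicField such as *_dynamic_location
would be resolved is not covered here; presumably the same prefix would be
applied to whatever concrete name matched, but that is an assumption.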

> : I think it makes more sense to define the heterogeneity at the fieldType
> level because:
> :
> : (a) it's a bit more consistent with the existing solr schema examples,
> : where many of the field types differ only in their configuration
> : (e.g., ints and tints, which are both solr.TrieIntFields, and date
> : and tdate, which are both instances of solr.TrieDateField)
> :
> : (b) isolation of change: <fieldType> defs will change less often than
> : <field> defs, where names and indexed/stored/etc. debugging are likely
> : to occur more frequently
> 
> ...this just feels wrong to me ... i can't really explain why.  It seems
> like you are suggesting that every <field/> declaration would need a one
> to one correspondence with a unique <fieldType/> declaration in order to
> prevent field name collisions, which sounds sketchy enough ... but i'm
> also not fond of the idea that a person editing the schema can't just look
> at the <field/> and <dynamicField/> names to ensure that they understand
> what underlying fields are being created (so they don't inadvertently add
> a new one that collides) ... now they also have to look at the "pattern"
> attribute of every <fieldType/> that is a poly field.

Well, if this feels wrong to you, then I think the schema.xml file that ships
with Solr should feel wrong as well, because it uses the exact same
pattern for defining field type variations. That is, the differences between
the FieldType representations for ints and tints are not stored as variations
on the SchemaField definition itself; they are stored as variations on the
FieldTypes (e.g., a different precisionStep in the case of int [0] versus
that of tint [8]). Based on what you are proposing, why isn't precisionStep
an attribute on <field>, rather than <fieldType>, in those examples?

> 
> letting <dynamicField/> drive everything just seems a *lot* simpler ...
> both as far as implementation, and as far as maintaining the schema.

Possibly. It's also a lot less traceable. It's implicit versus explicit,
which I'm not sure leads to simplicity in the end.

> 
> : I don't think the above hybrid approach will lead to anything other than
> : confusion, as you indicated above. Let's stick to the pattern defs at
> : the <fieldType> level, and then let the fieldType handle the internal
> : "dynamicity" with e.g., a dynamicField, and then notify the schema user
> 
> From the standpoint of reading a schema.xml file, the approach you're
> describing of a pattern attribute on <fieldType/> declarations actually
> seems more confusing than the strawman suggestion i made of a pattern
> attribute on <field> ... even without understanding what concrete fields
> you are suggesting would be created with a configuration like that, it
> still increases the number of places you have to look to see what field
> names are getting created.

How so? In actuality, it reduces it. Instead of having pattern definitions
on fields (of which there are likely to be more), you have them on field
types.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: Chris.Mattmann@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department University of
Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


