lucene-dev mailing list archives

From "Yonik Seeley (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-3250) Dynamic Field capabilities based on value not name
Date Thu, 15 Mar 2012 15:27:38 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230243#comment-13230243 ]

Yonik Seeley commented on SOLR-3250:
------------------------------------

Of course hopefully everyone knows "schemaless" is mostly marketing b.s. - when people do
this, there is still a schema, but it's guessed on first use (and hence generally a horrible
idea for production systems).

It would be easy enough on a single node... but how does one handle a cluster?
Say you index price=0 on nodeA, and price=100.0 on nodeB?

A quick thought on how it might work:
 - have a separate file auto_fields.json that keeps track of the mappings that would be the
same for all cores using that schema
 - when we run across a field we haven't seen before, we must guess a type for it, then grab
a lock and update auto_fields.json
 - we can update our in-memory schema with any new fields we find in auto_fields.json
 - works the same in ZK mode... it's just the auto_fields.json is in ZK, and we would use
something like optimistic locking to update it
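
A minimal sketch of the steps above, assuming an in-memory stand-in for the shared
auto_fields.json. The names AutoFieldsStore, write_if_version and register_field are
hypothetical and are not Solr or ZooKeeper APIs; the version counter only imitates the kind of
versioned compare-and-set write that optimistic locking relies on.

```python
import json
import threading


class AutoFieldsStore:
    """In-memory stand-in for a shared auto_fields.json (hypothetical layout)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._version = 0
        self._content = json.dumps({})

    def read(self):
        """Return (version, mappings) as currently stored."""
        with self._lock:
            return self._version, json.loads(self._content)

    def write_if_version(self, expected_version, mappings):
        """Compare-and-set: write only if nobody updated the file since we read it."""
        with self._lock:
            if self._version != expected_version:
                return False
            self._content = json.dumps(mappings, sort_keys=True)
            self._version += 1
            return True


def register_field(store, name, guessed_type, max_retries=10):
    """Return the shared type for `name`; the first successful writer wins."""
    for _ in range(max_retries):
        version, mappings = store.read()
        if name in mappings:
            # Another node already registered the field: adopt its type, drop our guess.
            return mappings[name]
        mappings[name] = guessed_type
        if store.write_if_version(version, mappings):
            return guessed_type
        # Lost the race: re-read the file and try again.
    raise RuntimeError("could not register field %r" % name)


store = AutoFieldsStore()
type_on_node_a = register_field(store, "price", "int")    # nodeA saw price=0
type_on_node_b = register_field(store, "price", "float")  # nodeB saw price=100.0
```

In this sketch both nodes end up with the type registered first, so they at least agree on the
mapping. It does not address whether that first guess fits later values such as 100.0.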


                
> Dynamic Field capabilities based on value not name
> --------------------------------------------------
>
>                 Key: SOLR-3250
>                 URL: https://issues.apache.org/jira/browse/SOLR-3250
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Grant Ingersoll
>
> In some situations, one already knows the schema of their content, so having to declare
> a schema in Solr becomes cumbersome.  For instance, if you have all your
> content in JSON (or can easily generate it) or other typed serializations, then you already
> have a schema defined.  It would be nice if we could have support for dynamic fields that
> used whatever name was passed in, but then picked the appropriate FieldType for that field
> based on the value of the content.  So, for instance, if the input is a number, it would select
> the appropriate numeric type.  If it is a plain text string, it would pick the appropriate
> text field (you could even add in language detection here).  If it is comma separated, it
> would treat the values as keywords, etc.  We could likely also send in a hint as to the type.
> With this approach, you of course have a "first in wins" situation, but assuming you
> have this schema defined elsewhere, it is likely fine.
> Supporting such cases would allow us to be schemaless when appropriate, while offering
> the benefits of schemas when appropriate.  Naturally, one could mix and match these too.
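
The value-based selection described in the issue could look roughly like the sketch below. It
is only an illustration: guess_field_type and the type names it returns ("int", "float",
"keywords", "text", "boolean") are made up for this example and are not Solr FieldType names,
and the rules are a guess at one reasonable ordering rather than a proposed implementation.

```python
def guess_field_type(value, hint=None):
    """Pick an illustrative field type name from a value, or use the caller's hint."""
    if hint is not None:
        # An explicit hint from the client overrides any guessing.
        return hint
    if isinstance(value, bool):
        # bool is a subclass of int in Python, so check it before int.
        return "boolean"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        text = value.strip()
        for parse, type_name in ((int, "int"), (float, "float")):
            try:
                parse(text)
                return type_name
            except ValueError:
                pass
        if "," in text:
            # Comma-separated input is treated as a list of keywords.
            return "keywords"
    return "text"


# "First in wins": the type guessed for the first value seen is kept.
schema = {}
schema.setdefault("price", guess_field_type(0))
schema.setdefault("price", guess_field_type(100.0))
```

Here schema["price"] stays "int" even though the second value is a float, which is the
"first in wins" behavior the issue mentions.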


        
