incubator-blur-dev mailing list archives

From "Aaron McCurry (JIRA)" <>
Subject [jira] [Commented] (BLUR-112) Allow for types to be set on blur tables
Date Wed, 26 Jun 2013 01:12:20 GMT


Aaron McCurry commented on BLUR-112:


These are all good questions. To be honest, this is a feature that I know we need, but I am
unsure of what the correct implementation should be.  I like your line of thinking, though.
 I have given this some thought and have tried to document some of those thoughts below.
 Like I said I am not sold on any implementation so I am open to other ideas on how we should

I believe there are 2 fundamental operating modes when it comes to column definitions/types/analyzers.

* The first mode is when you know all the columns and definitions up front and you define
a column with a type and an optional analyzer if that type is a TextField.  This mode is required
for MapReduce indexing because it needs to know up front how to index all the data.

* The second mode is when no columns or types are known up front but as they are discovered
they need to be added to the table descriptor/analyzer def.

I believe we need to support both modes at the same time.  So back to some of your questions.

- Once defined, we do not allow updates for existing column definitions. We can safely check
whether any Inbound Column types have a conflict with any defined types in AnalyzerDefinition
as we do not have to worry about AnalyzerDefinition object on the shard server being out of
sync with that stored on Zookeeper. 

This is true.

- However for dynamic columns as you explained, the same column might have been added by a
different shard server. This need not happen at the exact same time(correct me) . Even if
ShardServer-B added a dynamic column 5 seconds before, the AnalyzerDefinition in ShardServer-A
does not reflect that change

So this is where the ZkCachedMap comes in. Basically it is an in-memory cache of fields (or
any other values really) whose entries can only be set once and are persisted.  It also serves
as a consistent store for all the types across all the shard servers.
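
A minimal sketch of the "set once" behavior described above, assuming a plain in-memory map. The class name SetOnceTypeMap and its methods are hypothetical and are not the actual ZkCachedMap API; persistence to ZooKeeper is left out entirely.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory stand-in for the set-once behavior.
// A real implementation would also persist each entry to ZooKeeper; that is omitted here.
final class SetOnceTypeMap {
  private final ConcurrentMap<String, String> types = new ConcurrentHashMap<>();

  /**
   * Registers a type for a column the first time the column is seen.
   * Returns the type in effect after the call, which is the earlier
   * value if another caller registered the column first.
   */
  String putIfAbsent(String column, String type) {
    String existing = types.putIfAbsent(column, type);
    return existing == null ? type : existing;
  }

  /** Returns the registered type, or null if the column is unknown. */
  String get(String column) {
    return types.get(column);
  }
}
```

With this shape, a shard server that loses the race to define a column gets back the type that won, and can compare it with the type it wanted in order to detect a conflict.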

- Also a little confused about how to use ZkCachedMap. What values will we be caching/storing
using this? Are they only the ones which can be overwritten on Zookeeper?

I think that we should store all the types (maybe more information like the entire column
definition etc) in ZkCachedMap for all cases.  I think that this will make things more consistent.
 Perhaps the ZkCachedMap needs to become the storage mechanism for the AnalyzerDefinition.
 Also, ZkCachedMap is probably a bad name for that class, so we might need to come up with
something else.  As I look at your pseudo code, you are right that we will need to reload the
analyzer somehow; it will likely need to be driven from a ZK watcher on the update of the ZkCachedMap.

Operations at a high level

* From an external API (Thrift), I think we would need a method that looks something like
"addColumnDefinition(family,name,type,analyzer,fulltext)"
* Next it would call a method to update the ZkCachedMap and store to ZK
* Then a ZK watcher would fire in the ZooKeeperClusterStatus class (the watcher would also
fire on all the shard servers) and update the TableDescriptor; after that we would perhaps
clear the table context cache
* Clearing the cache would force the recreation of the BlurAnalyzer (if it doesn't, we
should make it do so)
* We would also need to figure out how to get the new analyzer into the index writer (maybe
with an atomic reference inside an analyzer decorator?)
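
The last step above could be sketched roughly as follows. This is only an illustration of the "atomic reference inside a decorator" idea: FieldAnalyzer is a hypothetical stand-in interface, not Lucene's Analyzer class, and SwappableAnalyzer and swap are assumed names.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical minimal stand-in for an analyzer; not Lucene's Analyzer class.
interface FieldAnalyzer {
  String analyze(String field, String value);
}

// Decorator that forwards every call to whichever delegate is current.
// The holder of this object (for example an index writer) never needs to be rebuilt.
final class SwappableAnalyzer implements FieldAnalyzer {
  private final AtomicReference<FieldAnalyzer> delegate;

  SwappableAnalyzer(FieldAnalyzer initial) {
    this.delegate = new AtomicReference<>(initial);
  }

  // Would be called from the watcher callback once a new analyzer has been built.
  void swap(FieldAnalyzer next) {
    delegate.set(next);
  }

  @Override
  public String analyze(String field, String value) {
    // Reads the current delegate on every call, so a swap takes effect immediately.
    return delegate.get().analyze(field, value);
  }
}
```

The writer would be handed the decorator once, and later column definitions would only replace the delegate behind it.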

As we talk about this feature I think we need to break this one up into sub tasks.  Let me
know what you think.


> Allow for types to be set on blur tables
> ----------------------------------------
>                 Key: BLUR-112
>                 URL:
>             Project: Apache Blur
>          Issue Type: Improvement
>    Affects Versions: 0.2.0, 0.3.0
>            Reporter: Aaron McCurry
>             Fix For: 0.3.0
> Create the ability for Blur to handle the default Lucene field types.  This should not
be tied to the table descriptor because types should be allowed to be added at runtime.  Also
2 new fields should be added to the TableDescriptor:
> 1. A strict types attribute.  If set to true and a new column is added to the table with
no type mapping for it, throw an exception.  Set to false by default.
> 2. A default type, used if strict is set to false.  The default type should be text.
> Also, dynamic columns could be allowed if their name included the type.  Such as:
> The column name could be "col1" with a type of "int", in the Column struct in thrift
the name would be "col1/int" and if the type did not exist before the call it would be added.
> Thoughts?
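
A rough sketch of how the "col1/int" naming and the strict flag from the description might fit together. ColumnTypeResolver and its methods are hypothetical names. Two points are assumptions the description does not settle: a typed name registers its type even when strict is true, and a column that fell back to the default type is not recorded.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; names and behavior are assumptions, not Blur code.
final class ColumnTypeResolver {
  static final String DEFAULT_TYPE = "text";

  private final Map<String, String> types = new HashMap<>();
  private final boolean strict;

  ColumnTypeResolver(boolean strict) {
    this.strict = strict;
  }

  /** Resolves the type for a raw column name such as "col1/int" or "col1". */
  String resolve(String rawName) {
    int slash = rawName.lastIndexOf('/');
    if (slash > 0 && slash < rawName.length() - 1) {
      String name = rawName.substring(0, slash);
      String type = rawName.substring(slash + 1);
      // Register the type on first sight; reject a later, different type.
      String existing = types.putIfAbsent(name, type);
      if (existing != null && !existing.equals(type)) {
        throw new IllegalArgumentException(
            "Column " + name + " is already typed as " + existing);
      }
      return type;
    }
    String known = types.get(rawName);
    if (known != null) {
      return known;
    }
    if (strict) {
      throw new IllegalStateException("No type mapping for column " + rawName);
    }
    return DEFAULT_TYPE; // assumption: the fallback is not recorded
  }
}
```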
