accumulo-notifications mailing list archives

From "Christopher Tubbs (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ACCUMULO-1719) Convenient instanceName to instanceID mapping is unnecessary
Date Tue, 17 Sep 2013 23:34:51 GMT
Christopher Tubbs created ACCUMULO-1719:
-------------------------------------------

             Summary: Convenient instanceName to instanceID mapping is unnecessary
                 Key: ACCUMULO-1719
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1719
             Project: Accumulo
          Issue Type: Improvement
          Components: client
            Reporter: Christopher Tubbs
             Fix For: 1.7.0


The ZooKeeperInstance constructor typically takes two parameters: instanceName and a comma-separated
list of ZooKeeper host[:port] entries (there are other overloads as well, which take a UUID and/or
a timeout setting).

Initialize generates a UUID and associates a user-provided instanceName with it, using the following
mapping in ZooKeeper:

/accumulo/instances/instanceName, which contains a UUID, which points to /accumulo/UUID
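
The two-step lookup described above can be sketched roughly as follows. This is a toy model only: a plain in-memory map stands in for ZooKeeper, and the class and method names (InstanceMappingSketch, initialize, resolveInstanceRoot) are hypothetical, not Accumulo or ZooKeeper API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

class InstanceMappingSketch {
    // Stand-in for ZooKeeper: node path -> node data (assumption, not the real client API).
    static final Map<String, String> zk = new HashMap<>();

    static final String ROOT = "/accumulo";
    static final String INSTANCES = ROOT + "/instances";

    // Mimics the described behavior of Initialize: generate a UUID and record name -> UUID.
    static String initialize(String instanceName) {
        String instanceId = UUID.randomUUID().toString();
        zk.put(INSTANCES + "/" + instanceName, instanceId); // the mapping node
        zk.put(ROOT + "/" + instanceId, "");                // the instance root node
        return instanceId;
    }

    // Step 1: read the UUID stored under the name node. Step 2: build the instance root path.
    static String resolveInstanceRoot(String instanceName) {
        String instanceId = zk.get(INSTANCES + "/" + instanceName);
        if (instanceId == null) {
            throw new IllegalArgumentException("Unknown instance name: " + instanceName);
        }
        return ROOT + "/" + instanceId;
    }

    public static void main(String[] args) {
        String id = initialize("dev");
        System.out.println(resolveInstanceRoot("dev").equals(ROOT + "/" + id));
    }
}
```

Every client that connects by name pays for this extra indirection, which is the part the rest of this issue argues against.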

Since the introduction of instance.secret, there are potential problems with this mapping.

If /accumulo (and /accumulo/instances and /accumulo/instances/instanceName) is created by
Initialize in a write-protected way (using instance.secret), then re-initializing with a new
generated instanceID but the same instanceName will not work unless the new instance has the
same instance secret. This is very limiting and can be a nightmare for system administrators
and developers trying to re-initialize.

If it is not created in a write-protected way, there's an even bigger problem, because anybody
with access to ZooKeeper can overwrite the old mapping to point to a new instance (and we
expect all clients to be able to access ZooKeeper). While the old data is still protected,
any clients connecting with the instanceName will connect (and ingest to) the new instanceID
that the instanceName currently maps to.
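
To make the two cases above concrete, here is a minimal sketch under simplifying assumptions: each node remembers the secret it was written with, and a null secret means "unprotected". Real ZooKeeper ACLs are richer than this, and the names (MappingProtectionSketch, Node, write) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class MappingProtectionSketch {
    // Hypothetical node: its data plus the secret needed to overwrite it (null = unprotected).
    static final class Node {
        final String data;
        final String writeSecret;

        Node(String data, String writeSecret) {
            this.data = data;
            this.writeSecret = writeSecret;
        }
    }

    // Stand-in for ZooKeeper: node path -> node.
    static final Map<String, Node> zk = new HashMap<>();

    // Returns true if the write was accepted, false if a different secret protects the node.
    static boolean write(String path, String data, String secret) {
        Node existing = zk.get(path);
        if (existing != null && existing.writeSecret != null
                && !existing.writeSecret.equals(secret)) {
            return false;
        }
        zk.put(path, new Node(data, secret));
        return true;
    }

    public static void main(String[] args) {
        String mapping = "/accumulo/instances/prod";

        // Case 1: protected mapping; re-initializing with a different secret is rejected.
        write(mapping, "uuid-old", "secret-1");
        System.out.println(write(mapping, "uuid-new", "secret-2")); // prints false

        // Case 2: unprotected mapping; anyone can repoint the name at a new instance.
        zk.clear();
        write(mapping, "uuid-old", null);
        System.out.println(write(mapping, "uuid-new", null)); // prints true
    }
}
```

In the second case, a client resolving "prod" would now be directed to uuid-new without any indication that the mapping changed.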

The current implementation appears to use the former approach (the instanceName node itself
is protected by the same secret as the instanceID and its child nodes). This means that at least
the mapping is protected from being overwritten, but it also means that it doesn't provide
us with any added value. Even if we count as added value the ability to reinitialize
the same instanceName (generating a new instanceID) while leaving the old instance data around
for inspection, we still have the problem of ZK filling up, and because the mapping was
rewritten, we can't tell which old instanceID was the previous one to inspect.

A better solution:

Drop the mapping. It is unnecessarily complex and adds no value. Allow the instanceName that
users create in new versions to represent the unique ID. Don't generate/use UUIDs anymore;
use the provided instanceName. Keep the API for UUID, but just for convenience (treat it
like a string internally). We can still prompt to overwrite the old instance if it exists
AND we have the same secret, but when we "overwrite it", we can optionally rename the old
instanceName to instanceName_backup_date.

Dropping the mapping has the benefit of reduced complexity, and it is (mostly) backwards-compatible
(instances can't have the name "instances"). It is easier for developers to debug their instances,
because there's no obscure UUID to deal with (unless they want to use that as the name) and
they can find the old versions of their instances if they choose to back up the old data when
re-initializing. If not, they can avoid ZK filling up (esp. in dev environments where instanceNames
get reused often). And, with a backup naming convention, it's easy for admins to decide which
old instance data to keep and which to throw away, without the need for a mapping. The scope
of the instance.secret is also well-defined: it covers just the /accumulo/instanceName that created
it, and there's no possibility of overwriting the instanceName to instanceID mapping.
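
A rough sketch of how the proposed scheme might behave, assuming the instance lives directly at /accumulo/instanceName. As before, a map stands in for ZooKeeper. The names (DirectNamingSketch, initialize), the backupOld flag, and the ISO date format used for the backup suffix are all assumptions made for illustration; the proposal only says "instanceName_backup_date".

```java
import java.time.LocalDate;
import java.util.Map;
import java.util.TreeMap;

class DirectNamingSketch {
    // Stand-in for ZooKeeper: instance root path -> secret protecting that subtree.
    static final Map<String, String> zk = new TreeMap<>();
    static final String ROOT = "/accumulo";

    // Creates (or re-creates) an instance whose name is its unique ID.
    static String initialize(String instanceName, String secret,
                             boolean backupOld, LocalDate today) {
        // "instances" would collide with the legacy /accumulo/instances node.
        if ("instances".equals(instanceName)) {
            throw new IllegalArgumentException("\"instances\" is not a usable instance name");
        }
        String path = ROOT + "/" + instanceName;
        String existingSecret = zk.get(path);
        if (existingSecret != null) {
            // Overwriting is only allowed with the same secret.
            if (!existingSecret.equals(secret)) {
                throw new SecurityException("Instance exists under a different secret");
            }
            // Optionally keep the old instance under a dated backup name.
            if (backupOld) {
                zk.put(path + "_backup_" + today, existingSecret);
            }
        }
        zk.put(path, secret);
        return path;
    }

    public static void main(String[] args) {
        initialize("dev", "s1", false, LocalDate.of(2013, 9, 17));
        initialize("dev", "s1", true, LocalDate.of(2013, 9, 18));
        System.out.println(zk.keySet());
    }
}
```

With this layout there is no separate mapping node to protect or overwrite; the secret only ever guards the instance's own subtree.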

Instance names work best when unique. Instance IDs are guaranteed to be unique. There's no
good reason these should be separate things.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
