hbase-dev mailing list archives

From "Jonathan Gray (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1295) Federated HBase
Date Wed, 29 Apr 2009 17:26:30 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704213#action_12704213 ]

Jonathan Gray commented on HBASE-1295:
--------------------------------------

This looks great, Andrew!  Some comments...

- What do we do when there are two identical keys in a KeyValue (row, family, column, timestamp)
but different values?  That's actually going to be possible in 0.20, since you can manually
set the stamp, and it will certainly be possible with multi-master replication.  I'm not sure how
it's handled now.  It would depend on logic in both memcache insertion and, more importantly,
compaction, and then on how it's handled when reading.  (See the first sketch after this list.)
- Everything is now just a KeyValue, so that would be what we send to replicas.
- Thoughts on network partitioning?  I'm assuming you're referring to partitioning of replica
clusters from one another, not within a cluster, right?  If so, I guess you'd hang on to WALs
as long as you could; eventually a replicated cluster would go into some secondary mode of
needing a full sync (when the other cluster(s) could no longer hold all the WALs, or should we
assume HDFS will not fill and just keep flushing, so we can always resync with WALs?).  (Note:
handling of intra-cluster partitions is virtually impossible because of our strong consistency.)
- Regarding SCOPE and setting things as local or replicated: what do you expect the granularity
of this to be?  Could I have some tables replicated to some clusters, other tables to others,
and some to both?
- Would replicas of tables _always_ require identical family settings?  For example, say I have
a cluster of 5 nodes with lots of memory, and I want to replicate just a single high-volume,
high-read table from my primary large cluster, but in the small cluster I want to set a TTL of
1 day and also mark the family as in-memory (see the second sketch below).  This is kind of an
advanced, special case, but the ability to do things like that would be very cool; I could
definitely see us doing something like it were it possible.
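To illustrate the duplicate-key question in the first bullet, here's a minimal sketch against
(roughly) the 0.20 client API; the table, row, and column names are made up:

{code}
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;

// Inside some method that throws IOException.  Two puts that collide
// on the full key (row, family, qualifier, timestamp) but carry
// different values: which value survives memcache insertion,
// compaction, and reads?
HBaseConfiguration conf = new HBaseConfiguration();
HTable table = new HTable(conf, "mytable");
long ts = 1240000000000L;  // timestamp set manually, same for both puts

Put p1 = new Put("row1".getBytes());
p1.add("cf".getBytes(), "col".getBytes(), ts, "valueA".getBytes());
table.put(p1);

Put p2 = new Put("row1".getBytes());
p2.add("cf".getBytes(), "col".getBytes(), ts, "valueB".getBytes());
table.put(p2);
{code}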
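And a sketch of the differing-schema idea in the last bullet: the small replica cluster declares
the same family as the primary but with a 1-day TTL and the in-memory flag set.  The table and
family names are hypothetical, and conf is as above:

{code}
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// On the small replica cluster only: same table and family as the
// primary, but replica-local retention and caching settings.
HTableDescriptor desc = new HTableDescriptor("hightraffic");
HColumnDescriptor family = new HColumnDescriptor("cf".getBytes());
family.setTimeToLive(24 * 60 * 60);  // in seconds: keep data 1 day
family.setInMemory(true);            // favor cache residency on reads
desc.addFamily(family);
new HBaseAdmin(conf).createTable(desc);
{code}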

I've got a good bit of experience with database replication; I did some work in the Postgres
world on WAL shipping.  Let me know how I can help your effort.

I agree with your assessment regarding consistency, etc.  It's clear we should be using an
eventual consistency model for replication.  This is one of my favorite topics!

One thing that's a bit special is that this would make an HBase cluster of clusters a
"read-your-writes"-style eventually consistent system: strong consistency within each individual
cluster (a client that reads from the cluster it wrote to always sees its own writes), eventual
consistency across clusters.  That makes a huge difference for us, internally, on many of our
data systems.  This may be obvious since we're just talking about replication here, but it's
something to keep in mind.
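To make the filtered log shipping idea from the proposal below concrete, here's a very rough
sketch of what the tee at the HLog level might look like.  Everything here (the Scope enum, the
scopeOf lookup, the replication queue) is hypothetical, not existing API:

{code}
// Hypothetical tee in the WAL append path: edits whose family carries
// a GLOBAL scope attribute get queued for shipment to peer clusters;
// LOCAL edits are only written to the local log.
void append(HRegionInfo info, KeyValue kv) throws IOException {
  writeToLocalLog(info, kv);                      // normal WAL append
  if (scopeOf(kv.getFamily()) == Scope.GLOBAL) {  // per-family SCOPE
    replicationQueue.add(kv);                     // a background thread
  }                                               // ships these to peers
}
{code}

The queue would need to be backed by the retained WALs so a partitioned peer can catch up, per
the partitioning discussion above.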

> Federated HBase
> ---------------
>
>                 Key: HBASE-1295
>                 URL: https://issues.apache.org/jira/browse/HBASE-1295
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: Andrew Purtell
>         Attachments: hbase_repl.2.odp, hbase_repl.2.pdf
>
>
> HBase should consider supporting a federated deployment where someone might have terascale
> (or beyond) clusters in more than one geography and would want the system to handle replication
> between the clusters/regions. It would be sweet if HBase had something on the roadmap to sync
> between replicas out of the box.
> Consider if rows, columns, or even cells could be scoped: local, or global.
> Then, consider a background task on each cluster that replicates new globally scoped
> edits to peer clusters. The HBase/Bigtable data model has convenient features (timestamps,
> multiversioning) such that simple exchange of globally scoped cells would be conflict free
> and would "just work". Implementation effort here would be in producing an efficient mechanism
> for collecting up edits from all the HRS and transmitting the edits over the network to peers,
> where they would then be split out to the HRS there. Holding on to the edit trace and tracking
> it until the remote commits succeed would also be necessary. So, HLog is probably the right
> place to set up the tee. This would be filtered log shipping, basically.
> This proposal does not consider transactional tables. For transactional tables, enforcement
> of global mutation commit ordering would come into the picture if the user wants the
> transaction to span the federation. This should be an optional feature, even with
> transactional tables themselves being optional, because of how slow it would be.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

