hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From st...@duboce.net
Subject Re: Review Request: HBASE-2223
Date Fri, 25 Jun 2010 22:07:35 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/76/#review281
-----------------------------------------------------------


Missing is more desciption of how this feature works.  There is package doc. on how to get
it going but needs fuller description of how replication works (how zk is used, how it manages
hlogs, how it sends edits to remote cluster, etc.).  There is insufficient description in
javadoc.  Without this, there is a danger that only the author will be able to figure whats
going on in here.

How does one get insight on to how well or badly replication is working?

Alot of errors are logged and then we just move on.  Thats fine for an alpha package but need
a plan for making it all more robust?  What you thinking?


bin/replication/add_peer.rb
<http://review.hbase.org/r/76/#comment1187>

    These log messages add nothing?  Remove?  Just restating what was passed on cmdline.



bin/replication/add_peer.rb
<http://review.hbase.org/r/76/#comment1188>

    Why do this?  Its annoying?
    
    It'd be better spending time interrogating what the user passed to see if it makes sense
throwing errors if incorrectly formatted or if it doesn't look like it makes sense?



bin/replication/add_peer.rb
<http://review.hbase.org/r/76/#comment1189>

    A message on end saying what it did can be comforting to users.



bin/replication/copy_tables_desc.rb
<http://review.hbase.org/r/76/#comment1190>

    See above comments.



src/main/java/org/apache/hadoop/hbase/HConstants.java
<http://review.hbase.org/r/76/#comment1191>

    Change name of this config to hbase.replication.



src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
<http://review.hbase.org/r/76/#comment1192>

    Does this method have prerequisites? For example, does replication have to be enabled?



src/main/java/org/apache/hadoop/hbase/master/HMaster.java
<http://review.hbase.org/r/76/#comment1193>

    Is this a leak from another patch?



src/main/java/org/apache/hadoop/hbase/master/RegionServerOperationQueue.java
<http://review.hbase.org/r/76/#comment1194>

    This is a leak from another patch?



src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
<http://review.hbase.org/r/76/#comment1195>

    Whats going on here?  We're setting into config the logcleaner to use but then in the
lines after this we go ahead and create OldLogsCleaner anyways?



src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
<http://review.hbase.org/r/76/#comment1196>

    Just call this 'replication' or 'replicationEnabled'



src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
<http://review.hbase.org/r/76/#comment1198>

    We are a ReplicationSink or a ReplicationMaster?  One or the other?  Can we not move all
of this Replication stuff into a Replication class including the setup of whether we are a
Sink or Master rather than have it hang out here in HRS?  



src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
<http://review.hbase.org/r/76/#comment1197>

    I don't grok the comment



src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
<http://review.hbase.org/r/76/#comment1199>

    This could be null if we are a replication master?



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<http://review.hbase.org/r/76/#comment1200>

    If the above mentioned Replication class were a singleton, it could be shared here with
HRS



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<http://review.hbase.org/r/76/#comment1201>

    It'd be good if we didn't have to create this byte [] per edit.  I should reinstitute
the comparator I had that took KVs but only compared family portion of the KV... no need to
create family byte [] then.



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
<http://review.hbase.org/r/76/#comment1202>

    YOu might say why its being moved.  Should be INFO level?  WALs are kinda important to
track?



src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperWrapper.java
<http://review.hbase.org/r/76/#comment1203>

    isReplicating is what you would name the method that accesses the boolean replicating...
with this name the accessor should be named isIsReplicating.



src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperWrapper.java
<http://review.hbase.org/r/76/#comment1204>

    Why do it this way?  Why an if/else?  Why not do if (...length != 3) throw .... no need
of an else.



src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperWrapper.java
<http://review.hbase.org/r/76/#comment1205>

    This is an error? Is it critical fail (We dropped recording a WAL)  Are we just logging
and moving on?



src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperWrapper.java
<http://review.hbase.org/r/76/#comment1206>

    Same here.... We just log and keep going?  Are any of these fatal?



src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperWrapper.java
<http://review.hbase.org/r/76/#comment1207>

    If this fails, what are the repercusssions?



src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperWrapper.java
<http://review.hbase.org/r/76/#comment1208>

    What happens if this server dies?  It gets cleaned up by master?
    
    What happens to replication if master dies?  Does master failover need to do anything
special to pick up running replication?



src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperWrapper.java
<http://review.hbase.org/r/76/#comment1209>

    spelling
    



src/main/java/org/apache/hadoop/hbase/replication/master/ReplicationLogCleaner.java
<http://review.hbase.org/r/76/#comment1213>

    Why this implementation have to know about other implementations?  Can't we do a chain
of decision classes? Any class can say no?  As soon as any decision class says no, we exit
the chain.... So in this case, first on the chain would be the ttl decision... then would
be this one... and third would be the snapshotting decision. You don't have to do the chain
as part of this patch but please open an issue to implement.



src/main/java/org/apache/hadoop/hbase/replication/package.html
<http://review.hbase.org/r/76/#comment1210>

    Its not just MDC, right?



src/main/java/org/apache/hadoop/hbase/replication/package.html
<http://review.hbase.org/r/76/#comment1211>

    Is it user tables or CF in user tables?



src/main/java/org/apache/hadoop/hbase/replication/package.html
<http://review.hbase.org/r/76/#comment1212>

    Good doc.



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
<http://review.hbase.org/r/76/#comment1214>

    What is the 'main thread'?



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
<http://review.hbase.org/r/76/#comment1215>

    Can you make this clearer?



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
<http://review.hbase.org/r/76/#comment1216>

    Whats this mean?  Lost edits?



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
<http://review.hbase.org/r/76/#comment1217>

    Maybe say something about how it works... pool of HTables to write remote peer, etc.



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
<http://review.hbase.org/r/76/#comment1218>

    Why again does replication keep its own set of logs rather than just keep positions on
HRS logs (whether local HRS or the HRS of others?)  So it prevails across HRS failures?  Aren't
logs archived now rather than thrown away so this should be fine?  Otherwise, we are writing
edits to FS twice, right?



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
<http://review.hbase.org/r/76/#comment1220>

    This should be inside a finally block?



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
<http://review.hbase.org/r/76/#comment1221>

    Duplicated code -- see up about ten lines.



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
<http://review.hbase.org/r/76/#comment1222>

    You are stopping the regionserver because replication failed?



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
<http://review.hbase.org/r/76/#comment1223>

    One second seems short?



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
<http://review.hbase.org/r/76/#comment1226>

    We carry edits in memory?  And its outside of our normal accounting -- e.g. cache and
max for memstores?  How we get it into the mix?  Else we could OOME if we don't account for
this memory payload.



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
<http://review.hbase.org/r/76/#comment1224>

    Ratio?



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
<http://review.hbase.org/r/76/#comment1227>

    Not a constructor



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
<http://review.hbase.org/r/76/#comment1228>

    Stop cluster or stop the host?



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
<http://review.hbase.org/r/76/#comment1230>

    Whats this about?  IN particular, the 'handling story'...



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
<http://review.hbase.org/r/76/#comment1231>

    When would it make sense to ever replicate catalog table entries?
    
    Why we even in here reading a file if replicating == false



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
<http://review.hbase.org/r/76/#comment1232>

    Are you checking size of these edits?  You'll read 25k entries of any size?



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
<http://review.hbase.org/r/76/#comment1233>

    Where do you explain replication hlog filenaming convention?



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceInterface.java
<http://review.hbase.org/r/76/#comment1234>

    Can only do one sink, right?  Might want to say so.  How does the source know the sink
to replicate to?  Should that be part of the Interface?
    
    The comment for the Interface could say how the Interface is used... init, then set sink,
then per file archived, call ... etc.



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
<http://review.hbase.org/r/76/#comment1235>

    Make this "This class manages all the replication sources"
    
    "There are two classes of them"... what you mean to say here?  Two classes of source?



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
<http://review.hbase.org/r/76/#comment1236>

    I don't follow this commentary.



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
<http://review.hbase.org/r/76/#comment1238>

    You should say more here about whats going on.. either here or up in the class comment.
 It has watchers on all other RSs in current cluster and will try to take on the workloads
of others on crash (?).



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
<http://review.hbase.org/r/76/#comment1239>

    Comment is cut off.



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
<http://review.hbase.org/r/76/#comment1240>

    You reporting status or updating updated position into zk?  There is a running status
being kept up in zk?  Shouldn't this log message have the log path in it too?
    
    This method does more than update position or report status... looks like it changes state
in this class updating set of logs?



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
<http://review.hbase.org/r/76/#comment1241>

    Shouldn't this be nonmodifiable List that you give out here since it seems you have it
synchronized internally here.



src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
<http://review.hbase.org/r/76/#comment1243>

    What other regionserver?  There is alot that goes unexplained in here?  Could point to
the doc. on how replication works.  Same for the transer method above (I have an idea because
you white boarded it for me).



src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
<http://review.hbase.org/r/76/#comment1244>

    Good.


- stack


On 2010-06-23 17:24:52, Jean-Daniel Cryans wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/76/
> -----------------------------------------------------------
> 
> (Updated 2010-06-23 17:24:52)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> -------
> 
> This is HBASE-2223 AKA Replication 2.0, it is currently only a "preview patch" as it's
pretty much feature complete, works on a cluster, has unit tests and whatnot, but it could
use a lot more testing and cleaning and ideas from others.
> 
> 
> This addresses bug HBASE-2223.
>     http://issues.apache.org/jira/browse/HBASE-2223
> 
> 
> Diffs
> -----
> 
>   bin/replication/add_peer.rb PRE-CREATION 
>   bin/replication/copy_tables_desc.rb PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/HConstants.java 4357ee5 
>   src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 62617ac 
>   src/main/java/org/apache/hadoop/hbase/master/HMaster.java 5367638 
>   src/main/java/org/apache/hadoop/hbase/master/LogCleanerDelegate.java 4c5153e 
>   src/main/java/org/apache/hadoop/hbase/master/RegionServerOperationQueue.java 10f9dbd

>   src/main/java/org/apache/hadoop/hbase/master/ServerManager.java 9fb1cce 
>   src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 33b9a99 
>   src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 94f18c6 
>   src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java 5d4cffe 
>   src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeperWrapper.java
PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/replication/master/ReplicationLogCleaner.java
PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/replication/package.html PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceInterface.java
PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java b26c071 
>   src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 0e69709 
>   src/test/java/org/apache/hadoop/hbase/replication/ReplicationSourceDummy.java PRE-CREATION

>   src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java PRE-CREATION

>   src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSource.java PRE-CREATION

>   src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSink.java
PRE-CREATION 
>   src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceManager.java
PRE-CREATION 
> 
> Diff: http://review.hbase.org/r/76/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jean-Daniel
> 
>


Mime
View raw message