hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chongxin Li" <lichong...@zju.edu.cn>
Subject Re: Review Request: HBASE-50: Snapshot of table
Date Mon, 09 Aug 2010 11:56:50 GMT


> On 2010-08-02 13:41:35, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/master/SnapshotLogCleaner.java, line 22
> > <http://review.cloudera.org/r/467/diff/2/?file=4140#file4140line22>
> >
> >     worth noting that this class is not thread-safe? I don't know if these classes
need to be thread safe, but you're using an unsynchronized hashset. Also, since refreshHLogsAndSearch
clears hlogs before re-adding stuff, it needs to be synchronized more than just using a synchronized
collection.

This class is only instantiated once by LogsCleaner so it can be seen as a singleton per master.


> On 2010-08-02 13:41:35, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/master/SnapshotMonitor.java, line 116
> > <http://review.cloudera.org/r/467/diff/2/?file=4141#file4141line116>
> >
> >     does ZKW automatically re-watch the nodes for you, here?
> >     
> >     Also, how does this interact with region server failure? We just assume that
the snapshot will timeout and abort, instead of proactively detecting?

Yes, the ZKW automatically re-watch the nodes.

For snapshot abort, if any region server fails to perform the snapshot, it will remove corresponding
ready and finished nodes under snapshot directory. This would notify the master snapshot failure
and further abort snapshot on all region servers via ZK

For snapshot timeout, it is not detected here. In method waitToFinish, the snapshot status
is checked at a regular time (3 seconds here). If this method timeout, exception would be
thrown and master will abort the snapshot over the cluster.


> On 2010-08-02 13:41:35, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/master/TableSnapshot.java, line 132
> > <http://review.cloudera.org/r/467/diff/2/?file=4143#file4143line132>
> >
> >     is there a process that scans for cases where the reference count has gotten
out of sync?
> >     I'm worried about a case where a snapshot is half-done, and then it fails, so
the snapshot is considered aborted, but we never clean up the references because META has
been incremented.

This is added in META scanner. Since scanning reference files is expensive, only a few regions
are checked and synchronized in one scan. A last checking time is added so that all reference
regions are guaranteed to be checked eventually


> On 2010-08-02 13:41:35, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java, line 1403
> > <http://review.cloudera.org/r/467/diff/2/?file=4153#file4153line1403>
> >
> >     these checks are inherently racy

Then remove it?


> On 2010-08-02 13:41:35, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java, line 585
> > <http://review.cloudera.org/r/467/diff/2/?file=4148#file4148line585>
> >
> >     this seems prone to collision if it's multithreaded, since the exists check
and the use of the filename aren't atomic

Then how to guarantee atomicity? This unique file name should be unique respecting existing
files and files which are already compacted and deleted. Otherwise there might be a name collision
in archive directory for HFiles


> On 2010-08-02 13:41:35, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/HSnapshotDescriptor.java, line 132
> > <http://review.cloudera.org/r/467/diff/2/?file=4130#file4130line132>
> >
> >     since we're using the snapshot name as a directory name in HDFS, it has to be
a UTF8 string, so why not just keep it as a String above too?

I implemented this class following HTableDescriptor. And even for table name, it is usually
used as a byte array instead of String


> On 2010-08-02 13:41:35, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/io/Reference.java, line 61
> > <http://review.cloudera.org/r/467/diff/2/?file=4134#file4134line61>
> >
> >     to keep compatibility with current storefiles, "entire" should be value 2, and
bottom should be 0
> >     
> >     while we're at it, maybe rename these to be all caps - Range.TOP, Range.BOTTOM,
etc

Have been renamed in the latest revision


> On 2010-08-02 13:41:35, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/master/HMaster.java, lines 918-919
> > <http://review.cloudera.org/r/467/diff/2/?file=4138#file4138line918>
> >
> >     should this be an exception, rather than a result code? ie is it normal to fail?

Currently all results except ALl_FINISH would throw an exception.


> On 2010-08-02 13:41:35, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 925
> > <http://review.cloudera.org/r/467/diff/2/?file=4138#file4138line925>
> >
> >     do we have a race here? what if the table gets enabled while the snapshot is
being processed? it seems we need some locking here around table status and snapshot modification

Why do we have a race here? I think we can't call HMaster.enableTable during the execution
of this method, right? TableDelete is implemented in the same way, the table would not get
enabled when it is being deleted, isn't it?


> On 2010-08-02 13:41:35, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/master/SnapshotLogCleaner.java, line 47
> > <http://review.cloudera.org/r/467/diff/2/?file=4140#file4140line47>
> >
> >     do we have a race here between the listStatus and creating a snapshot?

I've done this modification in the latest revision. SnapshotLogCleaner's cache is refreshed
after listStatus of LogsCleaner and LogCleaner only cleans the archived logs, creating a new
snapshot would not use the archived logs.


> On 2010-08-02 13:41:35, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/master/SnapshotLogCleaner.java, line 115
> > <http://review.cloudera.org/r/467/diff/2/?file=4140#file4140line115>
> >
> >     do we really want to swallow this IOE?

This implements a method from the interface. The method declaration does not throw any exceptions.


> On 2010-08-02 13:41:35, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/master/SnapshotMonitor.java, line 40
> > <http://review.cloudera.org/r/467/diff/2/?file=4141#file4141line40>
> >
> >     this is basically a singleton per-master, right? not per-snapshot? ie this is
created at master startup and destroyed at master shutdown, and can coordinate multiple snapshots
over its lifetime? would be good to document how this fits into the overall design, and perhaps
refactor into one part that is master-global and another part that is created once per-snapshot.

This class has been refactored into two parts


> On 2010-08-02 13:41:35, Todd Lipcon wrote:
> > src/main/java/org/apache/hadoop/hbase/master/SnapshotMonitor.java, line 169
> > <http://review.cloudera.org/r/467/diff/2/?file=4141#file4141line169>
> >
> >     assert that we're in ALLREADY state here?

right


- Chongxin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/#review618
-----------------------------------------------------------


On 2010-08-09 03:52:11, Chongxin Li wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/467/
> -----------------------------------------------------------
> 
> (Updated 2010-08-09 03:52:11)
> 
> 
> Review request for hbase.
> 
> 
> Summary
> -------
> 
> This patch includes the first three sub-tasks of HBASE-50:
> 1. Start and monitor the creation of snapshot via ZooKeeper
> 2. Create snapshot of an HBase table
> 3. Some existing functions of HBase are modified to support snapshot
> 
> Currently snapshots can be created as expected, but can not be restored or deleted yet
> 
> 
> This addresses bug HBASE-50.
>     http://issues.apache.org/jira/browse/HBASE-50
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/hadoop/hbase/HConstants.java c77ebf5 
>   src/main/java/org/apache/hadoop/hbase/HRegionInfo.java ee94690 
>   src/main/java/org/apache/hadoop/hbase/HSnapshotDescriptor.java PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java 0d57270 
>   src/main/java/org/apache/hadoop/hbase/SnapshotExistsException.java PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/TablePartialOpenException.java PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 8b01aa0 
>   src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java ed12e7a 
>   src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 85fde3a 
>   src/main/java/org/apache/hadoop/hbase/io/Reference.java 219203c 
>   src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java b2de7e4 
>   src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPCProtocolVersion.java d4bcbed 
>   src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java bd48a4b 
>   src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java 1183584

>   src/main/java/org/apache/hadoop/hbase/master/BaseScanner.java 69eab39 
>   src/main/java/org/apache/hadoop/hbase/master/DeleteSnapshot.java PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/master/HMaster.java e4bd30d 
>   src/main/java/org/apache/hadoop/hbase/master/LogsCleaner.java 9d1a8b8 
>   src/main/java/org/apache/hadoop/hbase/master/RestoreSnapshot.java PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/master/SnapshotLogCleaner.java PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/master/SnapshotMonitor.java PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/master/SnapshotOperation.java PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/master/SnapshotTracker.java PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/master/TableDelete.java 1153e62 
>   src/main/java/org/apache/hadoop/hbase/master/TableSnapshot.java PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 6dc41a4 
>   src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 6a54736 
>   src/main/java/org/apache/hadoop/hbase/regionserver/SnapshotThread.java PRE-CREATION

>   src/main/java/org/apache/hadoop/hbase/regionserver/Store.java ae9e190 
>   src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 757a50c 
>   src/main/java/org/apache/hadoop/hbase/regionserver/ZKSnapshotWatcher.java PRE-CREATION

>   src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 9593286 
>   src/main/java/org/apache/hadoop/hbase/replication/master/ReplicationLogCleaner.java
4d4b00a 
>   src/main/java/org/apache/hadoop/hbase/util/FSUtils.java 5cf3481 
>   src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java 3827fa5 
>   src/main/resources/hbase-default.xml b73f0ff 
>   src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 4d09fe9 
>   src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java c9b78b9 
>   src/test/java/org/apache/hadoop/hbase/master/TestSnapshot.java PRE-CREATION 
>   src/test/java/org/apache/hadoop/hbase/master/TestSnapshotFailure.java PRE-CREATION

>   src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java 34b8044 
>   src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java 98bd3e5 
>   src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionSnapshot.java PRE-CREATION

>   src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 38ef520 
>   src/test/java/org/apache/hadoop/hbase/regionserver/TestZKSnapshotWatcher.java PRE-CREATION

> 
> Diff: http://review.cloudera.org/r/467/diff
> 
> 
> Testing
> -------
> 
> Unit tests and integration tests with mini cluster passed.
> 
> 
> Thanks,
> 
> Chongxin
> 
>


Mime
View raw message