hbase-dev mailing list archives

From "Chongxin Li" <lichong...@zju.edu.cn>
Subject Re: Review Request: HBASE-50: Snapshot of table
Date Wed, 11 Aug 2010 12:57:51 GMT

> On 2010-08-10 21:34:40, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java, line 673
> > <http://review.cloudera.org/r/467/diff/3/?file=6002#file6002line673>
> >
> >     This is fine for an hbase that is a fresh install, but what about the case where
> > the data has been migrated from an older hbase version? It won't have this column family in
> > .META.  We should make a little migration script that adds it, or on start of the new version,
> > check for it and if not present, create it.

That's right. But the AddColumn operation requires the table to be disabled, and the ROOT table
cannot be disabled once the system is started. How then could we execute the migration script,
or check for the family and create it on start of the new version?

> On 2010-08-10 21:34:40, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java, line 899
> > <http://review.cloudera.org/r/467/diff/3/?file=6005#file6005line899>
> >
> >     Can the snapshot name be empty and then we'll make one up?

A default snapshot name? Or an auto-generated one, such as the creation time?
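
If we go the auto-generated route, it could be as simple as the following sketch; the "snapshot"
suffix and the timestamp format are just examples, not anything the patch currently does.

import java.text.SimpleDateFormat;
import java.util.Date;

public class SnapshotNames {
  // Sketch: derive a default snapshot name from the table name and creation time.
  static String defaultSnapshotName(String tableName) {
    String ts = new SimpleDateFormat("yyyyMMdd-HHmmss").format(new Date());
    return tableName + "-snapshot-" + ts;   // e.g. "mytable-snapshot-20100811-125751"
  }
}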

> On 2010-08-10 21:34:40, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java, line 951
> > <http://review.cloudera.org/r/467/diff/3/?file=6005#file6005line951>
> >
> >     For restore of the snapshot, do you use loadtable.rb or Todd's new bulkloading

Currently, no.
A snapshot is composed of a list of log files and a set of reference files for the HFiles of the
table. These reference files have the same directory hierarchy as the original table, and their
names are in the format "1239384747630.tablename", where the front part is the file name of the
referred-to HFile and the latter part is the table name of the snapshot. Thus, to restore a
snapshot, we just copy the reference files (which are only a few bytes each) into the table dir,
update the META and split the logs. When the table is enabled, the system knows how to replay the
commit edits and read such a reference file. The getReferredToFile and open methods in StoreFile
have been updated to deal with this kind of reference file for snapshots.
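
To make the naming concrete, here is a small sketch of how such a reference file name splits into
its two parts. This is just an illustration of the format described above, not the actual parsing
code in StoreFile.

public class SnapshotReferenceName {
  // Splits "1239384747630.tablename" into the referred-to HFile name
  // ("1239384747630") and the snapshot's table name ("tablename").
  static String referredHFileName(String refName) {
    return refName.substring(0, refName.indexOf('.'));
  }

  static String snapshotTableName(String refName) {
    return refName.substring(refName.indexOf('.') + 1);
  }

  public static void main(String[] args) {
    String ref = "1239384747630.tablename";
    System.out.println(referredHFileName(ref));  // 1239384747630
    System.out.println(snapshotTableName(ref));  // tablename
  }
}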

At present, a snapshot can only be restored to a table with the same name as the one the snapshot
was created from. That means the old table with that name must be deleted before restoring a
snapshot; that's what I do in the unit test TestAdmin. Restoring a snapshot to a different table
name has a low priority and has not been implemented yet.

> On 2010-08-10 21:34:40, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/io/Reference.java, line 50
> > <http://review.cloudera.org/r/467/diff/3/?file=6008#file6008line50>
> >
> >     Whats this?  A different kind of reference?

Yes. This is the reference file in a snapshot; it references an HFile of the original table.

> On 2010-08-10 21:34:40, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/master/SnapshotLogCleaner.java, line 115
> > <http://review.cloudera.org/r/467/diff/3/?file=6018#file6018line115>
> >
> >     This looks like a class that you could write a unit test for?

Sure, I'll add another case in TestLogsCleaner.

> On 2010-08-10 21:34:40, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/master/RestoreSnapshot.java, line 130
> > <http://review.cloudera.org/r/467/diff/3/?file=6017#file6017line130>
> >
> >     If the table were big, this could be prohibitively expensive?  A single-threaded
> > copy of all of a table's data?  We could complement this with an MR-based restore, something
> > that did the copy using MR?

This method is only used in RestoreSnapshot, where the reference files of a snapshot are copied
into the table dir. These reference files contain only a few bytes each, not the table's data;
snapshots share the table data with the original table and with other snapshots. Do we still
need an MR job?
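
For a sense of scale, the restore copy is roughly the following (a sketch with an assumed
directory layout, not the RestoreSnapshot code): since each reference file is only a few bytes,
the cost is per-file metadata operations rather than data volume.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class CopyReferences {
  // Sketch only: copy every snapshot reference file under snapshotDir into
  // tableDir, preserving the region/family layout assumed here.
  static void copyReferenceFiles(FileSystem fs, Path snapshotDir, Path tableDir,
      Configuration conf) throws IOException {
    for (FileStatus region : fs.listStatus(snapshotDir)) {         // region dirs
      for (FileStatus family : fs.listStatus(region.getPath())) {  // family dirs
        Path dstFamily = new Path(new Path(tableDir,
            region.getPath().getName()), family.getPath().getName());
        fs.mkdirs(dstFamily);
        for (FileStatus ref : fs.listStatus(family.getPath())) {   // reference files
          // Each reference file is only a few bytes, so a single-threaded
          // loop like this stays cheap even for a big table.
          FileUtil.copy(fs, ref.getPath(), fs,
              new Path(dstFamily, ref.getPath().getName()), false, conf);
        }
      }
    }
  }
}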

> On 2010-08-10 21:34:40, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/master/BaseScanner.java, line 212
> > <http://review.cloudera.org/r/467/diff/3/?file=6013#file6013line212>
> >
> >     Why Random negative number?  Why not just leave it blank?

If a blank value were used as the key, all not-yet-scanned regions would share the same key, so
there would be only one entry left during the first few scans of the regions. Using a random
negative number indicates that a region has not been scanned before; once it has been scanned,
there is an actual last-checking time for it.
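
A toy illustration of the point (not the BaseScanner code itself): in a sorted map keyed by
last-checking time, identical placeholder keys collapse into a single entry, whereas distinct
random negative keys keep every unscanned region separate and still sort before any real
timestamp.

import java.util.Random;
import java.util.TreeMap;

public class ScanKeyDemo {
  public static void main(String[] args) {
    Random rnd = new Random();
    TreeMap<Long, String> blankKeyed = new TreeMap<Long, String>();
    TreeMap<Long, String> randomKeyed = new TreeMap<Long, String>();

    for (int i = 0; i < 3; i++) {
      blankKeyed.put(0L, "region-" + i);   // later puts overwrite earlier ones
      randomKeyed.put(-(long) (rnd.nextInt(Integer.MAX_VALUE) + 1), "region-" + i);
    }

    System.out.println(blankKeyed.size());   // 1
    System.out.println(randomKeyed.size());  // 3 (with overwhelming probability)
  }
}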

> On 2010-08-10 21:34:40, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java, line 251
> > <http://review.cloudera.org/r/467/diff/3/?file=6012#file6012line251>
> >
> >     Is this comment right?

I just renamed the Range values to caps; the comment was not changed.

> On 2010-08-10 21:34:40, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/io/Reference.java, line 149
> > <http://review.cloudera.org/r/467/diff/3/?file=6008#file6008line149>
> >
> >     Hmm... is this good?  You are dropping some of the region name when you toString.
> >     Do we have to?

This has not been changed; I just renamed the field "region" to "range".

> On 2010-08-10 21:34:40, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/io/Reference.java, line 156
> > <http://review.cloudera.org/r/467/diff/3/?file=6008#file6008line156>
> >
> >     This could be a problem when fellas migrate old data to use this new hbase.
> > If there are References in the old data, then this deserialization will fail?  I'm fine w/
> > you creating a new issue named something like "Migration from 0.20.x hbase to 0.90" and adding
> > a note in there that we need to consider this little issue.  Better though would be if the
> > data was able to migrate itself at runtime; i.e. recognize a boolean on the stream and then
> > deserialize the old style into the new, etc.

Actually, I think it is fine to migrate old data to the new hbase. Old references were serialized
with DataOutput.writeBoolean(boolean), which writes the value (byte)1 if the argument is true and
(byte)0 if it is false.

See (from Ted's review):

Thus the value (byte)1 was written if it is the top file region (isTopFileRegion is true), which
is exactly the same as the current value of TOP. For the same reason, this deserialization would
work for the references in the old data, right?

That's why we cannot use the ordinal of the enum and serialize the int value: the serialized size
of this field would differ between new data and old data if an int value were used.
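
To spell the compatibility argument out, here is a small sketch in my own words; the method names
and the exact byte mapping are assumptions for illustration, not the patch's actual Reference
code.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public class RangeCompat {
  enum Range { BOTTOM, TOP }

  // Old format: DataOutput.writeBoolean(isTopFileRegion) wrote (byte)1 for the
  // top half and (byte)0 for the bottom half.
  static void write(DataOutput out, Range r) throws IOException {
    out.writeByte(r == Range.TOP ? 1 : 0);   // one byte, same as writeBoolean
  }

  static Range read(DataInput in) throws IOException {
    return in.readByte() == 1 ? Range.TOP : Range.BOTTOM;  // also reads old references
  }

  // By contrast, out.writeInt(r.ordinal()) would take four bytes, so the field's
  // serialized size would differ between old references (one boolean byte) and new ones.
}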

- Chongxin

This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/
On 2010-08-09 03:52:11, Chongxin Li wrote:
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/467/
> -----------------------------------------------------------
> (Updated 2010-08-09 03:52:11)
> Review request for hbase.
> Summary
> -------
> This patch includes the first three sub-tasks of HBASE-50:
> 1. Start and monitor the creation of snapshot via ZooKeeper
> 2. Create snapshot of an HBase table
> 3. Some existing functions of HBase are modified to support snapshot
> Currently snapshots can be created as expected, but can not be restored or deleted yet
> This addresses bug HBASE-50.
>     http://issues.apache.org/jira/browse/HBASE-50
> Diffs
> -----
>   src/main/java/org/apache/hadoop/hbase/HConstants.java c77ebf5 
>   src/main/java/org/apache/hadoop/hbase/HRegionInfo.java ee94690 
>   src/main/java/org/apache/hadoop/hbase/HSnapshotDescriptor.java PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java 0d57270 
>   src/main/java/org/apache/hadoop/hbase/SnapshotExistsException.java PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/TablePartialOpenException.java PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 8b01aa0 
>   src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java ed12e7a 
>   src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 85fde3a 
>   src/main/java/org/apache/hadoop/hbase/io/Reference.java 219203c 
>   src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java b2de7e4 
>   src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPCProtocolVersion.java d4bcbed 
>   src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java bd48a4b 
>   src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java 1183584
>   src/main/java/org/apache/hadoop/hbase/master/BaseScanner.java 69eab39 
>   src/main/java/org/apache/hadoop/hbase/master/DeleteSnapshot.java PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/master/HMaster.java e4bd30d 
>   src/main/java/org/apache/hadoop/hbase/master/LogsCleaner.java 9d1a8b8 
>   src/main/java/org/apache/hadoop/hbase/master/RestoreSnapshot.java PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/master/SnapshotLogCleaner.java PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/master/SnapshotMonitor.java PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/master/SnapshotOperation.java PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/master/SnapshotTracker.java PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/master/TableDelete.java 1153e62 
>   src/main/java/org/apache/hadoop/hbase/master/TableSnapshot.java PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 6dc41a4 
>   src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 6a54736 
>   src/main/java/org/apache/hadoop/hbase/regionserver/SnapshotThread.java PRE-CREATION
>   src/main/java/org/apache/hadoop/hbase/regionserver/Store.java ae9e190 
>   src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 757a50c 
>   src/main/java/org/apache/hadoop/hbase/regionserver/ZKSnapshotWatcher.java PRE-CREATION
>   src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 9593286 
>   src/main/java/org/apache/hadoop/hbase/replication/master/ReplicationLogCleaner.java
>   src/main/java/org/apache/hadoop/hbase/util/FSUtils.java 5cf3481 
>   src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java 3827fa5 
>   src/main/resources/hbase-default.xml b73f0ff 
>   src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 4d09fe9 
>   src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java c9b78b9 
>   src/test/java/org/apache/hadoop/hbase/master/TestSnapshot.java PRE-CREATION 
>   src/test/java/org/apache/hadoop/hbase/master/TestSnapshotFailure.java PRE-CREATION
>   src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java 34b8044 
>   src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java 98bd3e5 
>   src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionSnapshot.java PRE-CREATION
>   src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 38ef520 
>   src/test/java/org/apache/hadoop/hbase/regionserver/TestZKSnapshotWatcher.java PRE-CREATION
> Diff: http://review.cloudera.org/r/467/diff
> Testing
> -------
> Unit tests and integration tests with mini cluster passed.
> Thanks,
> Chongxin
