Return-Path: Delivered-To: apmail-hbase-dev-archive@www.apache.org Received: (qmail 5306 invoked from network); 11 Aug 2010 12:58:22 -0000 Received: from unknown (HELO mail.apache.org) (140.211.11.3) by 140.211.11.9 with SMTP; 11 Aug 2010 12:58:22 -0000 Received: (qmail 39789 invoked by uid 500); 11 Aug 2010 12:58:22 -0000 Delivered-To: apmail-hbase-dev-archive@hbase.apache.org Received: (qmail 39478 invoked by uid 500); 11 Aug 2010 12:58:19 -0000 Mailing-List: contact dev-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@hbase.apache.org Delivered-To: mailing list dev@hbase.apache.org Received: (qmail 39470 invoked by uid 99); 11 Aug 2010 12:58:18 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 11 Aug 2010 12:58:18 +0000 X-ASF-Spam-Status: No, hits=1.8 required=10.0 tests=FH_HELO_EQ_D_D_D_D,MIME_QP_LONG_LINE,SPF_NEUTRAL X-Spam-Check-By: apache.org Received-SPF: neutral (nike.apache.org: local policy) Received: from [184.73.217.71] (HELO ip-10-202-7-187.ec2.internal) (184.73.217.71) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 11 Aug 2010 12:58:12 +0000 Received: from ip-10-202-7-187.ec2.internal (localhost [127.0.0.1]) by ip-10-202-7-187.ec2.internal (Postfix) with ESMTP id 0C8C38A1E4; Wed, 11 Aug 2010 12:57:51 +0000 (UTC) Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Subject: Re: Review Request: HBASE-50: Snapshot of table From: "Chongxin Li" To: "Chongxin Li" , jiraposter@review.hbase.org, dev@hbase.apache.org, stack@duboce.net Date: Wed, 11 Aug 2010 12:57:51 -0000 Message-ID: <20100811125751.19917.38844@ip-10-202-7-187.ec2.internal> In-Reply-To: <20100811043440.19917.34442@ip-10-202-7-187.ec2.internal> References: <20100811043440.19917.34442@ip-10-202-7-187.ec2.internal> X-Virus-Checked: Checked by ClamAV on apache.org > On 2010-08-10 21:34:40, stack wrote: > > src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java, line 673 > > > > > > This is fine for an hbase that is a fresh install but what about ca= se where the data has been migrated from an older hbase version; it won't h= ave this column family in .META. We should make a little migration script = that adds it or on start of new version, check for it and if not present, c= reate it. That's right. But AddColumn operation requires the table disabled to procee= d, ROOT table can not be disabled once the system is started. Then how coul= d we execute the migration script or check and create it on start of new ve= rsion? > On 2010-08-10 21:34:40, stack wrote: > > src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java, line 899 > > > > > > Can the snapshot name be empty and then we'll make one up? a default snapshot name? or a auto-generated snapshot name, such as creatio= n time? > On 2010-08-10 21:34:40, stack wrote: > > src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java, line 951 > > > > > > For restore of the snapshot, do you use loadtable.rb or Todd's new = bulkloading scripts? Currently, NO... Snapshot is composed of a list of log files and a bunch of reference files = for HFiles of the table. These reference files have the same hierarchy as t= he original table and the name is in the format of "1239384747630.tablename= ", where the front is the file name of the referred HFile and the latter is= table name for snapshot. Thus to restore a snapshot, just copy reference f= iles (which are just a few bytes) to the table dir, update the META and spl= it the logs. When this table is enabled, the system know how to replay the = commit edits and read such a reference file. Methods getReferredToFile, ope= n in StoreFile are updated to deal with this kind of reference files for sn= apshots. At present, snapshot can only be restored to the table whose name is the sa= me as the one for which the snapshot is created. That the old table with th= e same name must be deleted before restore a snapshot. That's what I do in = unit test TestAdmin. Restoring snapshot to a different table name has a low= priority. It has not been implemented yet. > On 2010-08-10 21:34:40, stack wrote: > > src/main/java/org/apache/hadoop/hbase/io/Reference.java, line 50 > > > > > > Whats this? A different kind of reference? Yes.. This is the reference file in snapshot. It references an HFile of the= original table. > On 2010-08-10 21:34:40, stack wrote: > > src/main/java/org/apache/hadoop/hbase/master/SnapshotLogCleaner.java, l= ine 115 > > > > > > This looks like a class that you could write a unit test for? Sure, I'll add another case in TestLogsCleaner. > On 2010-08-10 21:34:40, stack wrote: > > src/main/java/org/apache/hadoop/hbase/master/RestoreSnapshot.java, line= 130 > > > > > > If table were big, this could be prohibitively expensive? A single= -threaded copy of all of a tables data? We could compliment this with MR-b= ase restore, something that did the copy using MR? This method is only used in RestoreSnapshot, where reference files of snaps= hot are copied to the table dir. These reference files just contains a few = bytes instead of the table's data. Snapshots share the table data with the = original table and other snapshots. Do we still need a MR job? > On 2010-08-10 21:34:40, stack wrote: > > src/main/java/org/apache/hadoop/hbase/master/BaseScanner.java, line 212 > > > > > > Why Random negative number? Why not just leave it blank? If a blank value is used as the key, there would be only one item at last i= f it is the first few times to scan the regions. Using random negative numb= er indicates all these regions have not been scanned before. If it has been= scanned, there would be a last checking time for it instead. > On 2010-08-10 21:34:40, stack wrote: > > src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.j= ava, line 251 > > > > > > Is this comment right? I just renamed the Ranges to caps, comment was not changed. > On 2010-08-10 21:34:40, stack wrote: > > src/main/java/org/apache/hadoop/hbase/io/Reference.java, line 149 > > > > > > Hmm... is this good? You are dropping some the region name when yo= u toString. Do we have to? This has not been changed. Just rename field "region" to "range" > On 2010-08-10 21:34:40, stack wrote: > > src/main/java/org/apache/hadoop/hbase/io/Reference.java, line 156 > > > > > > This could be a problem when fellas migrate old data to use this ne= w hbase. If there are References in the old data, then this deserializatio= n will fail? I'm fine w/ you creating a new issue named something like "Mi= gration from 0.20.x hbase to 0.90" and adding a note in there that we need = to consider this little issue. Better though would be if the data was able= to migrate itself at runtime; i.e. recognize a boolean on the stream and t= hen deserialize the old style into the new, etc. Actually I think it is fine to migrate old data to new hbase. Old reference= s are serialized by DataOutput.writeBoolean(boolean), where value (byte)1 i= s written if the argument is true and value (byte)0 is written if argument = is false. = See (from Ted's review): http://download.oracle.com/javase/1.4.2/docs/api/java/io/DataOutput.html#wr= iteBoolean%28boolean%29 Thus value (byte)1 was written if it is the top file region (isTopFileRegio= n is true). That is exactly the same as current value of TOP. For the same = reason, this deserialization would work for the references in the old data,= right? = That's why we can not use ordinal of Enum and serialize the int value. The = serialization size of this field would be different for the new data and ol= d data if int value is used. - Chongxin ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/467/#review823 ----------------------------------------------------------- On 2010-08-09 03:52:11, Chongxin Li wrote: > = > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > http://review.cloudera.org/r/467/ > ----------------------------------------------------------- > = > (Updated 2010-08-09 03:52:11) > = > = > Review request for hbase. > = > = > Summary > ------- > = > This patch includes the first three sub-tasks of HBASE-50: > 1. Start and monitor the creation of snapshot via ZooKeeper > 2. Create snapshot of an HBase table > 3. Some existing functions of HBase are modified to support snapshot > = > Currently snapshots can be created as expected, but can not be restored o= r deleted yet > = > = > This addresses bug HBASE-50. > http://issues.apache.org/jira/browse/HBASE-50 > = > = > Diffs > ----- > = > src/main/java/org/apache/hadoop/hbase/HConstants.java c77ebf5 = > src/main/java/org/apache/hadoop/hbase/HRegionInfo.java ee94690 = > src/main/java/org/apache/hadoop/hbase/HSnapshotDescriptor.java PRE-CREA= TION = > src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java 0d57270 = > src/main/java/org/apache/hadoop/hbase/SnapshotExistsException.java PRE-= CREATION = > src/main/java/org/apache/hadoop/hbase/TablePartialOpenException.java PR= E-CREATION = > src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 8b01aa0 = > src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java ed12e= 7a = > src/main/java/org/apache/hadoop/hbase/io/HbaseObjectWritable.java 85fde= 3a = > src/main/java/org/apache/hadoop/hbase/io/Reference.java 219203c = > src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java b2de7e4 = > src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPCProtocolVersion.java = d4bcbed = > src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java bd48a4b = > src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.j= ava 1183584 = > src/main/java/org/apache/hadoop/hbase/master/BaseScanner.java 69eab39 = > src/main/java/org/apache/hadoop/hbase/master/DeleteSnapshot.java PRE-CR= EATION = > src/main/java/org/apache/hadoop/hbase/master/HMaster.java e4bd30d = > src/main/java/org/apache/hadoop/hbase/master/LogsCleaner.java 9d1a8b8 = > src/main/java/org/apache/hadoop/hbase/master/RestoreSnapshot.java PRE-C= REATION = > src/main/java/org/apache/hadoop/hbase/master/SnapshotLogCleaner.java PR= E-CREATION = > src/main/java/org/apache/hadoop/hbase/master/SnapshotMonitor.java PRE-C= REATION = > src/main/java/org/apache/hadoop/hbase/master/SnapshotOperation.java PRE= -CREATION = > src/main/java/org/apache/hadoop/hbase/master/SnapshotTracker.java PRE-C= REATION = > src/main/java/org/apache/hadoop/hbase/master/TableDelete.java 1153e62 = > src/main/java/org/apache/hadoop/hbase/master/TableSnapshot.java PRE-CRE= ATION = > src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 6dc41a4 = > src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 6= a54736 = > src/main/java/org/apache/hadoop/hbase/regionserver/SnapshotThread.java = PRE-CREATION = > src/main/java/org/apache/hadoop/hbase/regionserver/Store.java ae9e190 = > src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 757a5= 0c = > src/main/java/org/apache/hadoop/hbase/regionserver/ZKSnapshotWatcher.ja= va PRE-CREATION = > src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 959328= 6 = > src/main/java/org/apache/hadoop/hbase/replication/master/ReplicationLog= Cleaner.java 4d4b00a = > src/main/java/org/apache/hadoop/hbase/util/FSUtils.java 5cf3481 = > src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWrapper.java 3= 827fa5 = > src/main/resources/hbase-default.xml b73f0ff = > src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 4d09fe9 = > src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java c9b78b9 = > src/test/java/org/apache/hadoop/hbase/master/TestSnapshot.java PRE-CREA= TION = > src/test/java/org/apache/hadoop/hbase/master/TestSnapshotFailure.java P= RE-CREATION = > src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java = 34b8044 = > src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java 98b= d3e5 = > src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionSnapshot.= java PRE-CREATION = > src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 3= 8ef520 = > src/test/java/org/apache/hadoop/hbase/regionserver/TestZKSnapshotWatche= r.java PRE-CREATION = > = > Diff: http://review.cloudera.org/r/467/diff > = > = > Testing > ------- > = > Unit tests and integration tests with mini cluster passed. > = > = > Thanks, > = > Chongxin > = >