Return-Path: Delivered-To: apmail-hbase-dev-archive@www.apache.org Received: (qmail 17493 invoked from network); 31 May 2010 16:13:02 -0000 Received: from unknown (HELO mail.apache.org) (140.211.11.3) by 140.211.11.9 with SMTP; 31 May 2010 16:13:02 -0000 Received: (qmail 8940 invoked by uid 500); 31 May 2010 16:13:01 -0000 Delivered-To: apmail-hbase-dev-archive@hbase.apache.org Received: (qmail 8912 invoked by uid 500); 31 May 2010 16:13:01 -0000 Mailing-List: contact dev-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@hbase.apache.org Delivered-To: mailing list dev@hbase.apache.org Received: (qmail 8904 invoked by uid 99); 31 May 2010 16:13:00 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 31 May 2010 16:13:00 +0000 X-ASF-Spam-Status: No, hits=1.8 required=10.0 tests=FH_HELO_EQ_D_D_D_D,MIME_QP_LONG_LINE,SPF_NEUTRAL X-Spam-Check-By: apache.org Received-SPF: neutral (nike.apache.org: 75.101.130.251 is neither permitted nor denied by domain of stack@duboce.net) Received: from [75.101.130.251] (HELO ip-10-250-10-165.ec2.internal) (75.101.130.251) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 31 May 2010 16:12:54 +0000 Received: from ip-10-250-10-165.ec2.internal (localhost.localdomain [127.0.0.1]) by ip-10-250-10-165.ec2.internal (Postfix) with ESMTP id AB8F020B7A; Mon, 31 May 2010 16:12:33 +0000 (UTC) Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Subject: Re: Review Request: HBASE-1923. Bulk load into existing tables From: stack@duboce.net To: "Jonathan Gray" , stack@duboce.net Date: Mon, 31 May 2010 16:12:33 -0000 Message-ID: <20100531161233.30347.98132@ip-10-250-10-165.ec2.internal> Cc: jiraposter@review.hbase.org, "Todd Lipcon" , dev@hbase.apache.org In-Reply-To: <20100529001922.21168.29129@ip-10-250-10-165.ec2.internal> References: <20100529001922.21168.29129@ip-10-250-10-165.ec2.internal> X-Virus-Checked: Checked by ClamAV on apache.org > On 2010-05-28 17:19:22, stack wrote: > > src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java,= line 234 > > > > > > Why two types? Is this legacy? KV has advantage of being able to = carry a Delete > = > Todd Lipcon wrote: > yea, it's for compatibility with TableOutputFormat which can take eit= her. Kind of a pain, maybe we should just get rid of it and only accept KV? Doing both is a pain yes. If Delete and Put had a common ancestor/Interfac= e, we could use that but it ain't there yet. File an issue to undo Put opt= ion? > On 2010-05-28 17:19:22, stack wrote: > > src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java, line 73 > > > > > > In KV, there is a parseColumns function that might help here > = > Todd Lipcon wrote: > doesn't seem much more convenient (since I already have String here) > One question, though: if I want to insert into a family that has no q= ualifiers, should I be using EMPTY_BYTE_ARRAY when I construct the KV, or p= assing null? It looks like either works. It looks like nulls are handled properly. > On 2010-05-28 17:19:22, stack wrote: > > src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java, line 96 > > > > > > Is this safe? Is there escaping you should be handling here? > = > Todd Lipcon wrote: > Plan to address this in docs for importtsv - it's not a good TSV pars= er that supports quoting, etc. ok > On 2010-05-28 17:19:22, stack wrote: > > src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.j= ava, line 229 > > > > > > I suppose in rare circumstances, it could split in here and mess yo= u up. You need a no-split flag per region out in zk or something? We'd ne= ed such a thing for snapshotting methinks. But, thats for a later... > = > Todd Lipcon wrote: > if there's been a split, the new daughter region will detect that the= HFile doesn't "fit" and throw WrongRegionException. This then triggers the= retry on ServerCallable, which fetches the new region info from meta, and = realizes it has to split the hfile before trying again. Nice. > On 2010-05-28 17:19:22, stack wrote: > > src/main/java/org/apache/hadoop/hbase/mapreduce/hadoopbackport/InputSam= pler.java, line 57 > > > > > > We need this even though you wrote out partition edges earlier when= you read region boundaries? > = > Todd Lipcon wrote: > You're right that it's unused, but I figured I'd put it in for conven= ience - when doing a bulk load into a new table, for example, you may want = to use this in order to figure out where to set region splits. Ok. Sounds good. > On 2010-05-28 17:19:22, stack wrote: > > src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 1= 897 > > > > > > This is old cruddy stuff you are asking about. It looks like I add= ed splitsAndClosesLock to stop either happening during critical periods (HB= ASE-588) and the splits lock looks now like a half measure from way back in= 2007 -- HBASE-217. Lets make an issue to clean it up. > = > Todd Lipcon wrote: > Which one should I be taking? Do I need both? Which order? Seems like it depends on what you need. I like your issue on rethinking th= ese locks. Sounds like we might need to do stuff like break apart the spli= ttingAndClose lock into a splitting and closing. - stack ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://review.hbase.org/r/87/#review96 ----------------------------------------------------------- On 2010-05-31 00:22:24, Todd Lipcon wrote: > = > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > http://review.hbase.org/r/87/ > ----------------------------------------------------------- > = > (Updated 2010-05-31 00:22:24) > = > = > Review request for hbase, stack and Jonathan Gray. > = > = > Summary > ------- > = > Here's a first patch that implements bulk import into existing tables. Th= is applies on top of HBASE-2586 and HBASE-2588 - I've pushed the series of = the three to my github: http://github.com/toddlipcon/hbase/tree/hfof-review > = > I have some TODOs left that I want to take care of before this gets commi= tted, but since it's a pretty large patch, I figured I'd get it out for rev= iew ASAP. > = > The stuff in the hadoopbackport package is essentially copypaste from Had= oop trunk, so you can ignore that in the review. > = > = > This addresses bug HBASE-1923. > http://issues.apache.org/jira/browse/HBASE-1923 > = > = > Diffs > ----- > = > pom.xml 0a009cf = > src/main/java/org/apache/hadoop/hbase/HRegionInfo.java 29b0cd6 = > src/main/java/org/apache/hadoop/hbase/io/ImmutableBytesWritable.java 0a= 9ec4b = > src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java b912a85 = > src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 4cbe52a = > src/main/java/org/apache/hadoop/hbase/mapreduce/Driver.java 3d40695 = > src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java = 9c8e53e = > src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java PRE-CREA= TION = > src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.j= ava PRE-CREATION = > src/main/java/org/apache/hadoop/hbase/mapreduce/PutSortReducer.java PRE= -CREATION = > src/main/java/org/apache/hadoop/hbase/mapreduce/SimpleTotalOrderPartiti= oner.java af3d588 = > src/main/java/org/apache/hadoop/hbase/mapreduce/TableMapReduceUtil.java= b332280 = > src/main/java/org/apache/hadoop/hbase/mapreduce/hadoopbackport/InputSam= pler.java PRE-CREATION = > src/main/java/org/apache/hadoop/hbase/mapreduce/hadoopbackport/TotalOrd= erPartitioner.java PRE-CREATION = > src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 287cd48 = > src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 7= 8f3223 = > src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 7de766d = > src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 80bf0= 9a = > src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java c8= 941f1 = > src/main/java/org/apache/hadoop/hbase/util/Bytes.java a53dafe = > src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java ed8709f = > src/test/java/org/apache/hadoop/hbase/PerformanceEvaluation.java d76c75= e = > src/test/java/org/apache/hadoop/hbase/mapreduce/NMapInputFormat.java PR= E-CREATION = > src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.j= ava d04ced2 = > src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFil= es.java PRE-CREATION = > src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java = f1566d3 = > src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionInfo.java= fcb22fb = > src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 4595e= 6e = > src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 2= e4c7df = > src/test/java/org/apache/hadoop/hbase/util/TestBytes.java c7361cb = > = > Diff: http://review.hbase.org/r/87/diff > = > = > Testing > ------- > = > Primary unit/functional testing, a bit of pseudo-distributed testing. Pla= n on doing full system tests before commit as well. > = > = > Thanks, > = > Todd > = >