Return-Path: Delivered-To: apmail-hbase-dev-archive@www.apache.org Received: (qmail 28195 invoked from network); 30 May 2010 21:13:56 -0000 Received: from unknown (HELO mail.apache.org) (140.211.11.3) by 140.211.11.9 with SMTP; 30 May 2010 21:13:56 -0000 Received: (qmail 29026 invoked by uid 500); 30 May 2010 21:13:56 -0000 Delivered-To: apmail-hbase-dev-archive@hbase.apache.org Received: (qmail 28987 invoked by uid 500); 30 May 2010 21:13:56 -0000 Mailing-List: contact dev-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@hbase.apache.org Delivered-To: mailing list dev@hbase.apache.org Received: (qmail 28979 invoked by uid 99); 30 May 2010 21:13:56 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 30 May 2010 21:13:56 +0000 X-ASF-Spam-Status: No, hits=1.8 required=10.0 tests=AWL,FH_HELO_EQ_D_D_D_D,MIME_QP_LONG_LINE,SPF_NEUTRAL X-Spam-Check-By: apache.org Received-SPF: neutral (athena.apache.org: local policy) Received: from [75.101.130.251] (HELO ip-10-250-10-165.ec2.internal) (75.101.130.251) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 30 May 2010 21:13:51 +0000 Received: from ip-10-250-10-165.ec2.internal (localhost.localdomain [127.0.0.1]) by ip-10-250-10-165.ec2.internal (Postfix) with ESMTP id F09BC20B7A; Sun, 30 May 2010 21:13:29 +0000 (UTC) Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Subject: Re: Review Request: HBASE-1923. Bulk load into existing tables From: "Todd Lipcon" To: "Jonathan Gray" , stack@duboce.net Date: Sun, 30 May 2010 21:13:29 -0000 Message-ID: <20100530211329.6350.74754@ip-10-250-10-165.ec2.internal> Cc: jiraposter@review.hbase.org, "Todd Lipcon" , dev@hbase.apache.org In-Reply-To: <20100529001922.21168.29129@ip-10-250-10-165.ec2.internal> References: <20100529001922.21168.29129@ip-10-250-10-165.ec2.internal> > On 2010-05-28 17:19:22, stack wrote: > > src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java, line 2= 64 > > > > > > Is this patch for trunk or branch? If branch, this addition might = break RPC. It might be ok on the end here but we should test. I will be backporting to branch for our customer, but as a major new featur= e I didn't expect it would be committed to Apache branch. If people want it= , though, I'll put the patch up. In general, adding a new RPC doesn't cause incompatibility - if a client ca= lls it and it's not there on the server side, the client will just get an e= xception that the method doesn't exist. (eg see my backport of HDFS-630 to = hadoop 20) > On 2010-05-28 17:19:22, stack wrote: > > src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java,= line 181 > > > > > > Man, we never enable asserts.... throw an exception! > > Switched to using google Preconditions > On 2010-05-28 17:19:22, stack wrote: > > src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java,= line 187 > > > > > > Just asking, the comparator for sure is going to do the right thing= here? yea, ImmutableBytesWritable implements Comparable > On 2010-05-28 17:19:22, stack wrote: > > src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java,= line 213 > > > > > > Won't javadoc mangle your nice formatting here? Use ul/li or wrap = w/
?

good call, fixed to ul/li


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java,=
 line 234
> > 
> >
> >     Why two types?  Is this legacy?  KV has advantage of being able to =
carry a Delete

yea, it's for compatibility with TableOutputFormat which can take either. K=
ind of a pain, maybe we should just get rid of it and only accept KV?


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java, line 73
> > 
> >
> >     In KV, there is a parseColumns function that might help here

doesn't seem much more convenient (since I already have String here)
One question, though: if I want to insert into a family that has no qualifi=
ers, should I be using EMPTY_BYTE_ARRAY when I construct the KV, or passing=
 null?


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java, line 96
> > 
> >
> >     Is this safe?  Is there escaping you should be handling here?

Plan to address this in docs for importtsv - it's not a good TSV parser tha=
t supports quoting, etc.


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.j=
ava, line 229
> > 
> >
> >     I suppose in rare circumstances, it could split in here and mess yo=
u up.  You need a no-split flag per region out in zk or something?  We'd ne=
ed such a thing for snapshotting methinks.  But, thats for a later...

if there's been a split, the new daughter region will detect that the HFile=
 doesn't "fit" and throw WrongRegionException. This then triggers the retry=
 on ServerCallable, which fetches the new region info from meta, and realiz=
es it has to split the hfile before trying again.


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/mapreduce/PutSortReducer.java, li=
ne 29
> > 
> >
> >     Class comment.  Whats this thing do?

fixed - copied from KeyValueSortReducer


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/mapreduce/hadoopbackport/InputSam=
pler.java, line 57
> > 
> >
> >     We need this even though you wrote out partition edges earlier when=
 you read region boundaries?

You're right that it's unused, but I figured I'd put it in for convenience =
- when doing a bulk load into a new table, for example, you may want to use=
 this in order to figure out where to set region splits.


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 1=
897
> > 
> >
> >     This is old cruddy stuff you are asking about.  It looks like I add=
ed splitsAndClosesLock to stop either happening during critical periods (HB=
ASE-588) and the splits lock looks now like a half measure from way back in=
 2007 -- HBASE-217.  Lets make an issue to clean it up.

Which one should I be taking? Do I need both? Which order?


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/regionserver/Store.java, line 511
> > 
> >
> >     There is FSUtils in hbase util

On second thought actually this should be somewhat store-specific, so took =
out the TODO


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/util/Bytes.java, line 1105
> > 
> >
> >     You were going to remove this

fixed


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/regionserver/Store.java, line 535
> > 
> >
> >     What you going to do here?  Get the last key from the map of storef=
iles in current region and make your file be one less than it?  Your timest=
amp is current though so edits in here could overshadow those in another st=
orefile.  Thats ok?

going to make storefiles be a list instead of map... new review coming in a=
 bit


> On 2010-05-28 17:19:22, stack wrote:
> > src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java, line 646
> > 
> >
> >     Whats this about?

added comment - idea is to restore back to local-mode job runner when clust=
er shuts down


> On 2010-05-28 17:19:22, stack wrote:
> > src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java, line 784
> > 
> >
> >     Its not removing it itself?  Its registered to cleanup on exit.

was seeing cases where in between different tests in the same test case, it=
 wouldn't get cleaned up, etc.


> On 2010-05-28 17:19:22, stack wrote:
> > src/test/java/org/apache/hadoop/hbase/mapreduce/NMapInputFormat.java, l=
ine 40
> > 
> >
> >     How?  If nullwritables, don't I just get nulls?

Changed to:

/**
 * Input format that creates as many map tasks as configured in
 * mapred.map.tasks, each provided with a single row of
 * NullWritables. This can be useful when trying to write mappers
 * which don't have any real input (eg when the mapper is simply
 * producing random data as output)
 */


- Todd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/87/#review96
-----------------------------------------------------------


On 2010-05-25 14:21:16, Todd Lipcon wrote:
> =

> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/87/
> -----------------------------------------------------------
> =

> (Updated 2010-05-25 14:21:16)
> =

> =

> Review request for hbase, stack and Jonathan Gray.
> =

> =

> Summary
> -------
> =

> Here's a first patch that implements bulk import into existing tables. Th=
is applies on top of HBASE-2586 and HBASE-2588 - I've pushed the series of =
the three to my github: http://github.com/toddlipcon/hbase/tree/hfof-review
> =

> I have some TODOs left that I want to take care of before this gets commi=
tted, but since it's a pretty large patch, I figured I'd get it out for rev=
iew ASAP.
> =

> The stuff in the hadoopbackport package is essentially copypaste from Had=
oop trunk, so you can ignore that in the review.
> =

> =

> This addresses bug HBASE-1923.
>     http://issues.apache.org/jira/browse/HBASE-1923
> =

> =

> Diffs
> -----
> =

>   pom.xml 0a009cf =

>   src/main/java/org/apache/hadoop/hbase/HRegionInfo.java 29b0cd6 =

>   src/main/java/org/apache/hadoop/hbase/io/ImmutableBytesWritable.java 0a=
9ec4b =

>   src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java cf4768f =

>   src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 4cbe52a =

>   src/main/java/org/apache/hadoop/hbase/mapreduce/Driver.java 3d40695 =

>   src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java =
9c8e53e =

>   src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java PRE-CREA=
TION =

>   src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.j=
ava PRE-CREATION =

>   src/main/java/org/apache/hadoop/hbase/mapreduce/PutSortReducer.java PRE=
-CREATION =

>   src/main/java/org/apache/hadoop/hbase/mapreduce/SimpleTotalOrderPartiti=
oner.java af3d588 =

>   src/main/java/org/apache/hadoop/hbase/mapreduce/hadoopbackport/InputSam=
pler.java PRE-CREATION =

>   src/main/java/org/apache/hadoop/hbase/mapreduce/hadoopbackport/TotalOrd=
erPartitioner.java PRE-CREATION =

>   src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 287cd48 =

>   src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java f=
b65ed1 =

>   src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 7de766d =

>   src/main/java/org/apache/hadoop/hbase/util/Bytes.java a53dafe =

>   src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java ed8709f =

>   src/test/java/org/apache/hadoop/hbase/mapreduce/NMapInputFormat.java PR=
E-CREATION =

>   src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.j=
ava d04ced2 =

>   src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFil=
es.java PRE-CREATION =

>   src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionInfo.java=
 fcb22fb =

> =

> Diff: http://review.hbase.org/r/87/diff
> =

> =

> Testing
> -------
> =

> Primary unit/functional testing, a bit of pseudo-distributed testing. Pla=
n on doing full system tests before commit as well.
> =

> =

> Thanks,
> =

> Todd
> =

>