hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Todd Lipcon" <t...@cloudera.com>
Subject Re: Review Request: HBASE-1923. Bulk load into existing tables
Date Sun, 30 May 2010 21:13:29 GMT


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java, line 264
> > <http://review.hbase.org/r/87/diff/1/?file=610#file610line264>
> >
> >     Is this patch for trunk or branch?  If branch, this addition might break RPC.
  It might be ok on the end here but we should test.

I will be backporting to branch for our customer, but as a major new feature I didn't expect
it would be committed to Apache branch. If people want it, though, I'll put the patch up.

In general, adding a new RPC doesn't cause incompatibility - if a client calls it and it's
not there on the server side, the client will just get an exception that the method doesn't
exist. (eg see my backport of HDFS-630 to hadoop 20)


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java, line 181
> > <http://review.hbase.org/r/87/diff/1/?file=612#file612line181>
> >
> >     Man, we never enable asserts.... throw an exception!
> >

Switched to using google Preconditions


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java, line 187
> > <http://review.hbase.org/r/87/diff/1/?file=612#file612line187>
> >
> >     Just asking, the comparator for sure is going to do the right thing here?

yea, ImmutableBytesWritable implements Comparable


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java, line 213
> > <http://review.hbase.org/r/87/diff/1/?file=612#file612line213>
> >
> >     Won't javadoc mangle your nice formatting here?  Use ul/li or wrap w/ <pre>?

good call, fixed to ul/li


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java, line 234
> > <http://review.hbase.org/r/87/diff/1/?file=612#file612line234>
> >
> >     Why two types?  Is this legacy?  KV has advantage of being able to carry a Delete

yea, it's for compatibility with TableOutputFormat which can take either. Kind of a pain,
maybe we should just get rid of it and only accept KV?


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java, line 73
> > <http://review.hbase.org/r/87/diff/1/?file=613#file613line73>
> >
> >     In KV, there is a parseColumns function that might help here

doesn't seem much more convenient (since I already have String here)
One question, though: if I want to insert into a family that has no qualifiers, should I be
using EMPTY_BYTE_ARRAY when I construct the KV, or passing null?


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java, line 96
> > <http://review.hbase.org/r/87/diff/1/?file=613#file613line96>
> >
> >     Is this safe?  Is there escaping you should be handling here?

Plan to address this in docs for importtsv - it's not a good TSV parser that supports quoting,
etc.


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java, line
229
> > <http://review.hbase.org/r/87/diff/1/?file=614#file614line229>
> >
> >     I suppose in rare circumstances, it could split in here and mess you up.  You
need a no-split flag per region out in zk or something?  We'd need such a thing for snapshotting
methinks.  But, thats for a later...

if there's been a split, the new daughter region will detect that the HFile doesn't "fit"
and throw WrongRegionException. This then triggers the retry on ServerCallable, which fetches
the new region info from meta, and realizes it has to split the hfile before trying again.


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/mapreduce/PutSortReducer.java, line 29
> > <http://review.hbase.org/r/87/diff/1/?file=615#file615line29>
> >
> >     Class comment.  Whats this thing do?

fixed - copied from KeyValueSortReducer


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/mapreduce/hadoopbackport/InputSampler.java,
line 57
> > <http://review.hbase.org/r/87/diff/1/?file=617#file617line57>
> >
> >     We need this even though you wrote out partition edges earlier when you read
region boundaries?

You're right that it's unused, but I figured I'd put it in for convenience - when doing a
bulk load into a new table, for example, you may want to use this in order to figure out where
to set region splits.


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 1897
> > <http://review.hbase.org/r/87/diff/1/?file=619#file619line1897>
> >
> >     This is old cruddy stuff you are asking about.  It looks like I added splitsAndClosesLock
to stop either happening during critical periods (HBASE-588) and the splits lock looks now
like a half measure from way back in 2007 -- HBASE-217.  Lets make an issue to clean it up.

Which one should I be taking? Do I need both? Which order?


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/regionserver/Store.java, line 511
> > <http://review.hbase.org/r/87/diff/1/?file=621#file621line511>
> >
> >     There is FSUtils in hbase util

On second thought actually this should be somewhat store-specific, so took out the TODO


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/util/Bytes.java, line 1105
> > <http://review.hbase.org/r/87/diff/1/?file=622#file622line1105>
> >
> >     You were going to remove this

fixed


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/regionserver/Store.java, line 535
> > <http://review.hbase.org/r/87/diff/1/?file=621#file621line535>
> >
> >     What you going to do here?  Get the last key from the map of storefiles in current
region and make your file be one less than it?  Your timestamp is current though so edits
in here could overshadow those in another storefile.  Thats ok?

going to make storefiles be a list instead of map... new review coming in a bit


> On 2010-05-28 17:19:22, stack wrote:
> > src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java, line 646
> > <http://review.hbase.org/r/87/diff/1/?file=623#file623line646>
> >
> >     Whats this about?

added comment - idea is to restore back to local-mode job runner when cluster shuts down


> On 2010-05-28 17:19:22, stack wrote:
> > src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java, line 784
> > <http://review.hbase.org/r/87/diff/1/?file=623#file623line784>
> >
> >     Its not removing it itself?  Its registered to cleanup on exit.

was seeing cases where in between different tests in the same test case, it wouldn't get cleaned
up, etc.


> On 2010-05-28 17:19:22, stack wrote:
> > src/test/java/org/apache/hadoop/hbase/mapreduce/NMapInputFormat.java, line 40
> > <http://review.hbase.org/r/87/diff/1/?file=624#file624line40>
> >
> >     How?  If nullwritables, don't I just get nulls?

Changed to:

/**
 * Input format that creates as many map tasks as configured in
 * mapred.map.tasks, each provided with a single row of
 * NullWritables. This can be useful when trying to write mappers
 * which don't have any real input (eg when the mapper is simply
 * producing random data as output)
 */


- Todd


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/87/#review96
-----------------------------------------------------------


On 2010-05-25 14:21:16, Todd Lipcon wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/87/
> -----------------------------------------------------------
> 
> (Updated 2010-05-25 14:21:16)
> 
> 
> Review request for hbase, stack and Jonathan Gray.
> 
> 
> Summary
> -------
> 
> Here's a first patch that implements bulk import into existing tables. This applies on
top of HBASE-2586 and HBASE-2588 - I've pushed the series of the three to my github: http://github.com/toddlipcon/hbase/tree/hfof-review
> 
> I have some TODOs left that I want to take care of before this gets committed, but since
it's a pretty large patch, I figured I'd get it out for review ASAP.
> 
> The stuff in the hadoopbackport package is essentially copypaste from Hadoop trunk, so
you can ignore that in the review.
> 
> 
> This addresses bug HBASE-1923.
>     http://issues.apache.org/jira/browse/HBASE-1923
> 
> 
> Diffs
> -----
> 
>   pom.xml 0a009cf 
>   src/main/java/org/apache/hadoop/hbase/HRegionInfo.java 29b0cd6 
>   src/main/java/org/apache/hadoop/hbase/io/ImmutableBytesWritable.java 0a9ec4b 
>   src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java cf4768f 
>   src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 4cbe52a 
>   src/main/java/org/apache/hadoop/hbase/mapreduce/Driver.java 3d40695 
>   src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java 9c8e53e 
>   src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java PRE-CREATION

>   src/main/java/org/apache/hadoop/hbase/mapreduce/PutSortReducer.java PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/mapreduce/SimpleTotalOrderPartitioner.java af3d588

>   src/main/java/org/apache/hadoop/hbase/mapreduce/hadoopbackport/InputSampler.java PRE-CREATION

>   src/main/java/org/apache/hadoop/hbase/mapreduce/hadoopbackport/TotalOrderPartitioner.java
PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 287cd48 
>   src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java fb65ed1 
>   src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 7de766d 
>   src/main/java/org/apache/hadoop/hbase/util/Bytes.java a53dafe 
>   src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java ed8709f 
>   src/test/java/org/apache/hadoop/hbase/mapreduce/NMapInputFormat.java PRE-CREATION 
>   src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java d04ced2

>   src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFiles.java PRE-CREATION

>   src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionInfo.java fcb22fb 
> 
> Diff: http://review.hbase.org/r/87/diff
> 
> 
> Testing
> -------
> 
> Primary unit/functional testing, a bit of pseudo-distributed testing. Plan on doing full
system tests before commit as well.
> 
> 
> Thanks,
> 
> Todd
> 
>


Mime
View raw message