hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From st...@duboce.net
Subject Re: Review Request: HBASE-1923. Bulk load into existing tables
Date Mon, 31 May 2010 16:12:33 GMT


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java, line 234
> > <http://review.hbase.org/r/87/diff/1/?file=612#file612line234>
> >
> >     Why two types?  Is this legacy?  KV has advantage of being able to carry a Delete
> 
> Todd Lipcon wrote:
>     yea, it's for compatibility with TableOutputFormat which can take either. Kind of
a pain, maybe we should just get rid of it and only accept KV?

Doing both is a pain yes.  If Delete and Put had a common ancestor/Interface, we could use
that but it ain't there yet.  File an issue to undo Put option?


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java, line 73
> > <http://review.hbase.org/r/87/diff/1/?file=613#file613line73>
> >
> >     In KV, there is a parseColumns function that might help here
> 
> Todd Lipcon wrote:
>     doesn't seem much more convenient (since I already have String here)
>     One question, though: if I want to insert into a family that has no qualifiers, should
I be using EMPTY_BYTE_ARRAY when I construct the KV, or passing null?

It looks like either works.  It looks like nulls are handled properly.


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java, line 96
> > <http://review.hbase.org/r/87/diff/1/?file=613#file613line96>
> >
> >     Is this safe?  Is there escaping you should be handling here?
> 
> Todd Lipcon wrote:
>     Plan to address this in docs for importtsv - it's not a good TSV parser that supports
quoting, etc.

ok


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java, line
229
> > <http://review.hbase.org/r/87/diff/1/?file=614#file614line229>
> >
> >     I suppose in rare circumstances, it could split in here and mess you up.  You
need a no-split flag per region out in zk or something?  We'd need such a thing for snapshotting
methinks.  But, thats for a later...
> 
> Todd Lipcon wrote:
>     if there's been a split, the new daughter region will detect that the HFile doesn't
"fit" and throw WrongRegionException. This then triggers the retry on ServerCallable, which
fetches the new region info from meta, and realizes it has to split the hfile before trying
again.

Nice.


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/mapreduce/hadoopbackport/InputSampler.java,
line 57
> > <http://review.hbase.org/r/87/diff/1/?file=617#file617line57>
> >
> >     We need this even though you wrote out partition edges earlier when you read
region boundaries?
> 
> Todd Lipcon wrote:
>     You're right that it's unused, but I figured I'd put it in for convenience - when
doing a bulk load into a new table, for example, you may want to use this in order to figure
out where to set region splits.

Ok.  Sounds good.


> On 2010-05-28 17:19:22, stack wrote:
> > src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line 1897
> > <http://review.hbase.org/r/87/diff/1/?file=619#file619line1897>
> >
> >     This is old cruddy stuff you are asking about.  It looks like I added splitsAndClosesLock
to stop either happening during critical periods (HBASE-588) and the splits lock looks now
like a half measure from way back in 2007 -- HBASE-217.  Lets make an issue to clean it up.
> 
> Todd Lipcon wrote:
>     Which one should I be taking? Do I need both? Which order?

Seems like it depends on what you need.  I like your issue on rethinking these locks.  Sounds
like we might need to do stuff like break apart the splittingAndClose lock into a splitting
and closing.


- stack


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/87/#review96
-----------------------------------------------------------


On 2010-05-31 00:22:24, Todd Lipcon wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/87/
> -----------------------------------------------------------
> 
> (Updated 2010-05-31 00:22:24)
> 
> 
> Review request for hbase, stack and Jonathan Gray.
> 
> 
> Summary
> -------
> 
> Here's a first patch that implements bulk import into existing tables. This applies on
top of HBASE-2586 and HBASE-2588 - I've pushed the series of the three to my github: http://github.com/toddlipcon/hbase/tree/hfof-review
> 
> I have some TODOs left that I want to take care of before this gets committed, but since
it's a pretty large patch, I figured I'd get it out for review ASAP.
> 
> The stuff in the hadoopbackport package is essentially copypaste from Hadoop trunk, so
you can ignore that in the review.
> 
> 
> This addresses bug HBASE-1923.
>     http://issues.apache.org/jira/browse/HBASE-1923
> 
> 
> Diffs
> -----
> 
>   pom.xml 0a009cf 
>   src/main/java/org/apache/hadoop/hbase/HRegionInfo.java 29b0cd6 
>   src/main/java/org/apache/hadoop/hbase/io/ImmutableBytesWritable.java 0a9ec4b 
>   src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java b912a85 
>   src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 4cbe52a 
>   src/main/java/org/apache/hadoop/hbase/mapreduce/Driver.java 3d40695 
>   src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java 9c8e53e 
>   src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java PRE-CREATION

>   src/main/java/org/apache/hadoop/hbase/mapreduce/PutSortReducer.java PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/mapreduce/SimpleTotalOrderPartitioner.java af3d588

>   src/main/java/org/apache/hadoop/hbase/mapreduce/TableMapReduceUtil.java b332280 
>   src/main/java/org/apache/hadoop/hbase/mapreduce/hadoopbackport/InputSampler.java PRE-CREATION

>   src/main/java/org/apache/hadoop/hbase/mapreduce/hadoopbackport/TotalOrderPartitioner.java
PRE-CREATION 
>   src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 287cd48 
>   src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 78f3223 
>   src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 7de766d 
>   src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 80bf09a 
>   src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java c8941f1 
>   src/main/java/org/apache/hadoop/hbase/util/Bytes.java a53dafe 
>   src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java ed8709f 
>   src/test/java/org/apache/hadoop/hbase/PerformanceEvaluation.java d76c75e 
>   src/test/java/org/apache/hadoop/hbase/mapreduce/NMapInputFormat.java PRE-CREATION 
>   src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java d04ced2

>   src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFiles.java PRE-CREATION

>   src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java f1566d3 
>   src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionInfo.java fcb22fb 
>   src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 4595e6e 
>   src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 2e4c7df 
>   src/test/java/org/apache/hadoop/hbase/util/TestBytes.java c7361cb 
> 
> Diff: http://review.hbase.org/r/87/diff
> 
> 
> Testing
> -------
> 
> Primary unit/functional testing, a bit of pseudo-distributed testing. Plan on doing full
system tests before commit as well.
> 
> 
> Thanks,
> 
> Todd
> 
>


Mime
View raw message