hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ryan Rawson <ryano...@gmail.com>
Subject Re: Review Request: HBASE-1923. Bulk load into existing tables
Date Tue, 01 Jun 2010 05:35:40 GMT
Lets not put this in branch... focus on making trunk releasable not
extending branch's life.

On May 30, 2010 2:13 PM, "Todd Lipcon" <todd@cloudera.com> wrote:
>
>
>> On 2010-05-28 17:19:22, stack wrote:
>> > src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java, line
264
>> > <http://review.hbase.org/r/87/diff/1/?file=610#file610line264>
>> >
>> > Is this patch for trunk or branch? If branch, this addition might break
RPC. It might be ok on the end here but we should test.
>
> I will be backporting to branch for our customer, but as a major new
feature I didn't expect it would be committed to Apache branch. If people
want it, though, I'll put the patch up.
>
> In general, adding a new RPC doesn't cause incompatibility - if a client
calls it and it's not there on the server side, the client will just get an
exception that the method doesn't exist. (eg see my backport of HDFS-630 to
hadoop 20)
>
>
>> On 2010-05-28 17:19:22, stack wrote:
>> > src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java,
line 181
>> > <http://review.hbase.org/r/87/diff/1/?file=612#file612line181>
>> >
>> > Man, we never enable asserts.... throw an exception!
>> >
>
> Switched to using google Preconditions
>
>
>> On 2010-05-28 17:19:22, stack wrote:
>> > src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java,
line 187
>> > <http://review.hbase.org/r/87/diff/1/?file=612#file612line187>
>> >
>> > Just asking, the comparator for sure is going to do the right thing
here?
>
> yea, ImmutableBytesWritable implements Comparable
>
>
>> On 2010-05-28 17:19:22, stack wrote:
>> > src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java,
line 213
>> > <http://review.hbase.org/r/87/diff/1/?file=612#file612line213>
>> >
>> > Won't javadoc mangle your nice formatting here? Use ul/li or wrap w/
<pre>?
>
> good call, fixed to ul/li
>
>
>> On 2010-05-28 17:19:22, stack wrote:
>> > src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java,
line 234
>> > <http://review.hbase.org/r/87/diff/1/?file=612#file612line234>
>> >
>> > Why two types? Is this legacy? KV has advantage of being able to carry
a Delete
>
> yea, it's for compatibility with TableOutputFormat which can take either.
Kind of a pain, maybe we should just get rid of it and only accept KV?
>
>
>> On 2010-05-28 17:19:22, stack wrote:
>> > src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java, line 73
>> > <http://review.hbase.org/r/87/diff/1/?file=613#file613line73>
>> >
>> > In KV, there is a parseColumns function that might help here
>
> doesn't seem much more convenient (since I already have String here)
> One question, though: if I want to insert into a family that has no
qualifiers, should I be using EMPTY_BYTE_ARRAY when I construct the KV, or
passing null?
>
>
>> On 2010-05-28 17:19:22, stack wrote:
>> > src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java, line 96
>> > <http://review.hbase.org/r/87/diff/1/?file=613#file613line96>
>> >
>> > Is this safe? Is there escaping you should be handling here?
>
> Plan to address this in docs for importtsv - it's not a good TSV parser
that supports quoting, etc.
>
>
>> On 2010-05-28 17:19:22, stack wrote:
>> >
src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java,
line 229
>> > <http://review.hbase.org/r/87/diff/1/?file=614#file614line229>
>> >
>> > I suppose in rare circumstances, it could split in here and mess you
up. You need a no-split flag per region out in zk or something? We'd need
such a thing for snapshotting methinks. But, thats for a later...
>
> if there's been a split, the new daughter region will detect that the
HFile doesn't "fit" and throw WrongRegionException. This then triggers the
retry on ServerCallable, which fetches the new region info from meta, and
realizes it has to split the hfile before trying again.
>
>
>> On 2010-05-28 17:19:22, stack wrote:
>> > src/main/java/org/apache/hadoop/hbase/mapreduce/PutSortReducer.java,
line 29
>> > <http://review.hbase.org/r/87/diff/1/?file=615#file615line29>
>> >
>> > Class comment. Whats this thing do?
>
> fixed - copied from KeyValueSortReducer
>
>
>> On 2010-05-28 17:19:22, stack wrote:
>> >
src/main/java/org/apache/hadoop/hbase/mapreduce/hadoopbackport/InputSampler.java,
line 57
>> > <http://review.hbase.org/r/87/diff/1/?file=617#file617line57>
>> >
>> > We need this even though you wrote out partition edges earlier when you
read region boundaries?
>
> You're right that it's unused, but I figured I'd put it in for convenience
- when doing a bulk load into a new table, for example, you may want to use
this in order to figure out where to set region splits.
>
>
>> On 2010-05-28 17:19:22, stack wrote:
>> > src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java, line
1897
>> > <http://review.hbase.org/r/87/diff/1/?file=619#file619line1897>
>> >
>> > This is old cruddy stuff you are asking about. It looks like I added
splitsAndClosesLock to stop either happening during critical periods
(HBASE-588) and the splits lock looks now like a half measure from way back
in 2007 -- HBASE-217. Lets make an issue to clean it up.
>
> Which one should I be taking? Do I need both? Which order?
>
>
>> On 2010-05-28 17:19:22, stack wrote:
>> > src/main/java/org/apache/hadoop/hbase/regionserver/Store.java, line 511
>> > <http://review.hbase.org/r/87/diff/1/?file=621#file621line511>
>> >
>> > There is FSUtils in hbase util
>
> On second thought actually this should be somewhat store-specific, so took
out the TODO
>
>
>> On 2010-05-28 17:19:22, stack wrote:
>> > src/main/java/org/apache/hadoop/hbase/util/Bytes.java, line 1105
>> > <http://review.hbase.org/r/87/diff/1/?file=622#file622line1105>
>> >
>> > You were going to remove this
>
> fixed
>
>
>> On 2010-05-28 17:19:22, stack wrote:
>> > src/main/java/org/apache/hadoop/hbase/regionserver/Store.java, line 535
>> > <http://review.hbase.org/r/87/diff/1/?file=621#file621line535>
>> >
>> > What you going to do here? Get the last key from the map of storefiles
in current region and make your file be one less than it? Your timestamp is
current though so edits in here could overshadow those in another storefile.
Thats ok?
>
> going to make storefiles be a list instead of map... new review coming in
a bit
>
>
>> On 2010-05-28 17:19:22, stack wrote:
>> > src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java, line
646
>> > <http://review.hbase.org/r/87/diff/1/?file=623#file623line646>
>> >
>> > Whats this about?
>
> added comment - idea is to restore back to local-mode job runner when
cluster shuts down
>
>
>> On 2010-05-28 17:19:22, stack wrote:
>> > src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java, line
784
>> > <http://review.hbase.org/r/87/diff/1/?file=623#file623line784>
>> >
>> > Its not removing it itself? Its registered to cleanup on exit.
>
> was seeing cases where in between different tests in the same test case,
it wouldn't get cleaned up, etc.
>
>
>> On 2010-05-28 17:19:22, stack wrote:
>> > src/test/java/org/apache/hadoop/hbase/mapreduce/NMapInputFormat.java,
line 40
>> > <http://review.hbase.org/r/87/diff/1/?file=624#file624line40>
>> >
>> > How? If nullwritables, don't I just get nulls?
>
> Changed to:
>
> /**
> * Input format that creates as many map tasks as configured in
> * mapred.map.tasks, each provided with a single row of
> * NullWritables. This can be useful when trying to write mappers
> * which don't have any real input (eg when the mapper is simply
> * producing random data as output)
> */
>
>
> - Todd
>
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.hbase.org/r/87/#review96
> -----------------------------------------------------------
>
>
> On 2010-05-25 14:21:16, Todd Lipcon wrote:
>>
>> -----------------------------------------------------------
>> This is an automatically generated e-mail. To reply, visit:
>> http://review.hbase.org/r/87/
>> -----------------------------------------------------------
>>
>> (Updated 2010-05-25 14:21:16)
>>
>>
>> Review request for hbase, stack and Jonathan Gray.
>>
>>
>> Summary
>> -------
>>
>> Here's a first patch that implements bulk import into existing tables.
This applies on top of HBASE-2586 and HBASE-2588 - I've pushed the series of
the three to my github: http://github.com/toddlipcon/hbase/tree/hfof-review
>>
>> I have some TODOs left that I want to take care of before this gets
committed, but since it's a pretty large patch, I figured I'd get it out for
review ASAP.
>>
>> The stuff in the hadoopbackport package is essentially copypaste from
Hadoop trunk, so you can ignore that in the review.
>>
>>
>> This addresses bug HBASE-1923.
>> http://issues.apache.org/jira/browse/HBASE-1923
>>
>>
>> Diffs
>> -----
>>
>> pom.xml 0a009cf
>> src/main/java/org/apache/hadoop/hbase/HRegionInfo.java 29b0cd6
>> src/main/java/org/apache/hadoop/hbase/io/ImmutableBytesWritable.java
0a9ec4b
>> src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java cf4768f
>> src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 4cbe52a
>> src/main/java/org/apache/hadoop/hbase/mapreduce/Driver.java 3d40695
>> src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java
9c8e53e
>> src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java
PRE-CREATION
>>
src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
PRE-CREATION
>> src/main/java/org/apache/hadoop/hbase/mapreduce/PutSortReducer.java
PRE-CREATION
>>
src/main/java/org/apache/hadoop/hbase/mapreduce/SimpleTotalOrderPartitioner.java
af3d588
>>
src/main/java/org/apache/hadoop/hbase/mapreduce/hadoopbackport/InputSampler.java
PRE-CREATION
>>
src/main/java/org/apache/hadoop/hbase/mapreduce/hadoopbackport/TotalOrderPartitioner.java
PRE-CREATION
>> src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 287cd48
>> src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
fb65ed1
>> src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 7de766d
>> src/main/java/org/apache/hadoop/hbase/util/Bytes.java a53dafe
>> src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java ed8709f
>> src/test/java/org/apache/hadoop/hbase/mapreduce/NMapInputFormat.java
PRE-CREATION
>>
src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java
d04ced2
>>
src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFiles.java
PRE-CREATION
>> src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionInfo.java
fcb22fb
>>
>> Diff: http://review.hbase.org/r/87/diff
>>
>>
>> Testing
>> -------
>>
>> Primary unit/functional testing, a bit of pseudo-distributed testing.
Plan on doing full system tests before commit as well.
>>
>>
>> Thanks,
>>
>> Todd
>>
>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message