hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrew Purtell <apurt...@yahoo.com>
Subject Re: Review Request: HBASE-1923. Bulk load into existing tables
Date Tue, 01 Jun 2010 05:52:14 GMT
> I will be backporting to branch for our customer

And as users of CDH3, Trend will want it and use it.

Where we can avoid fragmentation, we should.

As long as it does not break RPC I'm in favor of this going into 0.20 also.

   - Andy


> From: Ryan Rawson <ryanobjc@gmail.com>
> Subject: Re: Review Request: HBASE-1923. Bulk load into existing tables
> To: dev@hbase.apache.org
> Cc: jiraposter@review.hbase.org, "Todd Lipcon" <todd@cloudera.com>, stack@duboce.net,
"Jonathan Gray" <jgray@facebook.com>
> Date: Monday, May 31, 2010, 10:35 PM
> Lets not put this in branch... focus
> on making trunk releasable not
> extending branch's life.
> 
> On May 30, 2010 2:13 PM, "Todd Lipcon" <todd@cloudera.com>
> wrote:
> >
> >
> >> On 2010-05-28 17:19:22, stack wrote:
> >> >
> src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java,
> line
> 264
> >> > <http://review.hbase.org/r/87/diff/1/?file=610#file610line264>
> >> >
> >> > Is this patch for trunk or branch? If branch,
> this addition might break
> RPC. It might be ok on the end here but we should test.
> >
> > I will be backporting to branch for our customer, but
> as a major new
> feature I didn't expect it would be committed to Apache
> branch. If people
> want it, though, I'll put the patch up.
> >
> > In general, adding a new RPC doesn't cause
> incompatibility - if a client
> calls it and it's not there on the server side, the client
> will just get an
> exception that the method doesn't exist. (eg see my
> backport of HDFS-630 to
> hadoop 20)
> >
> >
> >> On 2010-05-28 17:19:22, stack wrote:
> >> >
> src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java,
> line 181
> >> > <http://review.hbase.org/r/87/diff/1/?file=612#file612line181>
> >> >
> >> > Man, we never enable asserts.... throw an
> exception!
> >> >
> >
> > Switched to using google Preconditions
> >
> >
> >> On 2010-05-28 17:19:22, stack wrote:
> >> >
> src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java,
> line 187
> >> > <http://review.hbase.org/r/87/diff/1/?file=612#file612line187>
> >> >
> >> > Just asking, the comparator for sure is going
> to do the right thing
> here?
> >
> > yea, ImmutableBytesWritable implements Comparable
> >
> >
> >> On 2010-05-28 17:19:22, stack wrote:
> >> >
> src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java,
> line 213
> >> > <http://review.hbase.org/r/87/diff/1/?file=612#file612line213>
> >> >
> >> > Won't javadoc mangle your nice formatting
> here? Use ul/li or wrap w/
> <pre>?
> >
> > good call, fixed to ul/li
> >
> >
> >> On 2010-05-28 17:19:22, stack wrote:
> >> >
> src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java,
> line 234
> >> > <http://review.hbase.org/r/87/diff/1/?file=612#file612line234>
> >> >
> >> > Why two types? Is this legacy? KV has
> advantage of being able to carry
> a Delete
> >
> > yea, it's for compatibility with TableOutputFormat
> which can take either.
> Kind of a pain, maybe we should just get rid of it and only
> accept KV?
> >
> >
> >> On 2010-05-28 17:19:22, stack wrote:
> >> >
> src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java,
> line 73
> >> > <http://review.hbase.org/r/87/diff/1/?file=613#file613line73>
> >> >
> >> > In KV, there is a parseColumns function that
> might help here
> >
> > doesn't seem much more convenient (since I already
> have String here)
> > One question, though: if I want to insert into a
> family that has no
> qualifiers, should I be using EMPTY_BYTE_ARRAY when I
> construct the KV, or
> passing null?
> >
> >
> >> On 2010-05-28 17:19:22, stack wrote:
> >> >
> src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java,
> line 96
> >> > <http://review.hbase.org/r/87/diff/1/?file=613#file613line96>
> >> >
> >> > Is this safe? Is there escaping you should be
> handling here?
> >
> > Plan to address this in docs for importtsv - it's not
> a good TSV parser
> that supports quoting, etc.
> >
> >
> >> On 2010-05-28 17:19:22, stack wrote:
> >> >
> src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java,
> line 229
> >> > <http://review.hbase.org/r/87/diff/1/?file=614#file614line229>
> >> >
> >> > I suppose in rare circumstances, it could
> split in here and mess you
> up. You need a no-split flag per region out in zk or
> something? We'd need
> such a thing for snapshotting methinks. But, thats for a
> later...
> >
> > if there's been a split, the new daughter region will
> detect that the
> HFile doesn't "fit" and throw WrongRegionException. This
> then triggers the
> retry on ServerCallable, which fetches the new region info
> from meta, and
> realizes it has to split the hfile before trying again.
> >
> >
> >> On 2010-05-28 17:19:22, stack wrote:
> >> >
> src/main/java/org/apache/hadoop/hbase/mapreduce/PutSortReducer.java,
> line 29
> >> > <http://review.hbase.org/r/87/diff/1/?file=615#file615line29>
> >> >
> >> > Class comment. Whats this thing do?
> >
> > fixed - copied from KeyValueSortReducer
> >
> >
> >> On 2010-05-28 17:19:22, stack wrote:
> >> >
> src/main/java/org/apache/hadoop/hbase/mapreduce/hadoopbackport/InputSampler.java,
> line 57
> >> > <http://review.hbase.org/r/87/diff/1/?file=617#file617line57>
> >> >
> >> > We need this even though you wrote out
> partition edges earlier when you
> read region boundaries?
> >
> > You're right that it's unused, but I figured I'd put
> it in for convenience
> - when doing a bulk load into a new table, for example, you
> may want to use
> this in order to figure out where to set region splits.
> >
> >
> >> On 2010-05-28 17:19:22, stack wrote:
> >> >
> src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java,
> line
> 1897
> >> > <http://review.hbase.org/r/87/diff/1/?file=619#file619line1897>
> >> >
> >> > This is old cruddy stuff you are asking
> about. It looks like I added
> splitsAndClosesLock to stop either happening during
> critical periods
> (HBASE-588) and the splits lock looks now like a half
> measure from way back
> in 2007 -- HBASE-217. Lets make an issue to clean it up.
> >
> > Which one should I be taking? Do I need both? Which
> order?
> >
> >
> >> On 2010-05-28 17:19:22, stack wrote:
> >> >
> src/main/java/org/apache/hadoop/hbase/regionserver/Store.java,
> line 511
> >> > <http://review.hbase.org/r/87/diff/1/?file=621#file621line511>
> >> >
> >> > There is FSUtils in hbase util
> >
> > On second thought actually this should be somewhat
> store-specific, so took
> out the TODO
> >
> >
> >> On 2010-05-28 17:19:22, stack wrote:
> >> >
> src/main/java/org/apache/hadoop/hbase/util/Bytes.java, line
> 1105
> >> > <http://review.hbase.org/r/87/diff/1/?file=622#file622line1105>
> >> >
> >> > You were going to remove this
> >
> > fixed
> >
> >
> >> On 2010-05-28 17:19:22, stack wrote:
> >> >
> src/main/java/org/apache/hadoop/hbase/regionserver/Store.java,
> line 535
> >> > <http://review.hbase.org/r/87/diff/1/?file=621#file621line535>
> >> >
> >> > What you going to do here? Get the last key
> from the map of storefiles
> in current region and make your file be one less than it?
> Your timestamp is
> current though so edits in here could overshadow those in
> another storefile.
> Thats ok?
> >
> > going to make storefiles be a list instead of map...
> new review coming in
> a bit
> >
> >
> >> On 2010-05-28 17:19:22, stack wrote:
> >> >
> src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java,
> line
> 646
> >> > <http://review.hbase.org/r/87/diff/1/?file=623#file623line646>
> >> >
> >> > Whats this about?
> >
> > added comment - idea is to restore back to local-mode
> job runner when
> cluster shuts down
> >
> >
> >> On 2010-05-28 17:19:22, stack wrote:
> >> >
> src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java,
> line
> 784
> >> > <http://review.hbase.org/r/87/diff/1/?file=623#file623line784>
> >> >
> >> > Its not removing it itself? Its registered to
> cleanup on exit.
> >
> > was seeing cases where in between different tests in
> the same test case,
> it wouldn't get cleaned up, etc.
> >
> >
> >> On 2010-05-28 17:19:22, stack wrote:
> >> >
> src/test/java/org/apache/hadoop/hbase/mapreduce/NMapInputFormat.java,
> line 40
> >> > <http://review.hbase.org/r/87/diff/1/?file=624#file624line40>
> >> >
> >> > How? If nullwritables, don't I just get
> nulls?
> >
> > Changed to:
> >
> > /**
> > * Input format that creates as many map tasks as
> configured in
> > * mapred.map.tasks, each provided with a single row
> of
> > * NullWritables. This can be useful when trying to
> write mappers
> > * which don't have any real input (eg when the mapper
> is simply
> > * producing random data as output)
> > */
> >
> >
> > - Todd
> >
> >
> >
> -----------------------------------------------------------
> > This is an automatically generated e-mail. To reply,
> visit:
> > http://review.hbase.org/r/87/#review96
> >
> -----------------------------------------------------------
> >
> >
> > On 2010-05-25 14:21:16, Todd Lipcon wrote:
> >>
> >>
> -----------------------------------------------------------
> >> This is an automatically generated e-mail. To
> reply, visit:
> >> http://review.hbase.org/r/87/
> >>
> -----------------------------------------------------------
> >>
> >> (Updated 2010-05-25 14:21:16)
> >>
> >>
> >> Review request for hbase, stack and Jonathan
> Gray.
> >>
> >>
> >> Summary
> >> -------
> >>
> >> Here's a first patch that implements bulk import
> into existing tables.
> This applies on top of HBASE-2586 and HBASE-2588 - I've
> pushed the series of
> the three to my github: http://github.com/toddlipcon/hbase/tree/hfof-review
> >>
> >> I have some TODOs left that I want to take care of
> before this gets
> committed, but since it's a pretty large patch, I figured
> I'd get it out for
> review ASAP.
> >>
> >> The stuff in the hadoopbackport package is
> essentially copypaste from
> Hadoop trunk, so you can ignore that in the review.
> >>
> >>
> >> This addresses bug HBASE-1923.
> >> http://issues.apache.org/jira/browse/HBASE-1923
> >>
> >>
> >> Diffs
> >> -----
> >>
> >> pom.xml 0a009cf
> >>
> src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
> 29b0cd6
> >>
> src/main/java/org/apache/hadoop/hbase/io/ImmutableBytesWritable.java
> 0a9ec4b
> >>
> src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
> cf4768f
> >>
> src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
> 4cbe52a
> >>
> src/main/java/org/apache/hadoop/hbase/mapreduce/Driver.java
> 3d40695
> >>
> src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java
> 9c8e53e
> >>
> src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java
> PRE-CREATION
> >>
> src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
> PRE-CREATION
> >>
> src/main/java/org/apache/hadoop/hbase/mapreduce/PutSortReducer.java
> PRE-CREATION
> >>
> src/main/java/org/apache/hadoop/hbase/mapreduce/SimpleTotalOrderPartitioner.java
> af3d588
> >>
> src/main/java/org/apache/hadoop/hbase/mapreduce/hadoopbackport/InputSampler.java
> PRE-CREATION
> >>
> src/main/java/org/apache/hadoop/hbase/mapreduce/hadoopbackport/TotalOrderPartitioner.java
> PRE-CREATION
> >>
> src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
> 287cd48
> >>
> src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
> fb65ed1
> >>
> src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
> 7de766d
> >>
> src/main/java/org/apache/hadoop/hbase/util/Bytes.java
> a53dafe
> >>
> src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
> ed8709f
> >>
> src/test/java/org/apache/hadoop/hbase/mapreduce/NMapInputFormat.java
> PRE-CREATION
> >>
> src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java
> d04ced2
> >>
> src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFiles.java
> PRE-CREATION
> >>
> src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionInfo.java
> fcb22fb
> >>
> >> Diff: http://review.hbase.org/r/87/diff
> >>
> >>
> >> Testing
> >> -------
> >>
> >> Primary unit/functional testing, a bit of
> pseudo-distributed testing.
> Plan on doing full system tests before commit as well.
> >>
> >>
> >> Thanks,
> >>
> >> Todd
> >>
> >>
> >
> 


      


Mime
View raw message