From: "anty.rao (JIRA)"
To: hbase-dev@hadoop.apache.org
Date: Tue, 13 Oct 2009 01:25:31 -0700 (PDT)
Subject: [jira] Commented: (HBASE-1901) "General" partitioner for "hbase-48" bulk (behind the api, write hfiles direct) uploader

    [ https://issues.apache.org/jira/browse/HBASE-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765000#action_12765000 ]

anty.rao commented on HBASE-1901:
---------------------------------

Hi stack,

I have run some tests and found that we need to change the code of TestHFileOutputFormat slightly, or the test won't work.

    int rows = this.conf.getInt("mapred.map.tasks", 1) * ROWSPERSPLIT;

should be

    int rows = this.conf.getInt("mapred.map.tasks", 1) * ROWSPERSPLIT + 2;

Just as you said, the end key needs to be exclusive, i.e. one larger than the biggest key in your key space. However, the key range of TestHFileOutputFormat is 1 to conf.getInt("mapred.map.tasks", 1) * ROWSPERSPLIT + 1, so we have to add one more to rows (the end key).

Apart from that, everything looks right: the STARTKEY and ENDKEY of each region are correct.

The precondition is that we know the startKey and endKey. Now that you have written the partitioner, can we write a MR job to calculate the startKey and endKey?
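(For what it's worth, a minimal sketch of such a job might look like the one below. It assumes plain-text input whose first tab-separated field is the row key and that keys compare correctly as strings; the class names are made up for illustration and are not part of 1901.patch.)

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class KeyRangeJob {

      // Each mapper remembers the smallest and largest row key seen in its
      // split and emits both from cleanup() under a single constant key.
      public static class KeyRangeMapper extends Mapper<LongWritable, Text, Text, Text> {
        private String min, max;

        @Override
        protected void map(LongWritable offset, Text line, Context context) {
          String rowKey = line.toString().split("\t", 2)[0];
          if (min == null || rowKey.compareTo(min) < 0) min = rowKey;
          if (max == null || rowKey.compareTo(max) > 0) max = rowKey;
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
          if (min != null) {
            context.write(new Text("range"), new Text(min));
            context.write(new Text("range"), new Text(max));
          }
        }
      }

      // A single reducer folds the per-split candidates into the global
      // startKey and endKey.
      public static class KeyRangeReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
          String min = null, max = null;
          for (Text v : values) {
            String s = v.toString();
            if (min == null || s.compareTo(min) < 0) min = s;
            if (max == null || s.compareTo(max) > 0) max = s;
          }
          context.write(new Text("startKey"), new Text(min));
          context.write(new Text("endKey"), new Text(max));
        }
      }
    }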
> "General" partitioner for "hbase-48" bulk (behind the api, write hfiles direct) uploader
> -----------------------------------------------------------------------------------------
>
>                 Key: HBASE-1901
>                 URL: https://issues.apache.org/jira/browse/HBASE-1901
>             Project: Hadoop HBase
>          Issue Type: Wish
>            Reporter: stack
>         Attachments: 1901.patch
>
>
> For users to bulk upload by writing hfiles directly to the filesystem, they currently need to write a partitioner that is intimate with how their key schema works.
> This issue is about providing a general partitioner, one that could never be as fair as a custom-written partitioner but that might just work for many cases. The idea is that a user would supply the first and last keys in their dataset to upload. We'd then do BigDecimal arithmetic on the range between the start and end row ids, dividing it by the number of reducers to come up with key ranges per reducer.
> (I thought jgray had done some BigDecimal work dividing keys already but I can't find it)
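As a rough illustration only (not the contents of 1901.patch; the class and method names here are invented), the BigDecimal split computation described above might look something like this:

    import java.math.BigDecimal;
    import java.math.RoundingMode;

    public class RangeSplitSketch {

      // Divide the numeric key range [startKey, endKey) into `reducers`
      // roughly equal sub-ranges and return the exclusive upper bound of each.
      public static BigDecimal[] splitPoints(BigDecimal startKey, BigDecimal endKey, int reducers) {
        BigDecimal step = endKey.subtract(startKey)
            .divide(BigDecimal.valueOf(reducers), 10, RoundingMode.HALF_UP);
        BigDecimal[] upperBounds = new BigDecimal[reducers];
        for (int i = 0; i < reducers; i++) {
          // Reducer i owns keys in [startKey + i*step, startKey + (i+1)*step).
          upperBounds[i] = startKey.add(step.multiply(BigDecimal.valueOf(i + 1)));
        }
        return upperBounds;
      }

      // A partitioner built on these bounds sends a row to the first reducer
      // whose upper bound is greater than the row's key.
      public static int partition(BigDecimal rowKey, BigDecimal[] upperBounds) {
        for (int i = 0; i < upperBounds.length - 1; i++) {
          if (rowKey.compareTo(upperBounds[i]) < 0) {
            return i;
          }
        }
        return upperBounds.length - 1;
      }
    }

(Here endKey is taken as exclusive, matching the comment above about the end key being one larger than the biggest key in the key space.)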