From: "anty.rao (JIRA)"
To: hbase-dev@hadoop.apache.org
Date: Tue, 13 Oct 2009 01:25:31 -0700 (PDT)
Subject: [jira] Commented: (HBASE-1901) "General" partitioner for "hbase-48" bulk (behind the api, write hfiles direct) uploader

    [ https://issues.apache.org/jira/browse/HBASE-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765000#action_12765000 ]

anty.rao commented on HBASE-1901:
---------------------------------

Hi stack,

I have run some tests and found that we need to change the code of TestHFileOutputFormat slightly, or the test won't work.

    int rows = this.conf.getInt("mapred.map.tasks", 1) * ROWSPERSPLIT;

should be

    int rows = this.conf.getInt("mapred.map.tasks", 1) * ROWSPERSPLIT + 2;

Just as you said, the end key needs to be exclusive, i.e. one larger than the biggest key in your key space. However, the key range of TestHFileOutputFormat is 1 to conf.getInt("mapred.map.tasks", 1) * ROWSPERSPLIT + 1, so we have to add one more to rows (the end key).

Apart from that, everything looks right: the STARTKEY and ENDKEY of each region are correct.

The precondition is that we know the startKey and endKey. Now that you have written the partitioner, can we write a MR job to calculate the startKey and endKey?
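(For what it's worth, a minimal sketch of such a job might look like the one below. It assumes plain-text input whose first tab-separated field is the row key and that keys compare correctly as strings; the class names are made up for illustration and are not part of 1901.patch.)

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class KeyRangeJob {

      // Each mapper remembers the smallest and largest row key seen in its
      // split and emits both from cleanup() under a single constant key.
      public static class KeyRangeMapper extends Mapper<LongWritable, Text, Text, Text> {
        private String min, max;

        @Override
        protected void map(LongWritable offset, Text line, Context context) {
          String rowKey = line.toString().split("\t", 2)[0];
          if (min == null || rowKey.compareTo(min) < 0) min = rowKey;
          if (max == null || rowKey.compareTo(max) > 0) max = rowKey;
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
          if (min != null) {
            context.write(new Text("range"), new Text(min));
            context.write(new Text("range"), new Text(max));
          }
        }
      }

      // A single reducer folds the per-split candidates into the global
      // startKey and endKey.
      public static class KeyRangeReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
          String min = null, max = null;
          for (Text v : values) {
            String s = v.toString();
            if (min == null || s.compareTo(min) < 0) min = s;
            if (max == null || s.compareTo(max) > 0) max = s;
          }
          context.write(new Text("startKey"), new Text(min));
          context.write(new Text("endKey"), new Text(max));
        }
      }
    }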
> "General" partitioner for "hbase-48" bulk (behind the api, write hfiles direct) uploader
> -----------------------------------------------------------------------------------------
>
>                 Key: HBASE-1901
>                 URL: https://issues.apache.org/jira/browse/HBASE-1901
>             Project: Hadoop HBase
>          Issue Type: Wish
>            Reporter: stack
>         Attachments: 1901.patch
>
>
> For users to bulk upload by writing hfiles directly to the filesystem, they currently need to write a partitioner that is intimate with how their key schema works.
> This issue is about providing a general partitioner, one that could never be as fair as a custom-written partitioner but that might just work for many cases. The idea is that a user would supply the first and last keys in their dataset to upload. We'd then do BigDecimal arithmetic on the range between the start and end row ids, dividing it by the number of reducers to come up with key ranges per reducer.
> (I thought jgray had done some BigDecimal work dividing keys already but I can't find it)
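As a rough illustration only (not the contents of 1901.patch; the class and method names here are invented), the BigDecimal split computation described above might look something like this:

    import java.math.BigDecimal;
    import java.math.RoundingMode;

    public class RangeSplitSketch {

      // Divide the numeric key range [startKey, endKey) into `reducers`
      // roughly equal sub-ranges and return the exclusive upper bound of each.
      public static BigDecimal[] splitPoints(BigDecimal startKey, BigDecimal endKey, int reducers) {
        BigDecimal step = endKey.subtract(startKey)
            .divide(BigDecimal.valueOf(reducers), 10, RoundingMode.HALF_UP);
        BigDecimal[] upperBounds = new BigDecimal[reducers];
        for (int i = 0; i < reducers; i++) {
          // Reducer i owns keys in [startKey + i*step, startKey + (i+1)*step).
          upperBounds[i] = startKey.add(step.multiply(BigDecimal.valueOf(i + 1)));
        }
        return upperBounds;
      }

      // A partitioner built on these bounds sends a row to the first reducer
      // whose upper bound is greater than the row's key.
      public static int partition(BigDecimal rowKey, BigDecimal[] upperBounds) {
        for (int i = 0; i < upperBounds.length - 1; i++) {
          if (rowKey.compareTo(upperBounds[i]) < 0) {
            return i;
          }
        }
        return upperBounds.length - 1;
      }
    }

(Here endKey is taken as exclusive, matching the comment above about the end key being one larger than the biggest key in the key space.)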