Date: Mon, 22 Jul 2013 02:34:49 +0000 (UTC)
From: "Chao Shi (JIRA)"
To: crunch-dev@incubator.apache.org
Reply-To: dev@crunch.apache.org
Subject: [jira] [Commented] (CRUNCH-212) Need target wrapper for HFileOuptutFormat

[ https://issues.apache.org/jira/browse/CRUNCH-212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13714869#comment-13714869 ]

Chao Shi commented on CRUNCH-212:
---------------------------------

Hi Reid,

I haven't thought this through thoroughly yet.

bq. - setting up the partitioning to match regions on an existing HBase table

I think we have to set up a TotalOrderPartitioner. The partition boundaries are determined from a scan on ".META.".

bq.
- handling multiple column families

I think we can take a PCollection as input from the user, then divide it into multiple PCollections by column family. We would then sort each family and write it to its own HFile target. This requires the user to tell us explicitly which column families are used, as Crunch cannot determine the number of outputs at runtime. This approach looks more "crunch-style". :)

Any suggestions are welcome.

> Need target wrapper for HFileOuptutFormat
> -----------------------------------------
>
>                 Key: CRUNCH-212
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-212
>             Project: Crunch
>          Issue Type: Improvement
>          Components: IO
>            Reporter: Chao Shi
>         Attachments: crunch-212-draft.patch
>
>
> I need to import data to HBase from MR. I found HFileOutputFormat is ~5x more efficient than HTableOutputFormat. So maybe we need a target wrapper for it.
> Furthermore, is it possible to call HBase to load it automatically after HFiles are generated?
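
For illustration only, the two ideas in the comment above (routing each row to the partition of the region that owns it, and splitting the input by a user-declared list of column families before sorting) can be sketched in plain Java. This is a minimal sketch under stated assumptions, not the patch attached to the issue and not the Crunch or HBase API: the class `HFilePartitionSketch`, the `Cell` stand-in, and the methods `partitionFor` and `splitByFamily` are all hypothetical, and the region split points are hard-coded rather than read from ".META.".

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from Crunch or HBase.
final class HFilePartitionSketch {

    // Minimal stand-in for an HBase KeyValue (assumption: string fields for brevity).
    static final class Cell {
        final String row, family, qualifier;
        Cell(String row, String family, String qualifier) {
            this.row = row; this.family = family; this.qualifier = qualifier;
        }
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Lexicographic comparison of unsigned bytes, the ordering used for row keys.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) return diff;
        }
        return a.length - b.length;
    }

    // splitPoints: sorted start keys of every region except the first one.
    // Returns the index of the partition whose key range contains rowKey;
    // a key equal to a split point belongs to the region that starts there.
    static int partitionFor(byte[] rowKey, List<byte[]> splitPoints) {
        int lo = 0, hi = splitPoints.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compareBytes(rowKey, splitPoints.get(mid)) >= 0) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    // One sorted output per declared family; the families must be known up front.
    // Sorting uses String order here, which matches byte order only for ASCII keys.
    static Map<String, List<Cell>> splitByFamily(List<Cell> cells, List<String> families) {
        Map<String, List<Cell>> out = new LinkedHashMap<>();
        for (String f : families) out.put(f, new ArrayList<>());
        for (Cell c : cells) {
            List<Cell> bucket = out.get(c.family);
            if (bucket == null) {
                throw new IllegalArgumentException("Undeclared column family: " + c.family);
            }
            bucket.add(c);
        }
        Comparator<Cell> order =
            Comparator.comparing((Cell c) -> c.row).thenComparing(c -> c.qualifier);
        for (List<Cell> bucket : out.values()) bucket.sort(order);
        return out;
    }

    public static void main(String[] args) {
        // Assumed region start keys "g" and "p", giving three partitions.
        List<byte[]> splits = Arrays.asList(bytes("g"), bytes("p"));
        for (String row : Arrays.asList("a", "g", "z")) {
            System.out.println(row + " -> partition " + partitionFor(bytes(row), splits));
        }
    }
}
```

Rejecting cells from an undeclared family reflects the constraint in the comment that the number of outputs has to be fixed before the pipeline runs.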