Date: Mon, 23 May 2016 19:45:13 +0000 (UTC)
From: "ChiaPing Tsai (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-15806) An endpoint-based export tool

[ https://issues.apache.org/jira/browse/HBASE-15806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15296957#comment-15296957 ]

ChiaPing Tsai commented on HBASE-15806:
---------------------------------------

I didn't profile every phase, but I measured the total elapsed time of the job (as
[attached|https://issues.apache.org/jira/secure/attachment/12805500/Experiment.png]).

The endpoint approach and the MR approach have much in common:
1) same arguments (scan and output options)
2) same splits (one split per region)
3) same output (sequence file)

The factor that affects performance is that the endpoint approach doesn't transfer data from the regionserver to the client (it saves the RPC), so the endpoint approach will be faster than the MR approach (as long as writing the output is not the bottleneck).

Sincerely

> An endpoint-based export tool
> -----------------------------
>
>                 Key: HBASE-15806
>                 URL: https://issues.apache.org/jira/browse/HBASE-15806
>             Project: HBase
>          Issue Type: New Feature
>    Affects Versions: 2.0.0
>            Reporter: ChiaPing Tsai
>            Assignee: ChiaPing Tsai
>            Priority: Minor
>             Fix For: 2.0.0
>
>         Attachments: Experiment.png, HBASE-15806.patch
>
>
> The time to export a table can be reduced if we use the endpoint technique to export the HDFS files from the region server rather than from the HBase client.
> In my experiments, the elapsed time of the endpoint-based export can be less than half that of the current export tool (with HDFS compression enabled).
> The shortcoming is that we need to alter the table to deploy the endpoint.
> Any comments about this? Thanks.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)