Subject: Re: HBase export limit bandwith
From: Michael Segel <michael_segel@hotmail.com>
Date: Thu, 5 Jun 2014 17:07:41 -0500
To: user@hbase.apache.org
I guess you could say "snapshot" as in a point-in-time M/R job that exports all of the rows in the table before a specified time X, which could default to the start time of the job.

Since you're running your export to the same cluster (but a different directory from /hbase), you don't really have to worry about the number of mappers.

However, since it's a backup... you may want to reduce the number of region files, so you could reduce the data set down to 10, 100, etc. files depending on the size of the underlying table, and then, as you write out from the reducer, you could write to S3 directly. But if you want more control, you reduce to the local HDFS, then in a separate job or single-threaded program you could open up one file at a time and trickle it in. (Or write a map-only job that has a set number of mappers defined to run in parallel.)

The only caveat is that you need to make sure you have enough disk space to store the local copy until you complete the S3 write.

Of course there are other permutations... like if you have a NAS/SAN, you could move the export there.

(Hot == HBase table. Warm == HDFS outside of HBase. Lukewarm == local attached disks. Cold == S3...)

Again, it depends on the resources available to you and your enterprise. YMMV.

On Jun 5, 2014, at 9:15 AM, Ted Yu wrote:

> bq. take a snapshot and write the file(s)
>
> Is the above referring to an hbase snapshot?
> hbase 0.92.x doesn't support snapshots.
>
> FYI
>
>
> On Thu, Jun 5, 2014 at 5:11 AM, Michael Segel wrote:
>
>> Ok...
>>
>> So when the basic tools don't work...
>> How about rolling your own?
>>
>> Step 1: take a snapshot and write the file(s) to a different location
>> outside of /hbase.
>> (Export to local disk on the cluster.)
>>
>> Step 2: write your own M/R job and control the number of mappers that read
>> from HDFS and write to S3.
>> That assumes you want a block-for-block match. If you want to change the
>> number of files, since each region would be a separate file, you could do
>> the write to S3 in the reduce phase.
>> (Which is what you want.)
>>
>>
>> On Jun 4, 2014, at 7:39 AM, Damien Hardy wrote:
>>
>>> Hello,
>>>
>>> We are trying to export an HBase table to S3 for backup purposes.
>>> By default the export tool runs a map per region, and we want to limit
>>> output bandwidth to the internet (to Amazon S3).
>>>
>>> We were thinking of adding some reducers to limit the number of writers,
>>> but this is explicitly hardcoded to 0 in the Export class:
>>> ```
>>> // No reducers. Just write straight to output files.
>>> job.setNumReduceTasks(0);
>>> ```
>>>
>>> Is there another way (a property?) in hadoop to limit output bandwidth?
>>>
>>> --
>>> Damien
>>>
>>
>> The opinions expressed here are mine; while they may reflect a cognitive
>> thought, that is purely accidental.
>> Use at your own risk.
>> Michael Segel
>> michael_segel (AT) hotmail.com
>>

The opinions expressed here are mine; while they may reflect a cognitive thought, that is purely accidental.
Use at your own risk.
Michael Segel
michael_segel (AT) hotmail.com
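[Archive editor's note: the "single-threaded program that trickles it in" idea from the thread can be sketched as a throttled stream copy. This is a minimal, hypothetical sketch, not part of HBase's Export tool; the `ThrottledCopier` class name and the 1 MB/s cap are made up for illustration. In a real job the input stream would come from `FileSystem.open()` on the exported HDFS files and the output would be an S3 client's upload stream.]

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/**
 * Copies a stream while capping throughput at maxBytesPerSec.
 * Hypothetical sketch; not an HBase or Hadoop API.
 */
public class ThrottledCopier {
    private final long maxBytesPerSec;

    public ThrottledCopier(long maxBytesPerSec) {
        this.maxBytesPerSec = maxBytesPerSec;
    }

    public long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[64 * 1024];
        long total = 0;
        long start = System.nanoTime();
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
            // If we're ahead of the allowed rate, sleep until back on schedule.
            long expectedNanos = total * 1_000_000_000L / maxBytesPerSec;
            long elapsed = System.nanoTime() - start;
            if (elapsed < expectedNanos) {
                try {
                    Thread.sleep((expectedNanos - elapsed) / 1_000_000L);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IOException("interrupted during throttled copy", e);
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[256 * 1024];                 // 256 KB of test data
        InputStream in = new ByteArrayInputStream(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Cap at 1 MB/s, so this copy takes roughly a quarter second.
        long copied = new ThrottledCopier(1024 * 1024).copy(in, out);
        System.out.println(copied + " bytes copied");
    }
}
```

For whole-file copies out of HDFS, DistCp's `-bandwidth` option (MB/s per map) may also be worth a look as an answer to the original "is there a property?" question, though it applies to file copies rather than the Export job itself.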