Subject: Re: issues copying data from one table to another
From: Michael Segel
To: user@hbase.apache.org
Date: Sat, 18 Aug 2012 06:14:38 -0500
Can you disable the table?
How much free disk space do you have?
Is this a production cluster? Can you upgrade to CDH3u5?
Are you running a capacity scheduler or fair scheduler?

Just out of curiosity, what would happen if you could disable the table, alter the table's max file size, and then attempted to merge regions? Note: I've never tried this and don't know if it's possible; just thinking outside the box...

Outside of that, the safest way to do this would be to export the table. You'll get 2,800 mappers, so if you are using a scheduler, you can just put the job into a queue that limits the number of concurrent mappers.

When you import the data into your new table, you can run on an even more restrictive queue so that you have less of an impact on your system. The downside is that it's going to take a bit longer to run. Again, it's probably the safest way to do this...

HTH,
-Mike

On Aug 17, 2012, at 2:17 PM, Norbert Burger wrote:

> Hi folks -- we're running CDH3u3 (0.90.4). I'm trying to export data
> from an existing table that has far too many regions (2,600+ for only
> 8 regionservers) into one with a more reasonable region count for
> this cluster (256). Overall data volume is approx. 3 TB.
>
> I thought initially that I'd use the bulkload/importtsv approach, but
> it turns out this table's schema has column qualifiers made from
> timestamps, so it's impossible for me to specify a list of target
> columns for importtsv. From what I can tell, the TSV interchange
> format requires your data to have the same column qualifiers
> throughout.
>
> I took a look at CopyTable and Export/Import, which both appear to
> wrap the HBase client API (emitting Puts from a mapper).
> But I'm seeing significant performance problems with this approach,
> to the point that I'm not sure it's feasible. Export appears to work
> OK, but when I try importing the data back from HDFS, the rest of our
> cluster drags to a halt -- client writes (even those not associated
> with the Import) start timing out. FWIW, Import already disables
> autoFlush (via TableOutputFormat).
>
> From [1], one option I could try would be to disable the WAL. Are
> there other techniques I should try? Has anyone implemented a
> bulkloader which doesn't use the TSV format?
>
> Norbert
>
> [1] http://hbase.apache.org/book/perf.writing.html
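
[For the archives: the export/import route Mike suggests looks roughly like this on CDH3/0.90. This is a sketch, not a tested recipe -- the table names, HDFS path, and the queue name `throttled` are placeholders; `mapred.job.queue.name` assumes the capacity scheduler (fair-scheduler setups would set `mapred.fairscheduler.pool` instead).]

```shell
#!/usr/bin/env sh
# Sketch: dump a table to HDFS with Export, then replay it into a
# pre-split target table with Import, both throttled via a scheduler
# queue so the jobs don't swamp the cluster. Names are placeholders.

TABLE=oldtable                        # source table (one mapper per region)
NEWTABLE=newtable                     # pre-created target with fewer regions
EXPORT_DIR=/user/backup/oldtable_export
QUEUE=throttled                       # assumed capacity-scheduler queue

# Step 1: export the table to sequence files on HDFS.
EXPORT_CMD="hbase org.apache.hadoop.hbase.mapreduce.Export \
  -D mapred.job.queue.name=$QUEUE \
  $TABLE $EXPORT_DIR"

# Step 2: replay the dump into the new table (can use an even more
# restrictive queue, per the advice above).
IMPORT_CMD="hbase org.apache.hadoop.hbase.mapreduce.Import \
  -D mapred.job.queue.name=$QUEUE \
  $NEWTABLE $EXPORT_DIR"

echo "$EXPORT_CMD"
echo "$IMPORT_CMD"
# eval "$EXPORT_CMD" && eval "$IMPORT_CMD"   # uncomment on a real cluster
```

Since Export writes HBase Result objects as sequence files rather than TSV, it sidesteps the importtsv limitation with timestamp-derived column qualifiers.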