From: Michael Segel
To: user@hbase.apache.org
Subject: Re: Export / Import and table splits
Date: Tue, 7 May 2013 18:11:00 -0500

I don't see much value in duplicating the table's structure, but IMHO, the jury is still out.

On May 7, 2013, at 6:02 PM, Jean-Marc Spaggiari wrote:

> @Mohammad: The end goal is really more about the splits than
> the model. So I don't think Lars' options are good for this use case.
> @Mike: I agree that things were not configured correctly. The user should
> have split the table before doing the import. I like the idea of
> looking at the files to get the region boundaries. That way you don't
> need to have the source_table still there...
>
> So we have 2 different things here.
> 1) a command on the shell to duplicate a table structure
> 2) an option on the import command to split the table regions based on
> the file names.
>
> If we agree on that I will open one JIRA for each...
>
> JM
>
> 2013/5/7 Michael Segel:
>> Silly question...
>>
>> If you're doing a simple export, then you end up with all of your prior regions as separate files in a directory, right?
>>
>> So in theory, you could find the first row and the last complete row of each file and then do your pre-splits based on the start key and end key that you find.
>>
>> That would be your tool so to speak.
>>
>> But to the point that reading back in these files will cause you to crash your RS and HBase?
>> That doesn't sound like it's well tuned or right.
>>
>> HTH
>> -Mike
>>
>> On May 7, 2013, at 5:29 PM, Ted Yu wrote:
>>
>>> I am not aware of a tool which can pre-split a table using another table's
>>> region boundaries as a template.
>>>
>>> Such a tool would be nice to have.
>>>
>>> Cheers
>>>
>>> On Tue, May 7, 2013 at 3:23 PM, Jean-Marc Spaggiari wrote:
>>>
>>>> Hi,
>>>>
>>>> When we are doing an export, we are only exporting the data. Then when
>>>> we are importing that back, we need to make sure the table is
>>>> pre-split correctly, else we might hotspot some servers.
>>>>
>>>> If you simply export then import without pre-splitting at all, you
>>>> will most probably bring some servers down because they will be
>>>> overwhelmed with splits and compactions.
>>>>
>>>> Do we have any tool to pre-split a table the same way another table is
>>>> already pre-split?
>>>>
>>>> Something like
>>>>> duplicate 'source_table', 'target_table'
>>>>
>>>> Which will create a new table called 'target_table' with exactly the
>>>> same parameters as 'source_table' and the same region boundaries?
>>>>
>>>> If we don't have one, would it be useful to have one?
>>>>
>>>> Or even something like:
>>>>> create 'target_table', 'f1', {SPLITS_MODEL => 'source_table'}
>>>>
>>>>
>>>> JM
>>>>
>>
>
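
For illustration only, here is a rough Python sketch of the idea discussed in this thread: take the first row key found in each exported file and turn those keys into pre-split points for the target table. Everything in it is an assumption, not an existing HBase tool: the function names are hypothetical, the (first_row, last_row) pairs are assumed to have been read from the export files by some other means, and row keys are assumed to be printable ASCII strings so that Python's string ordering matches HBase's byte ordering. The only real HBase syntax relied on is the shell's existing `create ... SPLITS => [...]` form.

```python
def split_keys_from_files(file_ranges):
    """Derive split points from per-file (first_row, last_row) pairs.

    file_ranges is a hypothetical input: one pair per exported file.
    The lowest first row is dropped because the first region implicitly
    starts at the empty key, so it is not a split point.
    """
    starts = sorted({first for first, _last in file_ranges})
    return starts[1:]


def quote(value):
    """Single-quote a value for the HBase shell, escaping \\ and '."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def create_command(table, family, splits):
    """Build an HBase shell 'create' command with an optional SPLITS list."""
    command = "create {}, {}".format(quote(table), quote(family))
    if splits:
        keys = ", ".join(quote(key) for key in splits)
        command += ", SPLITS => [{}]".format(keys)
    return command


if __name__ == "__main__":
    # Made-up boundaries standing in for three exported region files.
    ranges = [("m001", "r999"), ("a001", "l999"), ("s000", "z999")]
    print(create_command("target_table", "f1", split_keys_from_files(ranges)))
    # prints: create 'target_table', 'f1', SPLITS => ['m001', 's000']
```

The printed command could be pasted into the HBase shell before running the import, so the target table starts with one region per exported file instead of a single region.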