From: Jean-Marc Spaggiari
Date: Tue, 7 May 2013 16:02:40 -0700
Subject: Re: Export / Import and table splits
To: user@hbase.apache.org

@Mohammad: The end goal is really about the splits more than the model,
so I don't think Lars' options are a good fit for this use case.

@Mike: I agree that things were not configured correctly. The user
should have split the table before doing the import. I like the idea of
looking at the exported files to get the region boundaries. That way you
don't need to have the source_table still there...

So we have 2 different things here:
1) a command on the shell to duplicate a table structure;
2) an option on the import command to split the table regions based on
the file names.

If we agree on that I will open one JIRA for each...
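For 1), a quick and untested sketch with the Java client API (0.94
style), just to illustrate the idea; the table names are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class DuplicateTableStructure {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTable source = new HTable(conf, "source_table");

    // Copy the source table's schema (families, TTL, compression, ...).
    HTableDescriptor desc = new HTableDescriptor(source.getTableDescriptor());
    desc.setName(Bytes.toBytes("target_table"));

    // Region start keys of the source table. The first one is the empty
    // start key of the first region, so it is not used as a split point.
    byte[][] startKeys = source.getStartKeys();
    byte[][] splits = new byte[startKeys.length - 1][];
    System.arraycopy(startKeys, 1, splits, 0, splits.length);

    // Create the target table pre-split on the same boundaries.
    if (splits.length > 0) {
      admin.createTable(desc, splits);
    } else {
      admin.createTable(desc);
    }

    source.close();
    admin.close();
  }
}

A shell "duplicate" command could basically just wrap that logic.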
JM

2013/5/7 Michael Segel :
> Silly question...
>
> If you're doing a simple export, then you end up with all of your
> prior regions as separate files in a directory, right?
>
> So in theory, you could find the first row and the last complete row
> of each file and then do your pre-splits based on the start key and
> end key that you find.
>
> That would be your tool, so to speak.
>
> But to the point that reading back in these files will cause you to
> crash your RS and HBase? That doesn't sound like it's well tuned or
> right.
>
> HTH
> -Mike
>
> On May 7, 2013, at 5:29 PM, Ted Yu wrote:
>
>> I am not aware of a tool which can pre-split a table using another
>> table's region boundaries as a template.
>>
>> Such a tool would be nice to have.
>>
>> Cheers
>>
>> On Tue, May 7, 2013 at 3:23 PM, Jean-Marc Spaggiari
>> wrote:
>>
>>> Hi,
>>>
>>> When we are doing an export, we are only exporting the data. Then
>>> when we are importing that back, we need to make sure the table is
>>> pre-split correctly, or else we might hotspot some servers.
>>>
>>> If you simply export and then import without pre-splitting at all,
>>> you will most probably bring some servers down because they will be
>>> overwhelmed with splits and compactions.
>>>
>>> Do we have any tool to pre-split a table the same way another table
>>> is already pre-split?
>>>
>>> Something like:
>>>> duplicate 'source_table', 'target_table'
>>>
>>> Which would create a new table called 'target_table' with exactly
>>> the same parameters as 'source_table' and the same region
>>> boundaries?
>>>
>>> If we don't have one, would it be useful to have?
>>>
>>> Or even something like:
>>>> create 'target_table', 'f1', {SPLITS_MODEL => 'source_table'}
>>>
>>>
>>> JM
>>>
>
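To put Mike's suggestion in concrete terms: the Export job writes
SequenceFiles of ImmutableBytesWritable/Result (one part file per map
task, normally one per source region, with rows in scan order), so an
untested sketch like the one below could collect the first row key of
each part file and use those as pre-split points before running the
Import. The class and method names here are just examples:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.io.SequenceFile;

public class SplitsFromExport {
  // Collects the first row key of every exported part file. Sort the
  // keys and drop the smallest one (the table's first row) before
  // passing them to HBaseAdmin.createTable(desc, splits).
  public static byte[][] splitPoints(Configuration conf, Path exportDir)
      throws Exception {
    FileSystem fs = exportDir.getFileSystem(conf);
    List<byte[]> firstKeys = new ArrayList<byte[]>();
    for (FileStatus status : fs.listStatus(exportDir)) {
      if (!status.getPath().getName().startsWith("part-")) {
        continue; // skip _SUCCESS, _logs, ...
      }
      SequenceFile.Reader reader =
          new SequenceFile.Reader(fs, status.getPath(), conf);
      try {
        ImmutableBytesWritable key = new ImmutableBytesWritable();
        Result value = new Result();
        if (reader.next(key, value)) {
          firstKeys.add(Arrays.copyOfRange(key.get(), key.getOffset(),
              key.getOffset() + key.getLength()));
        }
      } finally {
        reader.close();
      }
    }
    return firstKeys.toArray(new byte[firstKeys.size()][]);
  }
}

The import-side option from 2) could basically do the same thing and
then create the target table before loading the data.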