Date: Mon, 14 Apr 2014 14:41:38 +0200
Subject: Re: How to generate a large dataset quickly.
From: Guillermo Ortiz
To: user@hbase.apache.org

I'm using 0.94.6-cdh4.4.0, and I use the bulk load:

    FileInputFormat.addInputPath(job, new Path(INPUT_FOLDER));
    FileOutputFormat.setOutputPath(job, hbasePath);
    HTable table = new HTable(jConf, HBASE_TABLE);
    HFileOutputFormat.configureIncrementalLoad(job, table);

It seems to take a really long time once it starts executing the Puts to HBase in the reduce phase.

2014-04-14 14:35 GMT+02:00 Ted Yu:

> Which HBase release did you run the MapReduce job on?
>
> Cheers
>
> On Apr 14, 2014, at 4:50 AM, Guillermo Ortiz wrote:
>
> > I want to create a large dataset for HBase with different numbers of
> > versions and rows. It's about 10M rows and 100 versions, to do some
> > benchmarks.
> >
> > What's the fastest way to create it? I'm generating the dataset with a
> > MapReduce job of 100,000 rows and 10 versions. It takes 17 minutes and
> > the size is around 7 GB. I don't know if I could do it more quickly.
> > The bottleneck is when the MapReduce job writes its output and when
> > that output is transferred to the reducers.
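
As a rough sanity check on the numbers quoted above, the sketch below extrapolates the 100,000-row, 10-version sample run (17 minutes, about 7 GB, reading the quoted size as gigabytes) to the 10M-row, 100-version target. It assumes time and size grow linearly with the number of cell versions written, which is only an approximation; real jobs may scale better or worse. The class and method names are hypothetical and are not part of HBase.

```java
import java.util.Locale;

// Back-of-the-envelope extrapolation; names here are illustrative only.
public class ScaleEstimate {

    // Ratio of cell versions in the target dataset to those in the sample,
    // assuming the same columns per row in both.
    static double scaleFactor(long sampleRows, int sampleVersions,
                              long targetRows, int targetVersions) {
        return ((double) targetRows * targetVersions)
                / ((double) sampleRows * sampleVersions);
    }

    // Linear extrapolation of a measured quantity (minutes, gigabytes, ...).
    static double extrapolate(double measured, double factor) {
        return measured * factor;
    }

    public static void main(String[] args) {
        // Figures taken from the thread: sample run vs. desired benchmark set.
        double factor = scaleFactor(100_000L, 10, 10_000_000L, 100);
        double minutes = extrapolate(17.0, factor);
        double gigabytes = extrapolate(7.0, factor);

        System.out.printf(Locale.ROOT, "scale factor: %.0f%n", factor);
        System.out.printf(Locale.ROOT, "estimated time: %.0f minutes (%.1f days)%n",
                minutes, minutes / (60.0 * 24.0));
        System.out.printf(Locale.ROOT, "estimated size: %.0f GB%n", gigabytes);
    }
}
```

Under that linear assumption the target is 1,000 times the sample, so the same job would need on the order of 17,000 minutes and 7,000 GB, which is why the generation approach itself matters more here than small tuning gains.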