From: ramkrishna vasudevan
Date: Thu, 17 Jan 2013 22:39:45 +0530
Subject: Re: Loading data, hbase slower than Hive?
To: user@hbase.apache.org

Hive is more for batch processing, and HBase is more for real-time data.

Regards
Ram

On Thu, Jan 17, 2013 at 10:30 PM, Anoop John wrote:
> In the case of Hive, data insertion means placing the file under the table
> path in HDFS. HBase needs to read the data and convert it into its own
> format (HFiles), and MR is doing this work. So this makes it clear that
> HBase will be slower. :) As Michael said, the read operation...
>
> -Anoop-
>
> On Thu, Jan 17, 2013 at 10:14 PM, Austin Chungath wrote:
>
> > Hi,
> >
> > Problem: Hive took 6 mins to load a data set; HBase took 1 hr 14 mins.
> > It's a 20 GB data set, approx. 230 million records. The data is in HDFS
> > as a single text file. The cluster is 11 nodes, 8 cores.
> >
> > I loaded this in Hive, partitioned by date, bucketed into 32 and sorted.
> > Time taken is 6 mins.
> >
> > I loaded the same data into HBase, in the same cluster, by writing a
> > MapReduce job. It took 1 hr 14 mins. The cluster wasn't running anything
> > else, and assuming that the code I wrote is good enough, what is it that
> > makes HBase slower than Hive in loading the data?
> >
> > Thanks,
> > Austin
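A quick back-of-the-envelope check of the figures in Austin's message puts the gap in concrete terms. This is only a sketch: it assumes "20 GB" means 20 * 1024 MB (the message does not say), and the names below are illustrative, not from the thread.

```python
# Figures quoted in Austin's message. ASSUMPTION: "20 GB" is taken as
# 20 * 1024 MB; the original message does not specify the unit base.
DATA_MB = 20 * 1024
RECORDS = 230_000_000
NODES = 11

HIVE_SECONDS = 6 * 60        # Hive load: 6 min
HBASE_SECONDS = 74 * 60      # HBase load: 1 hr 14 min


def throughput(seconds: float) -> dict:
    """Return aggregate and per-node load rates for a given wall-clock time."""
    return {
        "mb_per_s": DATA_MB / seconds,
        "records_per_s": RECORDS / seconds,
        "records_per_s_per_node": RECORDS / seconds / NODES,
    }


hive = throughput(HIVE_SECONDS)
hbase = throughput(HBASE_SECONDS)
slowdown = HBASE_SECONDS / HIVE_SECONDS  # how many times longer HBase took

if __name__ == "__main__":
    for name, stats in (("Hive", hive), ("HBase", hbase)):
        print(f"{name:5s}: {stats['mb_per_s']:6.1f} MB/s, "
              f"{stats['records_per_s']:9,.0f} records/s, "
              f"{stats['records_per_s_per_node']:7,.0f} records/s per node")
    print(f"HBase load took {slowdown:.1f}x as long")
```

Under these assumptions the Hive load ran at roughly 57 MB/s across the cluster versus roughly 4.6 MB/s for HBase, a ratio of 74/6, or about 12x.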
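Anoop's point is that HBase has to read every record and convert it into its own format, while Hive can simply place the file under the table path. The plain-Python sketch below illustrates that extra work conceptually: parse each text record into cells and put them in sorted key order, which HFiles require. It is NOT the HBase API; the tab-separated `rowkey<TAB>col=val` input layout and the column family name `d` are hypothetical.

```python
# Conceptual sketch only, not the HBase API. It mimics the per-record work an
# HBase load must do that a Hive file move does not: parse every record into
# cells and order them by key, because HFiles store cells in sorted key order.
from typing import Iterable, List, Tuple

Cell = Tuple[bytes, bytes, bytes, bytes]  # (row, family, qualifier, value)


def parse_line(line: str, family: bytes = b"d") -> List[Cell]:
    """Split a HYPOTHETICAL tab-separated record 'rowkey<TAB>col=val<TAB>...'
    into cells. The input layout and the family name are assumptions."""
    row, *fields = line.rstrip("\n").split("\t")
    cells = []
    for field in fields:
        qualifier, _, value = field.partition("=")
        cells.append((row.encode(), family, qualifier.encode(), value.encode()))
    return cells


def to_sorted_cells(lines: Iterable[str]) -> List[Cell]:
    """Parse all records, then sort the cells by (row, family, qualifier).
    In a real MapReduce job this ordering comes from the shuffle/sort phase."""
    cells = [cell for line in lines for cell in parse_line(line)]
    cells.sort(key=lambda c: (c[0], c[1], c[2]))
    return cells


if __name__ == "__main__":
    sample = ["r2\tcol=b", "r1\tcol=a\tx=1"]
    for cell in to_sorted_cells(sample):
        print(cell)
```

Even in this toy form, every byte of input is parsed, re-encoded and sorted, which is work a file move never touches.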