From: "mohit.kaushik" <mohit.kaushik@orkash.com>
Date: Mon, 18 Jan 2016 10:48:32 +0530
To: user@hadoop.apache.org
Subject: Re: Data Storage for Joins and ACID transactions + Hadoop Cluster

Hive provides SQL-like functionality over Hadoop, but NoSQL stores do not support every SQL capability well, and performance degrades as the number of joins grows. Instead, try to model your data in a single table so that joins are not needed. You could try Apache Accumulo, which gives you full control over the data structure and, unlike HBase, does not require you to define column families in advance. It is a fast, highly scalable, well-tested datastore built on top of Hadoop.
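To make the single-table advice concrete, here is a minimal Python sketch. It is not Accumulo's client API. It uses an in-memory sorted map keyed by (row, column family, column qualifier), loosely modeled on Accumulo's key layout, with visibility labels and timestamps omitted. Customer and order records that would normally sit in two tables and be joined are written into the same row, so a single row scan returns a customer together with all of their orders. New column families such as `order` are accepted on first write without being declared. The names `WideTable` and `denormalize` and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical key layout: (row, column family, column qualifier).
# Loosely modeled on Accumulo keys; visibility and timestamp are omitted.
Key = Tuple[str, str, str]


class WideTable:
    """In-memory stand-in for a sorted key-value table (not the Accumulo API)."""

    def __init__(self) -> None:
        self._data: Dict[Key, str] = {}

    def put(self, row: str, family: str, qualifier: str, value: str) -> None:
        # Column families are not declared up front: any new family is accepted.
        self._data[(row, family, qualifier)] = value

    def scan_row(self, row: str) -> List[Tuple[Key, str]]:
        # Return every cell of one row in sorted key order, like a single-row scan.
        return [(key, self._data[key]) for key in sorted(self._data) if key[0] == row]


def denormalize(customers: Dict[str, str],
                orders: List[Tuple[str, str, int]],
                table: WideTable) -> None:
    # Store each customer's profile and all of its orders in the same row,
    # so reading a customer with its orders needs no join at query time.
    for cust_id, name in customers.items():
        table.put(cust_id, "profile", "name", name)
    for order_id, cust_id, amount in orders:
        table.put(cust_id, "order", order_id, str(amount))


customers = {"c1": "Alice", "c2": "Bob"}
orders = [("o1", "c1", 30), ("o2", "c1", 12), ("o3", "c2", 7)]
table = WideTable()
denormalize(customers, orders, table)
for key, value in table.scan_row("c1"):
    print(key, value)
# ('c1', 'order', 'o1') 30
# ('c1', 'order', 'o2') 12
# ('c1', 'profile', 'name') Alice
```

The trade-off is that data is restructured (and sometimes duplicated) at write time in exchange for join-free reads. In an actual Accumulo deployment the same cells would be written with a `BatchWriter` and read back with a `Scanner` restricted to one row.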

-Mohit Kaushik

On 01/18/2016 10:32 AM, Divya Gehlot wrote:
> Hi,
> Which data storage is best for running multiple joins at runtime in Hadoop?
> I tried Hive, but performance is poor.
> Pointers/guidance appreciated.
>
> Thanks,
> Regards,
> Divya