Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Message-ID: <569C75A8.3080405@orkash.com>
Date: Mon, 18 Jan 2016 10:48:32 +0530
From: "mohit.kaushik" <mohit.kaushik@orkash.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
 rv:31.0) Gecko/20100101 Thunderbird/31.4.0
MIME-Version: 1.0
To: user@hadoop.apache.org
Subject: Re: Data Storage for Joins and ACID transactions + Hadoop Cluster
References: 
 <CAHfzKEqk_aL1O5k7iYETtYHRTiP4zge1RPVrf+An31gYdnADRQ@mail.gmail.com>
In-Reply-To: 
 <CAHfzKEqk_aL1O5k7iYETtYHRTiP4zge1RPVrf+An31gYdnADRQ@mail.gmail.com>
Content-Type: multipart/alternative;
 boundary="------------000305090106010200040608"

--------------000305090106010200040608
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit

Hive provides SQL-like functionality over Hadoop, but NoSQL stores do 
not support all SQL capabilities well. As the number of joins increases, 
performance decreases. Instead, you should try to model your data in one 
table to avoid joins. You can try Apache Accumulo, which gives you full 
control over the data structure; unlike HBase, you also don't have to 
define column families in advance. It's fast and one of the most 
scalable tested datastores built on Hadoop.
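
A minimal sketch of the single-table idea above, in Python, with plain 
dictionaries standing in for a real store. The customer and order 
records, the field names, and the helpers denormalize and scan_prefix 
are all hypothetical; this does not use the Accumulo or HBase client 
APIs.

```python
# Hypothetical records; names and fields are illustrative only.
customers = {
    "c1": {"name": "Asha", "city": "Delhi"},
    "c2": {"name": "Ravi", "city": "Pune"},
}
orders = [
    {"order_id": "o1", "customer_id": "c1", "amount": 250},
    {"order_id": "o2", "customer_id": "c1", "amount": 90},
    {"order_id": "o3", "customer_id": "c2", "amount": 40},
]


def denormalize(customers, orders):
    """Build one wide row per order, copying the customer fields into it.

    The join happens once, at write time, so reads need no join.
    """
    table = {}
    for order in orders:
        customer = customers[order["customer_id"]]
        # Row key leads with the customer id so that one customer's
        # orders sort next to each other in a sorted key-value store.
        row_key = order["customer_id"] + "#" + order["order_id"]
        table[row_key] = {
            "customer:name": customer["name"],
            "customer:city": customer["city"],
            "order:amount": order["amount"],
        }
    return table


def scan_prefix(table, prefix):
    """Return (key, row) pairs whose key starts with prefix, in key order."""
    return [(k, table[k]) for k in sorted(table) if k.startswith(prefix)]


wide_table = denormalize(customers, orders)
rows_for_c1 = scan_prefix(wide_table, "c1#")
```

The trade-off is duplicated customer data in every order row in exchange 
for reads that are a single range scan instead of a join.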

-Mohit Kaushik

On 01/18/2016 10:32 AM, Divya Gehlot wrote:
> Hi,
> Which Data storage is best for multiple joins on the run time in Hadoop.
> Tried Hive but performance is poor.
> Pointers/Guidance appreciated.
>
>
> Thanks,
> Regards,
> Divya

--------------000305090106010200040608
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: 7bit

<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">Hive provides SQL-like functionality
      over Hadoop, but NoSQL stores do not support all SQL capabilities
      well. As the number of joins increases, performance decreases.
      Instead, you should try to model your data in one table to avoid
      joins. You can try Apache Accumulo, which gives you full control
      over the data structure; unlike HBase, you also don't have to
      define column families in advance. It's fast and one of the most
      scalable tested datastores built on Hadoop. <br>
      <br>
      -Mohit Kaushik<br>
      <br>
      On 01/18/2016 10:32 AM, Divya Gehlot wrote:<br>
    </div>
    <blockquote
cite="mid:CAHfzKEqk_aL1O5k7iYETtYHRTiP4zge1RPVrf+An31gYdnADRQ@mail.gmail.com"
      type="cite">
      <meta http-equiv="Context-Type" content="text/html; charset=UTF-8">
      <div dir="ltr">Hi,
        <div>Which Data storage is best for multiple joins on the run
          time in Hadoop.</div>
        <div>Tried Hive but performance is poor.</div>
        <div>Pointers/Guidance appreciated.</div>
        <div><br>
        </div>
        <div><br>
        </div>
        <div>Thanks,<br>
        </div>
        <div>Regards,</div>
        <div>Divya</div>
      </div>
    </blockquote>
  </body>
</html>

--------------000305090106010200040608--