From: "kulkarni.swarnim@gmail.com"
Date: Wed, 17 Jul 2013 12:59:59 -0500
Subject: Re: which approach is better
To: user@hive.apache.org

First of all, that might not be the right way to choose the underlying storage. Choose HDFS or HBase depending on whether the data will be used for batch processing or whether you need random access to it. HBase is just another layer on top of HDFS, so queries running against HBase are necessarily going to be less efficient. If you can get away with using plain HDFS, that is the best and simplest approach.

On Wed, Jul 17, 2013 at 12:40 PM, Hamza Asad wrote:

> Please let me know which approach is better: should I save my data directly
> to HDFS and run Hive (Shark) queries over it, or store my data in HBase and
> then query it? I want to ensure efficient data retrieval, and that the data
> remains safe and can easily be recovered if Hadoop crashes.
>
> --
> *Muhammad Hamza Asad*

--
Swarnim
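For readers weighing the same trade-off, here is a minimal HiveQL sketch of the two options discussed above. The table names, column names, and HDFS path are hypothetical, chosen only for illustration; the storage-handler class and properties are Hive's standard HBase integration.

```sql
-- Option 1: plain HDFS-backed external table (efficient for batch scans).
-- Table/column names and the LOCATION path are hypothetical examples.
CREATE EXTERNAL TABLE events_hdfs (
  id      BIGINT,
  payload STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/hamza/events';

-- Option 2: HBase-backed table via Hive's HBase storage handler.
-- Gives random access by row key, but full scans pay the extra layer's cost.
CREATE TABLE events_hbase (
  id      BIGINT,
  payload STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,d:payload')
TBLPROPERTIES ('hbase.table.name' = 'events');
```

Both tables can then be queried with the same `SELECT` statements; the difference is where the data lives and which access pattern each backend favors.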