Subject: Re: Use Hadoop and other Apache products for SQL query manipulations
From: Cristóbal Giadach <cristobalgc@gmail.com>
To: user@hadoop.apache.org
Date: Wed, 18 Jun 2014 11:39:07 -0400

Try Impala or HAWQ (http://www.gopivotal.com/sites/default/files/Hawq_WP_042313_FINAL.pdf); these are, in my opinion, the best choices for SQL-on-Hadoop.
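
For instance, here is a rough sketch of what that looks like (the table name, columns, and HDFS path below are made up, just to illustrate): you declare a table over files already sitting in HDFS and then run ordinary SQL in Impala or Hive instead of writing MapReduce by hand.

    -- Hypothetical schema: an external table over CSV files already in HDFS.
    CREATE EXTERNAL TABLE sales (
      order_id   BIGINT,
      customer   STRING,
      amount     DOUBLE,
      order_day  STRING      -- e.g. '2014-06-18'
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/data/sales';

    -- The kind of aggregation that would otherwise need a custom MapReduce job:
    SELECT customer, SUM(amount) AS total_amount
    FROM sales
    WHERE order_day >= '2014-01-01'
    GROUP BY customer
    ORDER BY total_amount DESC
    LIMIT 10;

Impala executes this with its own engine (no MapReduce jobs at all), while Hive would compile the same statement into MapReduce for you. Either way you stay in plain SQL and never write the MR code yourself.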


On Wed, Jun 18, 2014 at 11:26 AM, Fengjiao Jiang <grapejudy@gmail.com> wrote:
Hi,

We have a large data set originally stored in MS SQL, and for intensive data aggregation we're currently using Vertica. The problem is that the data is very large, and sometimes a very complex SELECT or INSERT query can take as long as 10 minutes to return the correct results. (The database size is maybe 2 GB.)

So we're wondering whether we can use Hadoop together with some other Apache products (built on Hadoop) to make these queries faster.
For example, could we use Hadoop, HBase, and ZooKeeper, and write MapReduce jobs for these SELECT and INSERT statements, or other complex queries like that, to improve the query speed?

Also, I don't know whether the combination I listed above is a good one: should I use Hadoop, HBase, and ZooKeeper, or should I use Hadoop, Pig, and Hive?

My question is mainly a "SQL-on-Hadoop" question. Would you please tell me whether it's possible, and if so, give me some suggestions? I would appreciate it a lot!


Thanks.

Best
Judy
