Subject: Re: Use Hadoop and other Apache products for SQL query manipulations
From: Cristóbal Giadach <cristobalgc@gmail.com>
To: user@hadoop.apache.org
Date: Wed, 18 Jun 2014 11:39:07 -0400

Try Impala or HAWQ (http://www.gopivotal.com/sites/default/files/Hawq_WP_042313_FINAL.pdf); these are, in my opinion, the best choices for SQL-on-Hadoop.
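
For instance, here is a rough sketch of what that looks like (the table name, columns, and HDFS path below are made up, just to illustrate): you declare a table over files already sitting in HDFS and then run ordinary SQL in Impala or Hive instead of writing MapReduce by hand.

    -- Hypothetical schema: an external table over CSV files already in HDFS.
    CREATE EXTERNAL TABLE sales (
      order_id   BIGINT,
      customer   STRING,
      amount     DOUBLE,
      order_day  STRING      -- e.g. '2014-06-18'
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/data/sales';

    -- The kind of aggregation that would otherwise need a custom MapReduce job:
    SELECT customer, SUM(amount) AS total_amount
    FROM sales
    WHERE order_day >= '2014-01-01'
    GROUP BY customer
    ORDER BY total_amount DESC
    LIMIT 10;

Impala executes this with its own engine (no MapReduce jobs at all), while Hive would compile the same statement into MapReduce for you. Either way you stay in plain SQL and never write the MR code yourself.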


On Wed, Jun 18, 2014 at 11:26 AM, Fengjiao Jiang <grapejudy@gmail.com> wrote:
Hi,

We have a large data set originally stored in MS SQL, and for intensive data aggregation we're currently using Vertica. The problem is that the data is very large, and sometimes a very complex SELECT or INSERT query can take as long as 10 minutes to return the correct results. (The database size is maybe 2 GB.)

So we're wondering whether we can use Hadoop together with some other Apache products (built on Hadoop) to make these queries faster.
For example, could we use Hadoop, HBase, and ZooKeeper, and write MapReduce jobs for these SELECT and INSERT statements, or other complex queries like that, to improve the query speed?

Also, I don't know whether the combination I listed above is a good one: should I use Hadoop, HBase, and ZooKeeper, or should I use Hadoop, Pig, and Hive?

My question is mainly a "SQL-on-Hadoop" question. Would you please tell me whether it's possible, and if so, give me some suggestions? I would appreciate it a lot!


Thanks.

Best
Judy
