From: Fengjiao Jiang
To: user@hadoop.apache.org
Date: Wed, 18 Jun 2014 11:26:30 -0400
Subject: Use Hadoop and other Apache products for SQL query manipulations

Hi,

We have a large data set originally stored on MS SQL, and for intensive data aggregation we’re currently using Vertica. The problem is that the data is very large, and sometimes a complex “select” or “insert” query can take as long as 10 minutes to return the correct results. (The database size is maybe 2GB.)

So we’re wondering whether we can use Hadoop together with some other Apache products (built on Hadoop) to make these queries faster.
For example, could we use Hadoop, HBase and ZooKeeper and write MapReduce (MR) functions for these “SELECT”, “INSERT” or similarly complex queries to improve query speed?
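To make the question concrete: a SQL aggregation such as `SELECT region, SUM(amount) FROM sales GROUP BY region` decomposes naturally into the map, shuffle and reduce phases of MapReduce. The plain-Python sketch below illustrates only that decomposition; the `sales` table, its columns and the function names are hypothetical, and it runs in a single process rather than on a Hadoop cluster. SQL layers such as Hive can compile queries like this into MapReduce jobs, so hand-writing them is not necessarily required.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical input rows: (region, amount) pairs standing in for a SQL table.
Row = Tuple[str, float]


def map_phase(rows: Iterable[Row]) -> Iterator[Tuple[str, float]]:
    """Mapper: emit a (group key, value) pair for each input row."""
    for region, amount in rows:
        yield region, amount


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Shuffle: collect all values emitted under the same key."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[float]]) -> Dict[str, float]:
    """Reducer: apply the aggregate (here SUM) to each key's values."""
    return {key: sum(values) for key, values in sorted(groups.items())}


def group_by_sum(rows: Iterable[Row]) -> Dict[str, float]:
    """Equivalent of SELECT region, SUM(amount) ... GROUP BY region."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    sales = [("east", 10.0), ("west", 5.0), ("east", 2.5)]
    print(group_by_sum(sales))  # {'east': 12.5, 'west': 5.0}
```

On a real cluster, the mapper and reducer would run as distributed tasks and the framework would perform the shuffle.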

Also, I don’t know whether the combination I listed above is a good one. Should I use Hadoop, HBase and ZooKeeper, or should I use Hadoop, Pig and Hive?

My question is mainly a “SQL-on-Hadoop” one. Could you please tell me whether this is possible and, if so, give me some suggestions? I really appreciate it!


Thanks.

Best
Judy