Subject: Re: Reflect MySQL updates into Hive
From: Ibrahim Yakti <iyakti@souq.com>
Date: Mon, 24 Dec 2012 17:08:26 +0300
To: user@hive.apache.org

This is already done, but Hive supports neither update nor deletion of
data, so when I import the records changed after a specific
"last_update_time", Hive appends them rather than replacing the old rows.

--
Ibrahim
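One common workaround is to land each incremental import in a separate
staging table and periodically rebuild a deduplicated copy with INSERT
OVERWRITE, keeping only the newest row per key. A minimal sketch only:
the table and column names (orders_base, orders_staging, orders_current,
id, last_update_time) are placeholders, and all three tables are assumed
to share the same schema.

    # Rebuild the "current" copy from base + staging, keeping only the
    # newest row per id. Hive 0.9/0.10 has no windowing functions and
    # wants UNION ALL inside a subquery, hence the GROUP BY + self-join.
    hive -e "
      INSERT OVERWRITE TABLE orders_current
      SELECT t.*
      FROM (
        SELECT * FROM orders_base
        UNION ALL
        SELECT * FROM orders_staging
      ) t
      JOIN (
        SELECT id, MAX(last_update_time) AS max_ts
        FROM (
          SELECT * FROM orders_base
          UNION ALL
          SELECT * FROM orders_staging
        ) u
        GROUP BY id
      ) latest
      ON t.id = latest.id AND t.last_update_time = latest.max_ts;
    "

For large tables the usual refinement is to partition by date and rebuild
only the partitions touched by the latest delta rather than the whole table.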
On Mon, Dec 24, 2012 at 5:03 PM, Mohammad Tariq <dontariq@gmail.com> wrote:

> You can use Apache Oozie to schedule your imports.
>
> Alternatively, you can have an additional column in your SQL table, say
> LastUpdatedTime or something. As soon as there is a change in this
> column, you can start the import from that point. This way you don't
> have to import everything every time there is a change in your table;
> you only move the most recent data, say only the 'delta' amount of data.
>
> Best Regards,
> Tariq
> +91-9741563634
> https://mtariq.jux.com/
>
> On Mon, Dec 24, 2012 at 7:08 PM, Ibrahim Yakti <iyakti@souq.com> wrote:
>
>> My question was how to reflect MySQL updates to Hadoop/Hive; this is
>> our problem now.
>>
>> --
>> Ibrahim
>>
>> On Mon, Dec 24, 2012 at 4:35 PM, Mohammad Tariq <dontariq@gmail.com> wrote:
>>
>>> Cool. Then go ahead :)
>>>
>>> Just in case you need something in real time, you can have a look at
>>> Impala (I know nobody likes to get preached to, but just in case ;)).
>>>
>>> Best Regards,
>>> Tariq
>>> +91-9741563634
>>> https://mtariq.jux.com/
>>>
>>> On Mon, Dec 24, 2012 at 7:00 PM, Ibrahim Yakti <iyakti@souq.com> wrote:
>>>
>>>> Thanks Mohammad. No, we do not have any plans to replace our RDBMS
>>>> with Hive. Hadoop/Hive will be used for data warehousing and batch
>>>> processing; as I said, we want to use Hive for analytical queries.
>>>>
>>>> --
>>>> Ibrahim
>>>>
>>>> On Mon, Dec 24, 2012 at 4:19 PM, Mohammad Tariq <dontariq@gmail.com> wrote:
>>>>
>>>>> Hello Ibrahim,
>>>>>
>>>>> A quick question: are you planning to replace your SQL DB with Hive?
>>>>> If that is the case, I would not suggest doing that. The two are
>>>>> meant for entirely different purposes. Hive is for batch processing,
>>>>> not for real-time systems, so if your requirements involve real-time
>>>>> things, you need to think before moving ahead.
>>>>>
>>>>> Yes, Sqoop is 'the' tool. It is primarily meant for this purpose.
>>>>>
>>>>> HTH
>>>>>
>>>>> Best Regards,
>>>>> Tariq
>>>>> +91-9741563634
>>>>> https://mtariq.jux.com/
>>>>>
>>>>> On Mon, Dec 24, 2012 at 6:38 PM, Ibrahim Yakti <iyakti@souq.com> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> We are new to Hadoop and Hive. We are trying to use Hive to run
>>>>>> analytical queries, and we are using Sqoop to import data into
>>>>>> Hive. In our RDBMS the data is updated very frequently, and this
>>>>>> needs to be reflected in Hive. Hive does not support update/delete,
>>>>>> but there are many workarounds for this task.
>>>>>>
>>>>>> What we have in mind is importing all the tables into Hive as is,
>>>>>> then building the required tables for reporting.
>>>>>>
>>>>>> My questions are:
>>>>>>
>>>>>> 1. What is the best way to reflect MySQL updates into Hive with
>>>>>> minimal resources?
>>>>>> 2. Is Sqoop the right tool for the ETL?
>>>>>> 3. Is Hive the right tool for this kind of query, or should we look
>>>>>> for alternatives?
>>>>>>
>>>>>> Any hint will be useful. Thanks in advance.
>>>>>>
>>>>>> --
>>>>>> Ibrahim
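For the LastUpdatedTime approach Tariq describes above, the incremental
pull itself can be done with Sqoop's "lastmodified" mode. A rough sketch;
the connection string, credentials, table, column and directory names are
placeholders.

    # Pull only rows whose last_update_time is newer than the previous
    # run and append them to the staging directory. A saved Sqoop job
    # (sqoop job --create ...) remembers --last-value between runs, and
    # the whole thing can be scheduled from Oozie or cron (-P prompts
    # for the password, so a scheduled job would pass it differently).
    sqoop import \
      --connect jdbc:mysql://dbhost/shop \
      --username etl -P \
      --table orders \
      --incremental lastmodified \
      --check-column last_update_time \
      --last-value "2012-12-24 00:00:00" \
      --target-dir /user/hive/warehouse/orders_staging \
      --append

This complements the rebuild step sketched earlier: Sqoop only brings in
the delta, and the INSERT OVERWRITE pass is what makes updated rows
replace their older versions in Hive.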