Date: Mon, 24 Dec 2012 06:22:57 -0800
From: Jeremiah Peschka <jeremiah.peschka@gmail.com>
To: user@hive.apache.org
Subject: Re: Reflect MySQL updates into Hive

If it were me, I would find a way to identify the partitions that have
modified data and then re-load a subset of the partitions (only the ones
with changes) on a regular basis. Instead of updating or deleting data,
you'll be re-loading specific partitions as an all-or-nothing action.
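
To make that concrete, here is a rough sketch of the overwrite step. The
names in it are made up for illustration (an orders table partitioned by
order_date, and a staging table orders_staging holding a fresh extract of
the affected days), so adjust them to your schema:

    # rebuild only a partition that contains modified rows; the overwrite
    # replaces that partition as a single all-or-nothing step
    hive -e "
      INSERT OVERWRITE TABLE orders PARTITION (order_date = '2012-12-23')
      SELECT order_id, customer_id, status, total, last_update_time
      FROM   orders_staging
      WHERE  order_date = '2012-12-23';
    "

Run one overwrite per affected partition (or use a dynamic-partition
insert to cover several at once). Untouched partitions are never
rewritten, so the regular re-load stays cheap.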
On Monday, December 24, 2012, Ibrahim Yakti wrote:

> This is already done, but Hive does not support updates or deletion of
> data, so when I import the records after a specific "last_update_time",
> Hive will append them rather than replace them.
>
> --
> Ibrahim
>
> On Mon, Dec 24, 2012 at 5:03 PM, Mohammad Tariq wrote:
>
> You can use Apache Oozie to schedule your imports.
>
> Alternatively, you can have an additional column in your SQL table, say
> LastUpdatedTime or something. As soon as there is a change in this
> column you can start the import from that point. This way you don't
> have to import everything every time there is a change in your table.
> You just have to move only the most recent data, say only the 'delta'
> amount of data.
>
> Best Regards,
> Tariq
> +91-9741563634
> https://mtariq.jux.com/
>
> On Mon, Dec 24, 2012 at 7:08 PM, Ibrahim Yakti wrote:
>
> My question was how to reflect MySQL updates in Hadoop/Hive; this is
> our problem now.
>
> --
> Ibrahim
>
> On Mon, Dec 24, 2012 at 4:35 PM, Mohammad Tariq wrote:
>
> Cool. Then go ahead :)
>
> Just in case you need something in real time, you can have a look at
> Impala. (I know nobody likes to get preached to, but just in case ;) )
>
> Best Regards,
> Tariq
> +91-9741563634
> https://mtariq.jux.com/
>
> On Mon, Dec 24, 2012 at 7:00 PM, Ibrahim Yakti wrote:
>
> Thanks Mohammad. No, we do not have any plans to replace our RDBMS with
> Hive. Hadoop/Hive will be used for data warehousing and batch
> processing; as I said, we want to use Hive for analytical queries.
>
> --
> Ibrahim
>
> On Mon, Dec 24, 2012 at 4:19 PM, Mohammad Tariq wrote:
>
> Hello Ibrahim,
>
> A quick question: are you planning to replace your SQL DB with Hive?
> If that is the case, I would not suggest doing that. Both are meant for
> entirely different purposes. Hive is for batch processing, not for
> real-time systems. So if your requirements involve real-time work, you
> need to think before moving ahead.
>
> Yes, Sqoop is 'the' tool. It is primarily meant for this purpose.
>
> HTH
>
> Best Regards,
> Tariq
> +91-9741563634
> https://mtariq.jux.com/
>
> On Mon, Dec 24, 2012 at 6:38 PM, Ibrahim Yakti wrote:
>
> Hi All,
>
> We are new to Hadoop and Hive. We are trying to use Hive to run
> analytical queries, and we are using Sqoop to import data into Hive.
> In our RDBMS the data is updated very frequently, and this needs to be
> reflected in Hive. Hive does not support update/delete, but there are
> many workarounds to do this task.
>
> What's in our mind is importing all the
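
For the last_update_time / LastUpdatedTime idea in the quoted thread
above, a Sqoop incremental import can pull just the changed rows into a
staging directory on each run. Again only a sketch: the connection
string, credentials, table, column, and target path are all invented, so
swap in your own:

    # pull only the rows whose last_update_time is newer than the last run
    sqoop import \
      --connect jdbc:mysql://dbhost/shop \
      --username etl -P \
      --table orders \
      --incremental lastmodified \
      --check-column last_update_time \
      --last-value "2012-12-23 00:00:00" \
      --target-dir /staging/orders_delta

From there, point an external Hive staging table at /staging/orders_delta,
work out which partitions the delta touches, and overwrite only those
partitions as in the sketch above. Sqoop prints the --last-value to use
for the next run at the end of each import; record it, or let a saved
sqoop job track it for you. As far as I know, --incremental lastmodified
does not work together with --hive-import, which is why the delta lands
in HDFS first.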

--
---
Jeremiah Peschka
Founder, Brent Ozar Unlimited
Microsoft SQL Server MVP