From: Ashok Kumar
Reply-To: user@hive.apache.org
To: "user@hive.apache.org"
Date: Sat, 19 Dec 2015 00:42:41 +0000 (UTC)
Message-ID: <1296963063.1525497.1450485761081.JavaMail.yahoo@mail.yahoo.com>
Subject: Re: The advantages of Hive/Hadoop comnpared to Data Warehouse

Hi,

Thanks for the info. I understand that ELT (Extract, Load, Transform) is more appropriate for big data than traditional ETL. What are its major advantages in the Big Data space?

For example, if I start using Sqoop to pull data from traditional transactional and Data Warehouse databases and create the same tables in Hive, what would be the next step towards a consolidated data model in Hive on HDFS? The entry tables will be tabular tables in line with the source, correct? How many ELT steps generally need to be applied to get to the final model? Will ELT speed up this process?

I understand this is a very broad question. However, any comments will be welcome.

Regards

On Friday, 18 December 2015, 22:27, Jörn Franke <jornfranke@gmail.com> wrote:

I think you should draw more attention to the fact that Hive is just one component in the ecosystem. You can have many more components, such as ELT, integration of unstructured data, machine learning, streaming data, etc. However, analysts are usually not aware of the technologies, and IT staff are not much aware of how they can bring benefits to a specific business domain. You could explore the potential together in workshops, design thinking, etc. Once you know more details and both sides have decided on potential ways forward, you can start doing PoCs and see what works and what does not. It is important that you break old ties created by more traditional data warehouse approaches in the past and go beyond the comfort zone.

On 18 Dec 2015, at 22:01, Ashok Kumar <ashok34668@yahoo.com> wrote:

Gurus,

Some analysts keep asking me about the advantages of having Hive tables when the star schema in the Data Warehouse (DW) does the same.

For example, if you have fact and dimension tables in the DW and just import them into Hive via, say, Sqoop, what are we going to gain?

I keep telling them about storage economy and cheap disks, that de-normalisation can be taken further, etc. However, they are not convinced :(

Any additional comments will help my case.

Thanks a lot
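
As an illustration of the ELT flow asked about in this thread (land source-shaped tables first, then transform inside the engine), here is a minimal sketch. It is not Hive or Sqoop: Python's built-in sqlite3 module stands in for the SQL engine, and every table name, column name and row (stg_dim_customer, stg_fact_sales, sales_denorm) is hypothetical. In a Hive setup the load step would typically be handled by a Sqoop import and the transform step by a CREATE TABLE ... AS SELECT over the staging tables.

```python
import sqlite3


def run_elt(conn):
    """Load source-shaped staging tables, then transform them in the engine."""
    cur = conn.cursor()

    # Extract + Load: staging tables mirror the source star schema as-is.
    # (Hypothetical tables and rows; a real load would come from the source DB.)
    cur.execute("CREATE TABLE stg_dim_customer (customer_id INTEGER, region TEXT)")
    cur.execute(
        "CREATE TABLE stg_fact_sales (sale_id INTEGER, customer_id INTEGER, amount REAL)"
    )
    cur.executemany(
        "INSERT INTO stg_dim_customer VALUES (?, ?)",
        [(1, "EMEA"), (2, "APAC")],
    )
    cur.executemany(
        "INSERT INTO stg_fact_sales VALUES (?, ?, ?)",
        [(10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0)],
    )

    # Transform: done inside the engine after loading, here by joining the
    # fact table to its dimension into one denormalised table.
    cur.execute(
        """
        CREATE TABLE sales_denorm AS
        SELECT f.sale_id, f.customer_id, d.region, f.amount
        FROM stg_fact_sales AS f
        JOIN stg_dim_customer AS d ON d.customer_id = f.customer_id
        """
    )
    conn.commit()

    # Analysts can now query the denormalised table without joins.
    return cur.execute(
        "SELECT region, SUM(amount) FROM sales_denorm GROUP BY region ORDER BY region"
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_elt(connection))
```

The point of the sketch is only the ordering: the staging tables stay in line with the source, and each later modelling step is another query over data that is already loaded. How many such steps a real model needs depends on the target design and is not something this example answers.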