Subject: Re: from relational to bigger data
From: Vinay Bagare
Date: Thu, 19 Dec 2013 13:59:25 -0800
To: user@hadoop.apache.org, chris@embree.us

I would also look at the current setup. I agree with Chris that 500 GB is fairly insignificant.

Best,
Vinay Bagare

On Dec 19, 2013, at 12:51 PM, Chris Embree wrote:

> In big data terms, 500 GB isn't big. But moving that much data around
> every night is not trivial either. I'm going to guess at a lot here,
> but at a very high level:
>
> 1. Sqoop the data required to build the summary tables into Hadoop.
> 2. Crunch the summaries into new tables (really just files on Hadoop).
> 3. Sqoop the summarized data back out to Oracle.
> 4. Build indices as needed.
>
> Depending on the size of the data being sqoop'd, this might help. It
> might also take longer. A real solution would require more details
> and analysis.
>
> Chris
>
> On 12/19/13, Jay Vee wrote:
>> We have a large relational database (~500 GB, hundreds of tables).
>>
>> We have summary tables that we rebuild from scratch each night, which
>> takes about 10 hours.
>> From these summary tables, we have a web interface that accesses the
>> summary tables to build reports.
>>
>> There is a business reason for doing a complete rebuild of the summary
>> tables each night, and using views (in the sense of Oracle views) is
>> not an option at this time.
>>
>> If I wanted to leverage Big Data technologies to speed up the summary
>> table rebuild, what would be the first step toward getting all the data
>> into some big data storage technology?
>>
>> Ideally, in the end we want to retain the summary tables in a relational
>> database and have reporting work the same without modifications.
>>
>> It's just the crunching of the data and the building of these relational
>> summary tables where we need a significant performance increase.
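[Archive note: the four-step pipeline Chris outlines could be sketched roughly as below. Every hostname, credential, table name, and HDFS path here is a hypothetical placeholder, and the "crunch" step is shown as a Hive query purely as one possible choice; Pig or plain MapReduce jobs would serve equally well. This is a sketch of the approach, not a tested job.]

```shell
#!/bin/sh
# Sketch of the nightly summary rebuild via Hadoop.
# All connection details, tables, and paths are illustrative placeholders.

# 1. Sqoop the source tables needed for the summaries into HDFS.
sqoop import \
  --connect jdbc:oracle:thin:@//db.example.com:1521/ORCL \
  --username etl_user --password-file /user/etl/.pw \
  --table SALES_DETAIL \
  --target-dir /data/staging/sales_detail \
  --num-mappers 8

# 2. Crunch the summaries on the cluster (shown here with Hive,
#    assuming an external table has been defined over the staging dir).
hive -e "
  INSERT OVERWRITE DIRECTORY '/data/summary/sales_daily'
  SELECT sale_date, region, SUM(amount)
  FROM sales_detail
  GROUP BY sale_date, region;
"

# 3. Sqoop the summarized data back out to the Oracle summary table.
sqoop export \
  --connect jdbc:oracle:thin:@//db.example.com:1521/ORCL \
  --username etl_user --password-file /user/etl/.pw \
  --table SALES_DAILY_SUMMARY \
  --export-dir /data/summary/sales_daily

# 4. Rebuild indices as needed, back on the Oracle side
#    (e.g. a sqlplus script running ALTER INDEX ... REBUILD).
```

Whether this beats the 10-hour in-database rebuild depends, as Chris notes, on how long the two Sqoop transfers take relative to the time saved in the crunch step.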