hadoop-mapreduce-user mailing list archives

From Vinay Bagare <vbag...@me.com>
Subject Re: from relational to bigger data
Date Thu, 19 Dec 2013 21:59:25 GMT
I would also look at the current setup.
I agree with Chris that 500 GB is fairly insignificant in big data terms.
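
For reference, a minimal sketch of what Chris's Sqoop steps (1 and 3 in his
list below) might look like on the command line. The connection string, table
names, and HDFS paths here are placeholders, not your actual setup:

  # 1. Pull a source table out of Oracle into HDFS
  sqoop import \
    --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
    --username REPORT_USER -P \
    --table SOURCE_TABLE \
    --target-dir /data/staging/source_table \
    --num-mappers 4

  # 3. Push the crunched summary data back into the Oracle summary table
  sqoop export \
    --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
    --username REPORT_USER -P \
    --table SUMMARY_TABLE \
    --export-dir /data/summaries/summary_table \
    --num-mappers 4

Step 2 (the actual crunching) could be plain MapReduce, Hive, or Pig,
depending on how the summary logic is expressed today.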


Best,
Vinay Bagare



On Dec 19, 2013, at 12:51 PM, Chris Embree <cembree@gmail.com> wrote:

> In big data terms, 500G isn't big.  But moving that much data around
> every night is not trivial either.  I'm going to guess at a lot here,
> but at a very high level:
> 
> 1. Sqoop the data required to build the summary tables into Hadoop.
> 2. Crunch the summaries into new tables (really just files on Hadoop)
> 3. Sqoop the summarized data back out to Oracle
> 4. Build Indices as needed.
> 
> Depending on the size of the data being sqoop'd, this might help.  It
> might also take longer.  A real solution would require more details
> and analysis.
> 
> Chris
> 
> On 12/19/13, Jay Vee <jvsrvcs@gmail.com> wrote:
>> We have a large relational database (~500 GB, hundreds of tables).
>> 
>> We have summary tables that we rebuild from scratch each night, which takes
>> about 10 hours.
>> A web interface accesses these summary tables to build reports.
>> 
>> There is a business reason for doing a complete rebuild of the summary tables
>> each night, and using views (in the sense of Oracle views) is not an option
>> at this time.
>> 
>> If I wanted to leverage Big Data technologies to speed up the summary table
>> rebuild, what would be the first step in getting all of the data into some
>> big data storage technology?
>> 
>> Ideally in the end, we want to retain the summary tables in a relational
>> database and have reporting work the same without modifications.
>> 
>> It's just the crunching of the data and the building of these relational
>> summary tables where we need a significant performance increase.
>> 

