hadoop-common-user mailing list archives

From Brian McSweeney <brian.mcswee...@gmail.com>
Subject Re: Import data from mysql
Date Sun, 09 Jan 2011 13:30:03 GMT
Hi Michael,

yeah, sorry, I shouldn't have said "compare", as that oversimplifies the
problem. For each pair of rows I have to calculate a score by multiplying
some of the column values together, running some functions on the two rows'
values, and so on. I could do this as the rows are entered into the db, which
would cut the problem down, but unfortunately the values in the existing rows
change every day, so I think the only option is to export the lot and run a
job once a day to compute the new scores. This is why I'm looking at hadoop:
the job has become too big to run serially.
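
To illustrate how the daily all-pairs job described above might be split into
independent pieces, here is a minimal sketch in plain Python. It is not Hadoop
code. The `score` function, the row fields (`id`, `weight`, `value`) and the
block size are all made-up placeholders, since the real scoring logic is not
given here. With n rows there are n(n-1)/2 unordered pairs; the sketch cuts the
rows into blocks and treats each pair of blocks as one task that could, in
principle, run on a separate worker.

```python
from itertools import combinations, combinations_with_replacement, product


def score(a, b):
    # Hypothetical scoring function: a stand-in for the real per-pair
    # calculation (multiplying column values, applying functions, etc.).
    return a["weight"] * b["weight"] + abs(a["value"] - b["value"])


def make_blocks(rows, block_size):
    # Cut the exported rows into consecutive blocks of at most block_size rows.
    return [rows[i:i + block_size] for i in range(0, len(rows), block_size)]


def block_tasks(blocks):
    # One task per unordered pair of block indices, including a block
    # paired with itself. Each task is independent of the others.
    return list(combinations_with_replacement(range(len(blocks)), 2))


def score_task(blocks, i, j):
    # Score every row pair covered by the task (i, j) exactly once.
    if i == j:
        pairs = combinations(blocks[i], 2)       # pairs inside one block
    else:
        pairs = product(blocks[i], blocks[j])    # pairs across two blocks
    return {(a["id"], b["id"]): score(a, b) for a, b in pairs}


# Made-up sample data standing in for the rows exported from the database.
rows = [{"id": n, "weight": n + 1, "value": n * 10} for n in range(6)]
blocks = make_blocks(rows, block_size=2)

# Run the tasks serially here; each call could be handed to a different worker.
results = {}
for i, j in block_tasks(blocks):
    results.update(score_task(blocks, i, j))
```

Because no task depends on another task's output, the loop at the end is the
only serial part, and it is the part a parallel framework would take over.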

cheers,
Brian

On Sun, Jan 9, 2011 at 12:20 PM, Black, Michael (IS) <Michael.Black2@ngc.com> wrote:

> What kind of compare do you have to do?
>
> You should be able to compute a checksum or such for each row when you
> insert them and only have to look at the subset that matches if you're doing
> some sort of substring or such.
>
> Michael D. Black
> Senior Scientist
> Advanced Analytics Directorate
> Northrop Grumman Information Systems
>
>
> ________________________________
>
> From: Brian McSweeney [mailto:brian.mcsweeney@gmail.com]
> Sent: Sat 1/8/2011 5:33 PM
> To: core-user@hadoop.apache.org
> Subject: EXTERNAL:Import data from mysql
>
>
>
> Hi folks,
>
> I'm a TOTAL newbie on hadoop. I have an existing webapp that has a growing
> number of rows in a mysql database that I have to compare against one
> another once a day from a batch job. This is a quadratic problem, as every
> row must be compared against every other row. I was thinking of
> parallelizing this computation via hadoop. As such, I was thinking that
> perhaps the first thing to look at is how to bring info from a database to a
> hadoop job and vice versa. I have seen the following relevant info:
>
> https://issues.apache.org/jira/browse/HADOOP-2536
>
> and also
>
> http://architects.dzone.com/articles/tools-moving-sql-database
>
> any advice on what approach to use?
>
> cheers,
> Brian
>
>
>


-- 
-----------------------------------------
Brian McSweeney

Technology Director
Smarter Technology
web: http://www.smarter.ie
phone: +353868578212
-----------------------------------------
