hadoop-common-user mailing list archives

From Brian McSweeney <brian.mcswee...@gmail.com>
Subject Re: Import data from mysql
Date Sun, 09 Jan 2011 13:21:40 GMT
Thanks Konstantin,

I had seen Sqoop. I wonder whether it is normally used as a one-off process,
or whether it can also be used effectively against a live database on a daily
basis for batch exports. Are there performance issues with that approach? And
how does it compare to some of the other classes I have seen, such as those in
the database library at http://hadoop.apache.org/mapreduce/docs/current/api/

I have also seen a few alternatives out there, such as Cascading and
cascading-dbmigrate:

http://architects.dzone.com/articles/tools-moving-sql-database

But from the Hadoop API linked above it also seems that some of this
functionality is now in the main API. I suppose any experience people have is
welcome. I want to run a batch job every day that exports the data, performs
my MapReduce comparison, and then imports the results back into MySQL
afterwards.
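
Just to make my question concrete, here is roughly what I had in mind with the
built-in database classes (org.apache.hadoop.mapreduce.lib.db). This is only a
sketch I put together from the javadocs, not something I have run: the table
and column names ("users", "users_copy", "id", "name"), the JDBC URL and the
credentials are all made up, and the actual pairwise comparison is left out
(the mapper below just passes each row through).

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

public class DailyDbJob {

  // One row of the made-up "users" table. DBInputFormat needs DBWritable
  // (the JDBC side) and the framework needs Writable (the serialization side).
  public static class UserRecord implements Writable, DBWritable {
    long id;
    String name;

    public void readFields(ResultSet rs) throws SQLException {
      id = rs.getLong(1);
      name = rs.getString(2);
    }
    public void write(PreparedStatement st) throws SQLException {
      st.setLong(1, id);
      st.setString(2, name);
    }
    public void readFields(DataInput in) throws IOException {
      id = in.readLong();
      name = in.readUTF();
    }
    public void write(DataOutput out) throws IOException {
      out.writeLong(id);
      out.writeUTF(name);
    }
  }

  // Stand-in for the real comparison logic: just passes every row through
  // so the sketch is end to end (map-only, no reducer).
  public static class PassThroughMapper
      extends Mapper<LongWritable, UserRecord, UserRecord, NullWritable> {
    protected void map(LongWritable key, UserRecord row, Context ctx)
        throws IOException, InterruptedException {
      ctx.write(row, NullWritable.get());
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Driver class, URL and credentials are placeholders.
    DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
        "jdbc:mysql://dbhost/mydb", "dbuser", "dbpass");

    Job job = new Job(conf, "daily-db-comparison");
    job.setJarByClass(DailyDbJob.class);
    job.setMapperClass(PassThroughMapper.class);
    job.setOutputKeyClass(UserRecord.class);
    job.setOutputValueClass(NullWritable.class);
    job.setNumReduceTasks(0);

    // Read rows straight out of MySQL instead of dumping files first.
    job.setInputFormatClass(DBInputFormat.class);
    DBInputFormat.setInput(job, UserRecord.class,
        "users", null /* conditions */, "id" /* order by */, "id", "name");

    // Write results back to a made-up table with matching columns;
    // with DBOutputFormat it is the output key that gets written.
    job.setOutputFormatClass(DBOutputFormat.class);
    DBOutputFormat.setOutput(job, "users_copy", "id", "name");

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Is that the right shape for a daily job, or is Sqoop still the better route?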

cheers,
Brian

On Sun, Jan 9, 2011 at 3:18 AM, Konstantin Boudnik <cos@apache.org> wrote:

> There's a supported tool with all bells and whistles:
>  http://www.cloudera.com/downloads/sqoop/
>
> --
>   Take care,
> Konstantin (Cos) Boudnik
>
> On Sat, Jan 8, 2011 at 18:57, Sonal Goyal <sonalgoyal4@gmail.com> wrote:
> > Hi Brian,
> >
> > You can check HIHO at https://github.com/sonalgoyal/hiho which can help you
> > load data from any JDBC database to the Hadoop file system. If your table
> > has a date or id field, or any indicator for modified/newly added rows, you
> > can import only the altered rows every day. Please let me know if you need
> > help.
> >
> > Thanks and Regards,
> > Sonal
> > Connect Hadoop with databases, Salesforce, FTP servers and others
> > <https://github.com/sonalgoyal/hiho>
> > Nube Technologies <http://www.nubetech.co>
> >
> > <http://in.linkedin.com/in/sonalgoyal>
> >
> > On Sun, Jan 9, 2011 at 5:03 AM, Brian McSweeney
> > <brian.mcsweeney@gmail.com> wrote:
> >
> >> Hi folks,
> >>
> >> I'm a TOTAL newbie on Hadoop. I have an existing webapp that has a
> >> growing number of rows in a MySQL database that I have to compare
> >> against one another once a day from a batch job. This is a quadratic
> >> problem, as every row must be compared against every other row. I was
> >> thinking of parallelizing this computation via Hadoop. As such, I was
> >> thinking that perhaps the first thing to look at is how to bring info
> >> from a database to a Hadoop job and vice versa. I have seen the
> >> following relevant info:
> >>
> >> https://issues.apache.org/jira/browse/HADOOP-2536
> >>
> >> and also
> >>
> >> http://architects.dzone.com/articles/tools-moving-sql-database
> >>
> >> Any advice on what approach to use?
> >>
> >> cheers,
> >> Brian
> >>
> >
>
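
(Following up on Sonal's point above about only pulling the altered rows: at
the query level, and reusing the UserRecord class and Job setup from my sketch
further up, I imagine it would look something like the lines below. The
"last_modified" column and the way the previous run time is remembered are
made up.)

// Only pull rows changed since the last run, via the query-based
// DBInputFormat.setInput overload instead of the table-based one.
String lastRun = "2011-01-08 00:00:00";  // would be persisted between runs
String query = "SELECT id, name FROM users"
    + " WHERE last_modified > '" + lastRun + "' ORDER BY id";
String countQuery = "SELECT COUNT(*) FROM users"
    + " WHERE last_modified > '" + lastRun + "'";
DBInputFormat.setInput(job, UserRecord.class, query, countQuery);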



-- 
-----------------------------------------
Brian McSweeney

Technology Director
Smarter Technology
web: http://www.smarter.ie
phone: +353868578212
-----------------------------------------
