hadoop-common-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jones, Nick" <nick.jo...@amd.com>
Subject RE: DBOutputFormat Speed Issues
Date Mon, 01 Feb 2010 18:03:19 GMT
Sonal,
I'll try and put together something soon.  I haven't located documentation
on DataDrivenInputFormat yet to implement it. :)

Nick Jones

-----Original Message-----
From: Sonal Goyal [mailto:sonalgoyal4@gmail.com] 
Sent: Monday, February 01, 2010 10:03 AM
To: common-user@hadoop.apache.org
Subject: Re: DBOutputFormat Speed Issues

Hi Nick,

If you dont mind, can you please share your performance benchmarks of using
DataDrivenInputFormat/DBInputFormat and MySQL?

Thanks and Regards,
Sonal


On Mon, Feb 1, 2010 at 3:33 AM, Aaron Kimball <aaron@cloudera.com> wrote:

> Nick,
>
> I'm afraid that right now the only available OutputFormat for JDBC is that
> one. You'll note that DBOutputFormat doesn't really include much support
> for
> special-casing to MySQL or other targets.
>
> Your best bet is to probably copy the code from DBOutputFormat and
> DBConfiguration into some other class (e.g. MySQLDBOutputFormat) and
modify
> the code in the RecordWriter to generate PreparedStatements containing
> batched insert statements.
>
> If you arrive at a solution which is pretty general-purpose/robust, please
> consider contributing it back to the Hadoop project :) If you do so, send
> me
> an email off-list; I'm happy to help with advice on developing better DB
> integration code, reviewing your work, etc.
>
> Also on the input side, you should really be using DataDrivenDBInputFormat
> instead of the older DBIF :) Sqoop (in src/contrib/sqoop on Apache 0.21 /
> CDH 0.20) has pretty good support for parallel imports, and uses this
> InputFormat instead.
>
> - Aaron
>
> On Thu, Jan 28, 2010 at 11:39 AM, Nick Jones <nick.jones@amd.com> wrote:
>
> > Hi all,
> > I have a use case for collecting several rows from MySQL of
> > compressed/unstructured data (n rows), expanding the data set, and
> storing
> > the expanded results back into a MySQL DB (100,000n rows). DBInputFormat
> > seems to perform reasonably well but DBOutputFormat is inserting rows
> > one-by-one.  How can I take advantage of MySQL's support of generating
> fewer
> > insert statements with more values within each one?
> >
> > Thanks.
> > --
> > Nick Jones
> >
> >
>

Mime
View raw message