hadoop-common-user mailing list archives

From Sonal Goyal <sonalgoy...@gmail.com>
Subject Re: DBOutputFormat Speed Issues
Date Mon, 01 Feb 2010 16:03:01 GMT
Hi Nick,

If you don't mind, could you please share your performance benchmarks for
DataDrivenDBInputFormat/DBInputFormat with MySQL?

Thanks and Regards,
Sonal


On Mon, Feb 1, 2010 at 3:33 AM, Aaron Kimball <aaron@cloudera.com> wrote:

> Nick,
>
> I'm afraid that right now the only available OutputFormat for JDBC is that
> one. You'll note that DBOutputFormat doesn't really include much support
> for special-casing to MySQL or other targets.
>
> Your best bet is probably to copy the code from DBOutputFormat and
> DBConfiguration into some other class (e.g. MySQLDBOutputFormat) and modify
> the code in the RecordWriter to generate PreparedStatements containing
> batched insert statements.
>
> If you arrive at a solution which is pretty general-purpose/robust, please
> consider contributing it back to the Hadoop project :) If you do so, send
> me an email off-list; I'm happy to help with advice on developing better DB
> integration code, reviewing your work, etc.
>
> Also on the input side, you should really be using DataDrivenDBInputFormat
> instead of the older DBIF :) Sqoop (in src/contrib/sqoop on Apache 0.21 /
> CDH 0.20) has pretty good support for parallel imports, and uses this
> InputFormat instead.
>
> - Aaron
>
> On Thu, Jan 28, 2010 at 11:39 AM, Nick Jones <nick.jones@amd.com> wrote:
>
> > Hi all,
> > I have a use case for collecting several rows from MySQL of
> > compressed/unstructured data (n rows), expanding the data set, and storing
> > the expanded results back into a MySQL DB (100,000n rows). DBInputFormat
> > seems to perform reasonably well but DBOutputFormat is inserting rows
> > one-by-one.  How can I take advantage of MySQL's support of generating fewer
> > insert statements with more values within each one?
> >
> > Thanks.
> > --
> > Nick Jones
> >
> >
>
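
For readers following the thread: the batched-insert idea Aaron describes
could be sketched roughly as below. This is only an illustration, not code
from Hadoop or from anyone on the list. The class name MultiRowInsert and
the helpers buildSql and flush are hypothetical; a custom RecordWriter
would presumably buffer rows in write() and call something like flush()
every N records and again in close(). Only standard java.sql calls are
used.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; not part of Hadoop's DBOutputFormat.
public class MultiRowInsert {

    // Builds one INSERT with rowCount value tuples, e.g.
    // INSERT INTO t (a, b) VALUES (?, ?), (?, ?)
    public static String buildSql(String table, String[] fields, int rowCount) {
        if (fields.length == 0 || rowCount < 1) {
            throw new IllegalArgumentException("need at least one field and one row");
        }
        // One "(?, ?, ...)" tuple with a placeholder per column.
        String tuple = "(" + String.join(", ",
                Collections.nCopies(fields.length, "?")) + ")";
        return "INSERT INTO " + table
                + " (" + String.join(", ", fields) + ") VALUES "
                + String.join(", ", Collections.nCopies(rowCount, tuple));
    }

    // Sends all buffered rows in a single statement. Assumes every row
    // has exactly fields.length values, in column order.
    public static void flush(Connection conn, String table, String[] fields,
                             List<Object[]> rows) throws SQLException {
        if (rows.isEmpty()) {
            return;
        }
        String sql = buildSql(table, fields, rows.size());
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int idx = 1; // JDBC parameter indices start at 1
            for (Object[] row : rows) {
                for (Object value : row) {
                    ps.setObject(idx++, value);
                }
            }
            ps.executeUpdate();
        }
    }
}
```

The buffer size would need tuning, since a very large statement can exceed
the server's max_allowed_packet setting. An alternative that may avoid
hand-built SQL is to use addBatch()/executeBatch() together with the
MySQL Connector/J option rewriteBatchedStatements=true, which asks the
driver to combine batched inserts into multi-row statements.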
