hadoop-mapreduce-user mailing list archives

From "arvind@cloudera.com" <arv...@cloudera.com>
Subject Re: How to split DBInputFormat?
Date Mon, 03 Jan 2011 18:32:06 GMT

The DataDrivenDBInputFormat is a better fit for moving large volumes of data,
as it generates bounding WHERE clauses that partition the data more evenly
across splits.
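As a minimal sketch using the new (`org.apache.hadoop.mapreduce`) API: the table name `employees`, the split column `id`, the connection details, and the `EmployeeWritable` class below are all hypothetical placeholders you would substitute with your own:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DataDrivenDBInputFormat;

public class EmployeeImport {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // JDBC connection details (hypothetical placeholders).
    DBConfiguration.configureDB(conf,
        "com.mysql.jdbc.Driver",
        "jdbc:mysql://db.example.com/mydb",
        "dbuser", "dbpass");

    Job job = new Job(conf, "employee-import");
    job.setJarByClass(EmployeeImport.class);

    // DataDrivenDBInputFormat queries MIN/MAX of the split-by column and
    // generates one WHERE clause per split, e.g. "id >= 0 AND id < 100000",
    // so no single task has to pull the whole table into memory.
    // EmployeeWritable is a user-supplied DBWritable implementation.
    DataDrivenDBInputFormat.setInput(job,
        EmployeeWritable.class,
        "employees",        // table name
        null,               // extra WHERE conditions (none here)
        "id",               // split-by column (ideally indexed)
        "id", "name");      // columns to select

    // The number of splits follows the configured map-task count.
    job.getConfiguration().setInt("mapred.map.tasks", 8);

    job.waitForCompletion(true);
  }
}
```

Picking an indexed, roughly uniformly distributed column as the split-by column matters most here; a skewed column gives you skewed splits.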

You could also use Sqoop <https://github.com/cloudera/sqoop>, which makes
large-volume data migration between relational databases and HDFS a breeze.
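As a sketch, a Sqoop import of the same hypothetical `employees` table might look like the following; the connection string, credentials, split column, and target path are all placeholders:

```shell
sqoop import \
  --connect jdbc:mysql://db.example.com/mydb \
  --username dbuser -P \
  --table employees \
  --split-by id \
  --num-mappers 8 \
  --target-dir /user/joan/employees
```

Under the hood Sqoop uses the same split-generation machinery, so `--split-by` and `--num-mappers` play the same roles as the split column and map-task count in a hand-written job.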


On Mon, Jan 3, 2011 at 8:56 AM, Joan <joan.monplet@gmail.com> wrote:

> Hi,
> I'm trying to load data from a big table in a database. I'm using
> DBInputFormat, but when my job tries to fetch all the records, it throws an
> exception:
> *Exception in thread "Thread for syncLogs" java.lang.OutOfMemoryError:
> Java heap space*
> I'm trying to fetch millions of records, and I would like to use
> DBInputSplit, but I don't know how to use it or how many splits I need.
> Thanks
> Joan
