hadoop-mapreduce-user mailing list archives

From Joan <joan.monp...@gmail.com>
Subject Re: How to split DBInputFormat?
Date Tue, 04 Jan 2011 10:03:16 GMT
Thanks,

I've increased the number of map tasks and the number of reduce tasks. Although
it works, I don't think that's a real solution, so I will try both proposals.
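
For anyone following along, the core idea behind splitting a DBInputFormat read is to divide the table's key range into chunks so that each map task reads only one chunk instead of the whole result set. The sketch below is framework-free (no Hadoop dependency) and purely illustrative; the table name, key column, and id range are hypothetical placeholders.

```java
import java.util.ArrayList;
import java.util.List;

public class RangeSplitSketch {
    // Divide the closed range [minId, maxId] into at most numSplits
    // chunks, rendering each chunk as a WHERE clause. This mirrors what
    // a splitting input format does: each map task gets one clause and
    // therefore reads only a slice of the table.
    static List<String> whereClauses(long minId, long maxId, int numSplits) {
        List<String> clauses = new ArrayList<>();
        long total = maxId - minId + 1;
        long chunk = (total + numSplits - 1) / numSplits; // ceiling division
        for (long lo = minId; lo <= maxId; lo += chunk) {
            long hi = Math.min(lo + chunk - 1, maxId);
            clauses.add("id >= " + lo + " AND id <= " + hi);
        }
        return clauses;
    }

    public static void main(String[] args) {
        // e.g. ten million rows split across 8 map tasks (hypothetical numbers)
        for (String w : whereClauses(1, 10_000_000, 8)) {
            System.out.println("SELECT * FROM big_table WHERE " + w);
        }
    }
}
```

With splits like these, no single task ever materializes the full result set, which is what was causing the heap exhaustion.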

Joan

2011/1/4 Hari Sreekumar <hsreekumar@clickable.com>

> Arvind,
>
> Where can I find DataDrivenDBInputFormat? Is it available in v0.20.2, and is
> it stable?
>
> Thanks,
> Hari
>
>
> On Tue, Jan 4, 2011 at 12:02 AM, arvind@cloudera.com <arvind@cloudera.com>wrote:
>
>> Joan,
>>
>> DataDrivenDBInputFormat is a better fit for moving large volumes of data,
>> as it generates WHERE clauses that help partition the data more evenly.
>>
>> You could also use Sqoop <https://github.com/cloudera/sqoop>, which makes
>> such large-volume data migration between relational sources and HDFS a
>> breeze.
>>
>> Arvind
>>
>>
>> On Mon, Jan 3, 2011 at 8:56 AM, Joan <joan.monplet@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I'm trying to load data from a big table in a database. I'm using
>>> DBInputFormat, but when my job tries to get all the records, it throws an
>>> exception:
>>>
>>> *Exception in thread "Thread for syncLogs" java.lang.OutOfMemoryError:
>>> Java heap space*
>>>
>>> I'm trying to get millions of records, and I would like to use DBInputSplit,
>>> but I don't know how to use it or how many splits I need.
>>>
>>> Thanks
>>>
>>> Joan
>>>
>>
>>
>
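
For reference, the Sqoop route suggested above boils down to a single import command. Everything below is a hypothetical sketch: the connection URL, credentials, table name, split column, and target directory are placeholders, not values from this thread.

```shell
# Hypothetical Sqoop import. --split-by names the column Sqoop partitions
# the table on (analogous to DataDrivenDBInputFormat's generated WHERE
# clauses), and -m sets the number of parallel map tasks.
sqoop import \
  --connect jdbc:mysql://dbhost/mydb \
  --username myuser -P \
  --table big_table \
  --split-by id \
  -m 8 \
  --target-dir /user/joan/big_table
```

Because each of the 8 mappers pulls only its own key range, the per-task memory footprint stays bounded regardless of table size.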
