hadoop-mapreduce-user mailing list archives

From Sonal Goyal <sonalgoy...@gmail.com>
Subject Re: How to split DBInputFormat?
Date Tue, 04 Jan 2011 10:27:23 GMT
Hi Hari,

I don't think DataDrivenDBInputFormat is available in 0.20.x; it is only
available in the 0.21 versions. You can check the hihoApache0.20 branch at
https://github.com/sonalgoyal/hiho/, which backports the relevant DB formats
for Apache Hadoop 0.20 versions.
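
To illustrate the idea behind WHERE-clause splitting, here is a minimal
sketch. It is not the Hadoop implementation: the class RangeSplitSketch and
the method whereClauses are hypothetical names, and it assumes a numeric
split column whose minimum and maximum values are already known. Each
generated clause would back one input split, so no single task has to read
the whole table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Hadoop or hiho.
class RangeSplitSketch {

    // Cuts the inclusive range [min, max] of a numeric column into at most
    // numSplits contiguous pieces and returns one WHERE condition per piece.
    // Assumes the values are small enough that lo + step cannot overflow.
    public static List<String> whereClauses(String column, long min, long max, int numSplits) {
        if (numSplits < 1 || max < min) {
            throw new IllegalArgumentException("need numSplits >= 1 and max >= min");
        }
        List<String> clauses = new ArrayList<>();
        long span = max - min + 1;                                   // count of distinct values
        long step = Math.max(1, (span + numSplits - 1) / numSplits); // ceiling division
        for (long lo = min; lo <= max; lo += step) {
            long hi = lo + step;                                     // exclusive upper bound
            if (hi > max) {
                // Last piece: close the range so the maximum value is included.
                clauses.add(column + " >= " + lo + " AND " + column + " <= " + max);
            } else {
                clauses.add(column + " >= " + lo + " AND " + column + " < " + hi);
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        // Example: a table whose "id" column runs from 1 to 1,000,000, read in 4 splits.
        for (String clause : whereClauses("id", 1, 1_000_000, 4)) {
            System.out.println(clause);
        }
    }
}
```

With four splits over ids 1 to 1,000,000, each mapper would query roughly
250,000 rows instead of one mapper pulling all of them into memory.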

Thanks and Regards,
Sonal
Connect Hadoop with databases, Salesforce, FTP servers and others
<https://github.com/sonalgoyal/hiho>
Nube Technologies <http://www.nubetech.co>

On Tue, Jan 4, 2011 at 10:36 AM, Hari Sreekumar <hsreekumar@clickable.com> wrote:

> Arvind,
>
> Where can I find DataDrivenDBInputFormat? Is it available in v0.20.2, and is
> it stable?
>
> Thanks,
> Hari
>
>
> On Tue, Jan 4, 2011 at 12:02 AM, arvind@cloudera.com <arvind@cloudera.com> wrote:
>
>> Joan,
>>
>> DataDrivenDBInputFormat is a better fit for moving large volumes of data
>> because it generates WHERE clauses that partition the data across splits.
>>
>> You could also use Sqoop <https://github.com/cloudera/sqoop>, which makes
>> such large-volume data migration between relational sources and HDFS a
>> breeze.
>>
>> Arvind
>>
>>
>> On Mon, Jan 3, 2011 at 8:56 AM, Joan <joan.monplet@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I'm trying to load data from a big table in a database. I'm using
>>> DBInputFormat, but when my job tries to get all the records, it throws an
>>> exception:
>>>
>>> *Exception in thread "Thread for syncLogs" java.lang.OutOfMemoryError:
>>> Java heap space*
>>>
>>> I'm trying to get millions of records and I would like to use DBInputSplit,
>>> but I don't know how to use it or how many splits I need.
>>>
>>> Thanks
>>>
>>> Joan
>>>
>>
>>
>
