hadoop-user mailing list archives

From "Bejoy KS" <bejoy.had...@gmail.com>
Subject Re: Some general questions about DBInputFormat
Date Tue, 11 Sep 2012 12:48:28 GMT
Hi Yaron

Sqoop uses a similar implementation. You can get some details there.

Replies inline
• (more general question) Are there many use-cases for using DBInputFormat? Do most Hadoop
jobs take their input from files or DBs?

> From my limited experience, most MR jobs read their data from HDFS. DBInputFormat is useful
for getting data out of an RDBMS into Hadoop; the Sqoop implementation is an example.


• Since all mappers open a connection to the same DBS, one cannot use hundreds of mappers.
Is there a solution to this problem?

> The number of mappers shouldn't exceed the number of concurrent connections that the
database allows.
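
To illustrate the point about capping mappers, here is a minimal sketch in plain Java with no Hadoop dependency. As far as I recall, DBInputFormat divides the table's row count roughly evenly among the map tasks and gives the remainder to the last one; the sketch mimics that idea while clamping the number of splits to a connection limit. The class SplitPlanner, the method planSplits, and the maxConnections parameter are hypothetical names made up for this example and are not part of Hadoop or Sqoop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Hadoop class): plans row ranges so that the
// number of splits, and therefore concurrent mappers, stays within a cap.
class SplitPlanner {

    // Returns {offset, length} pairs that together cover rows [0, rowCount).
    static List<long[]> planSplits(long rowCount, int requestedMappers, int maxConnections) {
        if (rowCount < 0 || requestedMappers < 1 || maxConnections < 1) {
            throw new IllegalArgumentException("rowCount must be >= 0 and both limits >= 1");
        }
        // Never plan more splits than the database allows connections.
        int chunks = Math.min(requestedMappers, maxConnections);
        // Never plan more splits than there are rows (but always at least one).
        if (rowCount < chunks) {
            chunks = (int) Math.max(1, rowCount);
        }
        long chunkSize = rowCount / chunks;
        List<long[]> splits = new ArrayList<>();
        for (int i = 0; i < chunks; i++) {
            long offset = i * chunkSize;
            // The last split absorbs the remainder of the integer division.
            long length = (i == chunks - 1) ? rowCount - offset : chunkSize;
            splits.add(new long[] {offset, length});
        }
        return splits;
    }

    public static void main(String[] args) {
        // 200 mappers requested, but the database is assumed to allow only 8 connections.
        for (long[] s : planSplits(1000, 200, 8)) {
            System.out.println("LIMIT " + s[1] + " OFFSET " + s[0]);
        }
    }
}
```

In a real job the mapper count would come from the job configuration rather than a helper like this; the sketch only shows that fewer, larger splits still cover every row once.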



Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: Yaron Gonen <yaron.gonen@gmail.com>
Date: Tue, 11 Sep 2012 15:41:26 
To: <user@hadoop.apache.org>
Reply-To: user@hadoop.apache.org
Subject: Some general questions about DBInputFormat

Hi,
After reviewing the class's (not very complicated) code, I have some
questions I hope someone can answer:

   - (more general question) Are there many use-cases for using
   DBInputFormat? Do most Hadoop jobs take their input from files or DBs?
   - What happens when the database is updated during the mappers' data
   retrieval phase? Is there a way to lock the database before the data
   retrieval phase and release it afterwards?
   - Since all mappers open a connection to the same DBS, one cannot use
   hundreds of mappers. Is there a solution to this problem?

Thanks,
Yaron
