hadoop-user mailing list archives

From Nick Jones <nick.jo...@amd.com>
Subject Re: Some general questions about DBInputFormat
Date Tue, 11 Sep 2012 13:00:31 GMT
Hi Yaron

Replies inline below.

On 09/11/2012 07:41 AM, Yaron Gonen wrote:
> Hi,
> After reviewing the class's (not very complicated) code, I have some 
> questions I hope someone can answer:
>   * (more general question) Are there many use-cases for using
>     DBInputFormat? Do most Hadoop jobs take their input from files or DBs?
Bejoy's right: most jobs read their input from HDFS or some other
distributed store, because that is what can feed M/R at a sufficient
rate. DBInputFormat can still be helpful for pulling pointers to other
sources of data (e.g. file paths on filers where the actual binary
content is stored).
>   * What happens when the database is updated during mappers' data
>     retrieval phase? Is there a way to lock the database before the
>     data retrieval phase and release it afterwards?
The whole job creates a transaction against the RDBMS that ensures
consistent state throughout the job.  Depending on the source and
settings, this might lock an entire table or lock only the rows
selected by the query.
>   * Since all mappers open a connection to the same DBS, one cannot
>     use hundreds of mappers. Is there a solution to this problem?
That depends on the server's connection limit and on the number of rows
requested. In my experience, the server ran into other problems before
it hit its connection limit.
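
To make the connection question concrete, here is a rough standalone
sketch of how the input gets carved up, as far as I recall the class's
behaviour: the row count is divided evenly across the requested number
of map tasks, the last slice absorbs the remainder, and each mapper
runs its own LIMIT/OFFSET query on its own connection. This is not the
Hadoop source; the class name SplitSketch, the table "events" and the
column "id" are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    // Returns {start, end} row ranges, one per mapper.
    // The last range absorbs any remainder rows.
    public static List<long[]> splits(long rowCount, int mappers) {
        List<long[]> ranges = new ArrayList<>();
        long chunk = rowCount / mappers;
        for (int i = 0; i < mappers; i++) {
            long start = i * chunk;
            long end = (i + 1 == mappers) ? rowCount : start + chunk;
            ranges.add(new long[] {start, end});
        }
        return ranges;
    }

    // Builds the query one mapper would run for its range.
    // Table and column names are hypothetical.
    public static String query(String table, String orderBy, long[] range) {
        return "SELECT * FROM " + table + " ORDER BY " + orderBy
             + " LIMIT " + (range[1] - range[0]) + " OFFSET " + range[0];
    }

    public static void main(String[] args) {
        // 10 rows over 3 mappers: slices of 3, 3 and 4 rows.
        for (long[] r : splits(10, 3)) {
            System.out.println(query("events", "id", r));
        }
    }
}
```

Under that assumption, the number of simultaneous connections is at
most the number of map tasks running at once, so asking for fewer map
tasks is the simplest way to cap the load on the server.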
> Thanks,
> Yaron
