hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: remote job submission
Date Sat, 21 Apr 2012 15:22:04 GMT
By "previous files" I meant the job-related files there. DataNodes are
persistent members of HDFS. Removing a DN results in loss of the blocks
it holds. Usually replication handles DN failures flawlessly, but
consider a cluster with a replication factor of 1: there, a DN going
down cannot be tolerated.

Writes to HDFS are done by writing blocks directly to the DataNodes, so
a JobClient does need access to them in order to write its job-related
files to HDFS.
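For what it's worth, that write path can be sketched as a toy model. None of this is real Hadoop code; the DataNode/NameNode classes below are illustrative stand-ins that only capture the shape of the interaction (metadata call to the NameNode, data streamed straight to DataNodes):

```java
import java.util.*;

// Toy model of the HDFS write path: the client asks the NameNode where a
// block should go, then sends the block bytes DIRECTLY to those DataNodes.
// The NameNode never touches the data itself.
public class WritePathSketch {
    // Stand-in for a DataNode: simply stores the blocks it receives.
    static class DataNode {
        final String id;
        final Map<String, byte[]> blocks = new HashMap<>();
        DataNode(String id) { this.id = id; }
        void receiveBlock(String blockId, byte[] data) { blocks.put(blockId, data); }
    }

    // Stand-in for the NameNode: picks target DataNodes for a new block.
    static class NameNode {
        final List<DataNode> dataNodes;
        NameNode(List<DataNode> dns) { this.dataNodes = dns; }
        List<DataNode> allocateBlock(int replication) {
            // With a replication factor of 1, only one DataNode is chosen,
            // so losing that node means losing the block -- the situation
            // described above.
            return dataNodes.subList(0, Math.min(replication, dataNodes.size()));
        }
    }

    // Client-side write: one metadata call, then data goes straight to each
    // chosen DataNode (real HDFS pipelines the replicas node-to-node).
    static int writeBlock(NameNode nn, String blockId, byte[] data, int replication) {
        List<DataNode> targets = nn.allocateBlock(replication);
        for (DataNode dn : targets) {
            dn.receiveBlock(blockId, data);
        }
        return targets.size(); // number of replicas actually written
    }

    public static void main(String[] args) {
        List<DataNode> dns = Arrays.asList(
                new DataNode("dn1"), new DataNode("dn2"), new DataNode("dn3"));
        NameNode nn = new NameNode(dns);
        int replicas = writeBlock(nn, "job.xml-blk_1", "job conf bytes".getBytes(), 3);
        System.out.println("replicas written: " + replicas);
    }
}
```

This is also why the JobClient needs network access to the DataNodes and not just to the NameNode/JobTracker: the job files (job.xml, the job jar, split info) travel over that direct client-to-DataNode path.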

On Sat, Apr 21, 2012 at 8:33 PM, JAX <jayunit100@gmail.com> wrote:
> Thanks j harsh:
> I have another question , though ---
>
> You mentioned that :
>
> The client needs access to
> " the
> DataNodes (for actually writing the previous files to DFS for the
> JobTracker to pick up)"
>
> What do you mean by previous files? It seems like, if designing Hadoop
> from scratch, I wouldn't want to force the client to communicate with
> DataNodes at all, since those can be added and removed during a job.
>
> Jay Vyas
> MMSB
> UCHC
>
> On Apr 21, 2012, at 1:14 AM, Harsh J <harsh@cloudera.com> wrote:
>
>> the
>> DataNodes (for actually writing the previous files to DFS for the
>> JobTracker to pick up)



-- 
Harsh J
