hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Can we run job on some datanodes ?
Date Wed, 21 Sep 2011 13:09:15 GMT

TaskTrackers run your jobs' tasks, not DataNodes directly. So you can
statically control the load on nodes by removing TaskTrackers from
your cluster.

That is, if you run "service hadoop-0.20-tasktracker stop" or
"hadoop-daemon.sh stop tasktracker" on the specific nodes, jobs won't
run there anymore.
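To illustrate the static approach above, here is a minimal sketch; the
hostnames and the passwordless-ssh setup are my assumptions, not from the
thread, and the script only prints the per-node commands so you can review
them before running anything:

```shell
# Hypothetical sketch: print the commands that would stop the TaskTracker
# on each node we want to exclude from MapReduce. NODES is a made-up list;
# run the printed commands yourself (or drop the echo) once they look right.
NODES="node03 node04"   # hypothetical hostnames to exclude

CMDS=""
for host in $NODES; do
  # The DataNode on each host stays up, so HDFS blocks there remain
  # available; only the node's map/reduce task slots go away.
  CMD="ssh $host hadoop-daemon.sh stop tasktracker"
  echo "$CMD"
  CMDS="$CMDS$CMD
"
done
```

Because the DataNode keeps running, this avoids the replication=1 data-loss
concern entirely.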

Is this what you're looking for?

(There are also ways to achieve the exclusion dynamically, for instance
by writing a scheduler, but it's hard to say more without knowing what
you need specifically, and why you require it.)
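As an aside on the dynamic side: depending on your Hadoop version, the
JobTracker can also exclude TaskTrackers through a hosts-exclude file,
analogous to HDFS decommissioning. A rough sketch, with a hypothetical
file path:

```xml
<!-- mapred-site.xml: point the JobTracker at an exclude file.
     The path below is a placeholder; adjust to your layout. -->
<property>
  <name>mapred.hosts.exclude</name>
  <value>/etc/hadoop/conf/mapred.exclude</value>
</property>
```

Put the hostnames to exclude in that file, one per line, and run
`hadoop mradmin -refreshNodes` so the JobTracker picks up the change
without a restart; whether your release supports `-refreshNodes` on
`mradmin` depends on the version, so check its docs first.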

On Wed, Sep 21, 2011 at 6:32 PM, praveenesh kumar <praveenesh@gmail.com> wrote:
> Is there any way we can run a particular job in Hadoop on a subset of
> datanodes?
> My problem is that I don't want to use all the nodes to run some jobs.
> I am trying to make a job-completion-time vs. number-of-nodes graph for
> a particular job.
> One way to do it is to remove datanodes and then see how much time the
> job takes.
> Just out of curiosity, I want to know whether there is any other way to
> do this without removing datanodes.
> I am afraid that if I remove datanodes, I could lose some data blocks
> that reside on those machines, as I have some files with replication = 1.
> Thanks,
> Praveenesh

Harsh J
