hadoop-common-user mailing list archives

From Kayla Jay <kaylai...@yahoo.com>
Subject guaranteeing disk space?
Date Mon, 15 Sep 2008 18:24:48 GMT
How does one check or guarantee that there's enough disk space when running a Hadoop job
whose output size (results, temp files, etc.) you can't predict?

That is, when you run a Hadoop job and you're not sure exactly how much disk space it will
eat up (given temp dirs), the job will fail if it does run out.

How do you guarantee, while your job is running, that there's enough disk space on the nodes,
and kick off cleanup (so the job won't fail) if you're running low on disk space?

For example, if your maps are failing because there isn't enough temporary disk space on your
nodes, how can you catch that up front before running the job, or better yet while the job is
running, before it causes a failure? The outputs of maps are stored on the local disk of the
nodes on which they were executed, and if your nodes don't have enough space while running
jobs, how can you fix this at run time? Can I catch this condition at all?
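The best I've come up with so far is a crude pre-flight check from the job driver before
submitting — something like the sketch below, using plain java.io.File (nothing
Hadoop-specific; the class name, scratch path, and threshold are just placeholders I made up):

```java
import java.io.File;

public class DiskSpaceCheck {
    // Hypothetical threshold: don't submit the job unless this much local scratch is free.
    static final long MIN_FREE_BYTES = 10L * 1024 * 1024 * 1024; // 10 GB

    /** Returns true if the given local directory has at least minFree usable bytes. */
    public static boolean hasEnoughSpace(File dir, long minFree) {
        return dir.getUsableSpace() >= minFree;
    }

    public static void main(String[] args) {
        File scratch = new File(System.getProperty("java.io.tmpdir"));
        if (!hasEnoughSpace(scratch, MIN_FREE_BYTES)) {
            System.err.println("Not enough local scratch space; refusing to submit job.");
        }
    }
}
```

But that only covers the submitting machine, not the task nodes, and only at submit time —
which is exactly why I'm asking about a run-time check.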

Is there a way to fix this at run time? How do others solve this issue when running jobs
whose disk consumption they can't predict?

Or, what if you run out of disk space on HDFS when running large jobs with large outputs?
The job just fails — but how can one assess and manage this disk-space allocation while the
jobs are running?

If you run out of HDFS disk space, and you know you want the results of job X, is there a
way to detect the condition while the job is running, so you can do some smart cleanup and
not lose the data job X would have produced?
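For cleanup, the only idea I have so far is something blunt: delete the oldest scratch files
until enough space frees up. A sketch of that (again just plain Java, with a made-up class
name — and obviously knowing which files are actually safe to delete is the hard part):

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;

public class TempCleaner {
    /**
     * Deletes the oldest regular files in dir until at least minFree usable
     * bytes are available. Returns the number of files deleted. Purely a
     * sketch — a real version would need to know which files are expendable.
     */
    public static int cleanOldest(File dir, long minFree) {
        File[] files = dir.listFiles(File::isFile);
        if (files == null) return 0; // dir missing or not a directory
        Arrays.sort(files, Comparator.comparingLong(File::lastModified));
        int deleted = 0;
        for (File f : files) {
            if (dir.getUsableSpace() >= minFree) break; // enough space freed
            if (f.delete()) deleted++;
        }
        return deleted;
    }
}
```

Even with something like this, I'd still need a hook to trigger it from inside a running
job, which is what I can't find.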
