hbase-user mailing list archives

From rajgopalv <raja.f...@gmail.com>
Subject Re: Non DFS space usage blows up.
Date Wed, 22 Dec 2010 18:34:18 GMT
Jean-Daniel Cryans <jdcryans@...> writes:
> 
> Look on your disks, using du -hs, to see what eats all that space.
> 
> J-D
> 
> On Tue, Dec 21, 2010 at 11:12 PM, rajgopalv <raja.fire@...> wrote:
> >
> > I'm running a MapReduce job to create HFiles (via HFileOutputFormat) out of CSVs.
> >
> > * The MapReduce job operates on 75 files, each containing 1 million rows.
> > The total comes to 16GB. [With a replication factor of 2, the total DFS
> > used is 32GB.]
> > * There are 300 Map jobs.
> > * The map job ends perfectly.
> > * There are 3 slave nodes (each with a 145GB hard disk), so with
> > job.setNumReduceTasks(3) there are 3 reducers.
> > * When the reduce job is about to end, the space on all the slave nodes runs
> > out.
> >
> > I am confused. Why does my space run out during the reduce step (in the
> > shuffle phase)?
> > --
> > View this message in context: http://old.nabble.com/Non-DFS-space-usage-blows-up.-tp30511999p30511999.html
> > Sent from the HBase User mailing list archive at Nabble.com.
> >
> >
> 
> 

Dear Jean-Daniel,

The "mapred" directory eats the space.It occupies around 75GB in every machine. 
Why is that.? 

As per my understanding, every map task takes a block (which is local to the
machine) and spills its output to the local disk. The reducer then shuffles the
map outputs, pulling everything to its local disk before reducing. So in the
worst case, a single reducer should hold 16GB (because the seed data is 16GB).
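
For what it's worth, here is roughly how the intermediate map output could be
compressed so that those spills shrink (just a sketch, not what I actually
run; these are the stock Hadoop 0.20 config keys, and GzipCodec is only an
example codec):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;

public class CompressedSpillConfig {
    public static Job createJob() throws Exception {
        Configuration conf = new Configuration();
        // Compress intermediate map output before it spills to the local
        // "mapred" directory (mapred.local.dir).
        conf.setBoolean("mapred.compress.map.output", true);
        conf.setClass("mapred.map.output.compression.codec",
                      GzipCodec.class, CompressionCodec.class);
        return new Job(conf, "csv-to-hfiles");
    }
}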

I have no idea why the disks are filling up.

I'm trying a variant of this code: https://issues.apache.org/jira/browse/HBASE-2378
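
In case it helps, my job setup is roughly of this shape (a simplified sketch,
not my exact code: CsvMapper, the column layout, and the table name "mytable"
are placeholders, and I'm assuming the 0.90-style
HFileOutputFormat.configureIncrementalLoad() API):

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {

    // Hypothetical mapper: one CSV line -> one Put, keyed on the first column.
    static class CsvMapper
            extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            String[] cols = line.toString().split(",");
            byte[] row = Bytes.toBytes(cols[0]);
            Put put = new Put(row);
            put.add(Bytes.toBytes("f"), Bytes.toBytes("c1"),
                    Bytes.toBytes(cols[1]));
            ctx.write(new ImmutableBytesWritable(row), put);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(HBaseConfiguration.create(), "csv-bulk-load");
        job.setJarByClass(BulkLoadDriver.class);
        job.setMapperClass(CsvMapper.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(Put.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Wires in HFileOutputFormat, the sorting reducer, and a
        // TotalOrderPartitioner built from the table's region boundaries;
        // as I understand it, this also sets the reduce-task count to the
        // number of regions in the table.
        HFileOutputFormat.configureIncrementalLoad(job,
                new HTable(job.getConfiguration(), "mytable"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}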

Thanks and regards,
Rajgopal V

