hadoop-common-user mailing list archives

From Vinod Kumar Vavilapalli <vino...@hortonworks.com>
Subject Re: No space left on device during merge.
Date Mon, 27 Jan 2014 18:46:32 GMT

Yes, sampling is a great way to do this. That's what the TeraSort example does. See the code for org.apache.hadoop.examples.terasort.TeraSort and, specifically, org.apache.hadoop.examples.terasort.TeraInputFormat.
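
For illustration, here is a minimal sketch of the same sampling approach using the stock InputSampler and TotalOrderPartitioner from the mapreduce API (the input format, reducer count, and sampling rates below are illustrative assumptions, not values from this thread):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

public class SampledSort {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "sampled-sort");
    job.setJarByClass(SampledSort.class);
    // InputSampler samples the input format's keys, so the input key type
    // must match the map output key type (here, URLs/BNodes as Text).
    job.setInputFormatClass(SequenceFileInputFormat.class);
    job.setMapOutputKeyClass(Text.class);
    job.setNumReduceTasks(100);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // Sample ~0.1% of the input keys (at most 10000 samples drawn from up
    // to 100 splits) and write numReduceTasks-1 boundary keys to a file.
    InputSampler.Sampler<Text, Text> sampler =
        new InputSampler.RandomSampler<>(0.001, 10000, 100);
    TotalOrderPartitioner.setPartitionFile(job.getConfiguration(),
        new Path(args[1] + "-partitions"));
    InputSampler.writePartitionFile(job, sampler);

    // Each reducer then receives one contiguous, roughly equal key range.
    job.setPartitionerClass(TotalOrderPartitioner.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}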

The other, simpler option to start with is some kind of brute-force partitioning, something like lexicographic partitioning of the URLs. It won't give you great balance to begin with, but it should get you started.
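
For example, a hypothetical first-cut partitioner that just buckets keys by their leading byte (hence the poor initial balance, since most URL keys share the same "http" prefix):

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Brute-force lexicographic partitioner: bucket keys by their first byte.
// The mapping is monotonic in key order, so each reducer still produces
// one contiguous slice of the global sort order.
public class LexicographicPartitioner extends Partitioner<Text, Text> {
  @Override
  public int getPartition(Text key, Text value, int numPartitions) {
    int first = key.getLength() > 0 ? (key.getBytes()[0] & 0xFF) : 0;
    // Spread the 256 possible leading byte values evenly over the reducers.
    return first * numPartitions / 256;
  }
}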

+Vinod

On Jan 27, 2014, at 2:03 AM, Tim Potter <tep@yahoo-inc.com> wrote:

> Thanks for your reply, Vinod. I've been thinking about partitioning the data so that multiple reducers each work on a contiguous part of the sort space. The problem is that the keys are a combination of URLs and RDF BNodes, and I can't see a way, without previously analysing the data, of partitioning the URLs evenly across the sort space, although I'm completely open to suggestions. I guess I could analyse a sample of the data and build a partition function that works well on that, then apply it to the full data set.
> 
> I was hoping there was a way of tuning how Hadoop sorts.
> 
> Regards,
>   Tim.
> 
> On 1/24/14, 7:29 PM, Vinod Kumar Vavilapalli wrote:
>> That's a lot of data to process for a single reducer. You should try increasing the number of reducers to achieve more parallelism, and also try modifying your logic to avoid significant skew in the reducers.
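
A concrete sketch of the first suggestion, assuming the new mapreduce API and an illustrative reducer count:

// More reducers means each one sorts and merges a smaller share of the data.
job.setNumReduceTasks(200);
// Or, if the job is launched through ToolRunner, from the command line:
//   hadoop jar myjob.jar MyJob -D mapreduce.job.reduces=200 ...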
>> 
>> Unfortunately this means rethinking your app, but that's the only way around it. It will also help you scale smoothly into the future if you have adjustable parallelism and more balanced data processing.
>> 
>> +Vinod
>> Hortonworks Inc.
>> http://hortonworks.com/
>> 
>> 
>> On Fri, Jan 24, 2014 at 6:47 AM, Tim Potter <tep@yahoo-inc.com> wrote:
>> Hi,
>>   I'm getting the below error while trying to sort a lot of data with Hadoop.
>> 
>> I strongly suspect the node the merge is running on is running out of local disk space. Assuming this is the case, is there any way to get around this limitation, given that I can't increase the local disk space available on the nodes? For example, by specifying sort/merge parameters or similar.
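
For reference, the reduce-side merge knobs in Hadoop 2.x are properties along these lines (the values and paths are illustrative, and note that none of them removes the need for local disk roughly proportional to what a single reducer must merge):

# Merge more on-disk segments per pass, producing fewer intermediate files.
mapreduce.task.io.sort.factor=100
# Fractions of reducer heap used to hold shuffled data before spilling.
mapreduce.reduce.shuffle.input.buffer.percent=0.70
mapreduce.reduce.input.buffer.percent=0.50
# Spill across every local disk available to the NodeManager.
yarn.nodemanager.local-dirs=/disk1/yarn,/disk2/yarn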
>> 
>> Thanks,
>>   Tim.
>> 
>> 2014-01-24 10:02:36,267 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.lzo_deflate]
>> 2014-01-24 10:02:36,280 INFO [main] org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 100 segments left of total size: 642610678884 bytes
>> 2014-01-24 10:02:36,281 ERROR [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:XXXXXX (auth:XXXXXX) cause:org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in OnDiskMerger - Thread to merge on-disk map-outputs
>> 2014-01-24 10:02:36,282 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in OnDiskMerger - Thread to merge on-disk map-outputs
>> 	at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:167)
>> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:371)
>> 	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158)
>> 	at java.security.AccessController.doPrivileged(Native Method)
>> 	at javax.security.auth.Subject.doAs(Subject.java:415)
>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1284)
>> 	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153)
>> Caused by: org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
>> 	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:213)
>> 	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>> 	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
>> 	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:54)
>> 	at java.io.DataOutputStream.write(DataOutputStream.java:107)
>> 	at org.apache.hadoop.mapred.IFileOutputStream.write(IFileOutputStream.java:88)
>> 	at org.apache.hadoop.io.compress.BlockCompressorStream.compress(BlockCompressorStream.java:150)
>> 	at org.apache.hadoop.io.compress.BlockCompressorStream.finish(BlockCompressorStream.java:140)
>> 	at org.apache.hadoop.io.compress.BlockCompressorStream.write(BlockCompressorStream.java:99)
>> 	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:54)
>> 	at java.io.DataOutputStream.write(DataOutputStream.java:107)
>> 	at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:249)
>> 	at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:200)
>> 	at org.apache.hadoop.mapreduce.task.reduce.MergeManager$OnDiskMerger.merge(MergeManager.java:572)
>> 	at org.apache.hadoop.mapreduce.task.reduce.MergeThread.run(MergeThread.java:94)
>> Caused by: java.io.IOException: No space left on device
>> 	at java.io.FileOutputStream.writeBytes(Native Method)
>> 	at java.io.FileOutputStream.write(FileOutputStream.java:318)
>> 	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:211)
>> 	... 14 more
>> 
>> 2014-01-24 10:02:36,284 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task
>> 
>> 
> 


