hadoop-user mailing list archives

From Han JU <ju.han.fe...@gmail.com>
Subject M/R job optimization
Date Fri, 26 Apr 2013 09:21:54 GMT

I've implemented an algorithm with Hadoop as a series of 4 jobs. My
question is about one of these jobs: its map and reduce tasks show 100%
complete after about 1m 30s, but I then have to wait another 5 minutes for
the job itself to finish. This job writes about 720 MB of compressed data to
HDFS in SequenceFile format, with a replication factor of 1. When I copy the
same data to HDFS directly, it takes less than 20 seconds. What is happening
during those extra 5 minutes?

Any ideas on how to optimize this part?


*JU Han*

UTC - Université de Technologie de Compiègne
*GI06 - Fouille de Données et Décisionnel* (Data Mining and Decision Support)

+33 0619608888
