hadoop-mapreduce-user mailing list archives

From Madhav Sharan <msha...@usc.edu>
Subject All nodes are not used
Date Mon, 08 Aug 2016 23:19:33 GMT
Hi Hadoop users,

I am running a MapReduce job with an input file of 23 million records. I can
see that not all of our nodes are being used.

What can I change to utilize all nodes?


Containers  Mem Used  Mem Avail  Vcores used  Vcores avail
8           11.25 GB  0 B        8            0
0           0 B       11.25 GB   0            8
0           0 B       11.25 GB   0            8
8           11.25 GB  0 B        8            0
8           11.25 GB  0 B        8            0
7           11.25 GB  0 B        7            1
5           7.03 GB   4.22 GB    5            3
0           0 B       11.25 GB   0            8
0           0 B       11.25 GB   0            8


My command looks like this:

hadoop jar target/pooled-time-series-1.0-SNAPSHOT-jar-with-dependencies.jar \
  gov.nasa.jpl.memex.pooledtimeseries.MeanChiSquareDistanceCalculation \
  /user/pts/output/MeanChiSquareAndSimilarityInput \
  /user/pts/output/MeanChiSquaredCalcOutput

The directory */user/pts/output/MeanChiSquareAndSimilarityInput* has one input
file of 23 million records. The file size is ~3 GB.
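
For context, here is a rough sketch of how many map tasks a single ~3 GB file might produce. It assumes the job reads its input through a FileInputFormat-based class, that the HDFS block size is 128 MB, and that the minimum and maximum split sizes are left at their defaults; none of these are confirmed for this job. The class name SplitEstimate and its methods are hypothetical and only mirror the usual split-size arithmetic.

```java
// Rough estimate of the map-task count for a FileInputFormat-style input.
// Assumptions (not confirmed for this job): 128 MB block size, default
// min/max split sizes, one splittable file of exactly 3 GiB.
public class SplitEstimate {
    // A tail of up to 10% over the split size is folded into the last split.
    private static final double SPLIT_SLOP = 1.1;

    // Split size is the block size clamped between the min and max settings.
    public static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Count the splits produced for one file of the given length.
    public static long estimateSplits(long fileBytes, long splitSize) {
        long splits = 0;
        long remaining = fileBytes;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // the final, possibly short or slightly oversized, split
        }
        return splits;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;      // assumed HDFS block size
        long fileBytes = 3L * 1024 * 1024 * 1024; // "~3 GB" taken as 3 GiB
        long splitSize = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        System.out.println("Estimated map tasks: " + estimateSplits(fileBytes, splitSize));
    }
}
```

Under these assumptions the estimate is 24 map tasks, while the nine nodes above offer 72 vcores in total. If that holds for this job, a smaller value for mapreduce.input.fileinputformat.split.maxsize would yield more splits and therefore more map tasks to spread across nodes.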

Code -
https://github.com/smadha/pooled_time_series/blob/master/src/main/java/gov/nasa/jpl/memex/pooledtimeseries/MeanChiSquareDistanceCalculation.java#L135


--
Madhav Sharan
