hadoop-mapreduce-user mailing list archives

From Sunil Govind <sunil.gov...@gmail.com>
Subject Re: All nodes are not used
Date Tue, 09 Aug 2016 15:27:23 GMT
Hi Madhav,

Could you share some more information? When you say a few nodes are not
utilized, is it always the same nodes that are idle?
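To make that question concrete, here is a small Java sketch that tallies the per-node table from your message below. The rows are copied by hand from that table (only the container and vcore columns), and the class and method names are just for illustration.

```java
public class NodeUsage {
    // One row per node, transcribed from the quoted table:
    // {containers, vcoresUsed, vcoresAvail}
    static final int[][] NODES = {
        {8, 8, 0}, {0, 0, 8}, {0, 0, 8},
        {8, 8, 0}, {8, 8, 0}, {7, 7, 1},
        {5, 5, 3}, {0, 0, 8}, {0, 0, 8},
    };

    // Returns {containers, vcoresUsed, vcoresTotal, idleNodes}.
    static int[] summarize(int[][] nodes) {
        int containers = 0, used = 0, total = 0, idle = 0;
        for (int[] node : nodes) {
            containers += node[0];
            used += node[1];
            total += node[1] + node[2];
            if (node[0] == 0) {
                idle++; // a node with no containers scheduled at all
            }
        }
        return new int[] {containers, used, total, idle};
    }

    public static void main(String[] args) {
        int[] s = summarize(NODES);
        System.out.printf("containers=%d, vcores used=%d of %d, idle nodes=%d of %d%n",
                s[0], s[1], s[2], s[3], NODES.length);
    }
}
```

Run as-is, it prints `containers=36, vcores used=36 of 72, idle nodes=4 of 9`: half of the cluster's vcores are free and four of the nine nodes have nothing scheduled. Knowing whether those four are always the same nodes may help separate a node or scheduling problem from a job that simply does not have enough tasks to spread across the cluster.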

Also, how long does each container run on average? Please make sure the
split size is large enough that the containers are not short-running.
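To see how the split size translates into the number of map tasks (and so how many containers the job can ask for), here is a minimal Java sketch that mirrors the split loop in Hadoop's new-API FileInputFormat, including its 1.1 slop factor for the last split. It assumes a single splittable, uncompressed input file of exactly 3 GiB (your message says ~3 GB) and a 128 MB block size, the Hadoop 2.x default for dfs.blocksize; the class name SplitEstimate is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitEstimate {
    // FileInputFormat lets the last split be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    // Same rule as FileInputFormat.computeSplitSize: max(minSize, min(maxSize, blockSize)).
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Returns the byte length of each split for one splittable file.
    static List<Long> splitLengths(long fileLength, long blockSize, long minSize, long maxSize) {
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        List<Long> splits = new ArrayList<>();
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits.add(splitSize);
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits.add(remaining); // leftover bytes become the final split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileLength = 3L * 1024L * mb; // assumption: exactly 3 GiB
        long blockSize = 128L * mb;        // assumption: dfs.blocksize = 128 MB
        long minSize = 1L;                 // effective default minimum split size
        System.out.println("default max split size: "
                + splitLengths(fileLength, blockSize, minSize, Long.MAX_VALUE).size() + " splits");
        System.out.println("64 MB max split size:   "
                + splitLengths(fileLength, blockSize, minSize, 64L * mb).size() + " splits");
    }
}
```

Under those assumptions it reports 24 splits with the defaults and 48 when the maximum split size is capped at 64 MB (for example via mapreduce.input.fileinputformat.split.maxsize or FileInputFormat.setMaxInputSplitSize). Raising the minimum split size above the block size goes the other way: fewer, longer-running map tasks. If the job uses a different InputFormat, such as NLineInputFormat, the number of splits is driven by other settings.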

Thanks
Sunil

On Tue, Aug 9, 2016 at 4:49 AM Madhav Sharan <msharan@usc.edu> wrote:

> Hi Hadoop users,
>
> I am running a MapReduce job with an input file of 23 million records. I can
> see that not all of our nodes are being used.
>
> What can I change to utilize all nodes?
>
>
> Containers  Mem Used  Mem Avail  Vcores Used  Vcores Avail
> 8           11.25 GB  0 B        8            0
> 0           0 B       11.25 GB   0            8
> 0           0 B       11.25 GB   0            8
> 8           11.25 GB  0 B        8            0
> 8           11.25 GB  0 B        8            0
> 7           11.25 GB  0 B        7            1
> 5           7.03 GB   4.22 GB    5            3
> 0           0 B       11.25 GB   0            8
> 0           0 B       11.25 GB   0            8
>
>
>
> My command looks like this:
>
> hadoop jar target/pooled-time-series-1.0-SNAPSHOT-jar-with-dependencies.jar \
>   gov.nasa.jpl.memex.pooledtimeseries.MeanChiSquareDistanceCalculation \
>   /user/pts/output/MeanChiSquareAndSimilarityInput \
>   /user/pts/output/MeanChiSquaredCalcOutput
>
> The directory */user/pts/output/MeanChiSquareAndSimilarityInput* has an
> input file of 23 million records; the file size is ~3 GB.
>
> Code:
> https://github.com/smadha/pooled_time_series/blob/master/src/main/java/gov/nasa/jpl/memex/pooledtimeseries/MeanChiSquareDistanceCalculation.java#L135
>
>
> --
> Madhav Sharan
>
>
