hadoop-mapreduce-user mailing list archives

From Blanca Hernandez <Blanca.Hernan...@willhaben.at>
Subject Extreme amount of memory and DB connections used by MR Job
Date Mon, 29 Sep 2014 12:57:41 GMT
Hi,

I am using a Hadoop MapReduce job together with MongoDB.
The job runs against a database that is 252 GB in size. During the job the number of connections climbs above 8,000, and we have already given the tasks 9 GB of RAM, yet the job still crashes with an OutOfMemoryError when only 8% of the mapping is done.
Are these numbers normal, or did we miss something in the configuration?
I attach my code below, just in case the problem is with it.
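
Before the code: the 9 GB is handed to the map and reduce tasks roughly like this (a minimal sketch from memory, assuming the standard Hadoop 2.x mapreduce.* properties; the exact values may differ a little from what is on our cluster):

// Sketch of the memory settings referred to above (Hadoop 2.x property names;
// the numbers here are illustrative, not copied verbatim from the cluster).
Configuration conf = new Configuration();
conf.set("mapreduce.map.memory.mb", "9216");       // container size for each map task
conf.set("mapreduce.map.java.opts", "-Xmx8g");     // JVM heap inside that container
conf.set("mapreduce.reduce.memory.mb", "9216");
conf.set("mapreduce.reduce.java.opts", "-Xmx8g");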

Mapper:

public class AveragePriceMapper extends Mapper<Object, BasicDBObject, Text, BSONWritable> {

    // Holds the SEPARATOR-delimited property names that make up the output key
    // (assumed to be set elsewhere, e.g. from the job configuration; not shown here).
    private String currentId;

    @Override
    public void map(final Object key, final BasicDBObject val, final Context context)
            throws IOException, InterruptedException {
        // Build the composite key from the configured property values of this document.
        StringBuilder id = new StringBuilder();
        for (String propertyId : currentId.split(AveragePriceGlobal.SEPARATOR)) {
            id.append(val.get(propertyId)).append(AveragePriceGlobal.SEPARATOR);
        }
        // Emit the whole document, keyed by the composite id.
        BSONWritable bsonWritable = new BSONWritable(val);
        context.write(new Text(id.toString()), bsonWritable);
    }
}


Reducer:
public class AveragePriceReducer extends Reducer<Text, BSONWritable, Text, Text> {

    // Fields assumed to exist elsewhere in the class; they are not shown here.
    private String currentId;
    private boolean continueLoop = true;

    @Override
    public void reduce(final Text pKey, final Iterable<BSONWritable> pValues,
            final Context pContext) throws IOException, InterruptedException {
        // Iterate with a single iterator instead of calling pValues.iterator() on every pass.
        final Iterator<BSONWritable> it = pValues.iterator();
        while (it.hasNext() && continueLoop) {
            BSONWritable next = it.next();
            // Make some calculations
        }
        pContext.write(new Text(currentId),
                new Text(new MyClass(currentId, AveragePriceGlobal.COMMENT, 0, 0).toString()));
    }
}

The configuration includes a query which filters the objects to analyze (so not the whole 252 GB will be scanned).
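
That filter is set roughly like this (again a sketch, assuming the mongo-hadoop connector's MongoConfigUtil and MongoInputFormat; the URI and the query shown here are placeholders, not the real ones):

// Sketch of the input-side job setup (mongo-hadoop connector; URI and query are placeholders).
Configuration conf = new Configuration();
MongoConfigUtil.setInputURI(conf, "mongodb://host:27017/mydb.mycollection");
// Only documents matching this query are fed to the mappers, not the whole 252 GB collection.
MongoConfigUtil.setQuery(conf, new BasicDBObject("someProperty", "someValue"));

Job job = Job.getInstance(conf, "average-price");
job.setInputFormatClass(MongoInputFormat.class);
job.setMapperClass(AveragePriceMapper.class);
job.setReducerClass(AveragePriceReducer.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(BSONWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);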

Many thanks. Best regards,
Blanca
