From: Blanca Hernandez <Blanca.Hernandez@willhaben.at>
To: user@hadoop.apache.org
Subject: Extreme amount of memory and DB connections used by MR job
Date: Mon, 29 Sep 2014 12:57:41 +0000

Hi,

 

I am using a hadoop map reduce = job + mongoDb.

It goes against a data base 252= Gb big. During the job the amount of conexions is over 8000 and we gave alr= eady 9Gb RAM. The job is still crashing because of a OutOfMemory with only = a 8% of the mapping done.
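Roughly, the memory is handed to the map tasks as in the sketch below (the property names assume MR2/YARN, and the driver class name and the exact values are only placeholders for this mail, not our literal configuration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class MemorySettingsSketch {
    public static void main(final String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Container size per map task in MB (MR2/YARN property; placeholder value).
        conf.set("mapreduce.map.memory.mb", "9216");
        // JVM heap per map task, kept a bit below the container size (placeholder value).
        conf.set("mapreduce.map.java.opts", "-Xmx8192m");
        Job job = Job.getInstance(conf, "average-price-memory-sketch");
        // ... mapper, reducer and input/output formats are wired up as in the real job.
    }
}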

Are these numbers normal? Or did we miss something regarding the configuration?

I attach my code, just in case the problem is with it.

 

Mapper:

 

public class AveragePriceMapper= extends Mapper<Object, BasicDBObject, Text, BSONWritable> {

    @Override

    public void = map(final Object key, final BasicDBObject val, final Context context) throw= s IOException, InterruptedException {

     &= nbsp;  String id =3D "";

     &= nbsp;  for(String propertyId : currentId.split(AveragePriceGlobal.SEPA= RATOR)){

     &= nbsp;      id +=3D val.get(propertyId) + A= veragePriceGlobal.SEPARATOR;

     &= nbsp;  }

     &= nbsp;  BSONWritable bsonWritable =3D new BSONWritable(val);=

     &= nbsp;  context.write(new Text(id), bsonWritable);

    }=

}

 

 

Reducer:

public class AveragePriceReduce= r extends Reducer<Text, BSONWritable, Text, Text>  {<= /span>

    public void = reduce(final Text pKey, final Iterable<BSONWritable> pValues, final C= ontext pContext) throws IOException, InterruptedException {

     &= nbsp;  while(pValues.iterator().hasNext() && continueLoop){

     &= nbsp;      BSONWritable next =3D pValues.iterator(= ).next();

     &= nbsp;      //Make some calculations

     &= nbsp;  }        pContext.write(new = Text(currentId), new Text(new MyClass(currentId, AveragePriceGlobal.COMMENT= , 0, 0).toString()));

 

    }=

}

 

The configuration includes a qu= ery which filters the number of objects to analyze (not the 252Gb will be a= nalyzed).
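To illustrate, the job is wired up roughly as in the sketch below (the mongo.input.* properties are the mongo-hadoop connector's; the URI, the query itself, the output path and the driver class name are placeholders, not our real values):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import com.mongodb.hadoop.MongoInputFormat;
import com.mongodb.hadoop.io.BSONWritable;

public class AveragePriceJobSketch {
    public static void main(final String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Input collection (placeholder URI).
        conf.set("mongo.input.uri", "mongodb://localhost:27017/mydb.prices");
        // JSON query that restricts which documents reach the mappers (placeholder filter).
        conf.set("mongo.input.query", "{\"year\": 2014}");

        Job job = Job.getInstance(conf, "average-price-sketch");
        job.setJarByClass(AveragePriceJobSketch.class);
        job.setInputFormatClass(MongoInputFormat.class);
        job.setMapperClass(AveragePriceMapper.class);
        job.setReducerClass(AveragePriceReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(BSONWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(job, new Path("/tmp/average-price-out"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}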

 

Many thanks. Best regards,

Blanca
