hadoop-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: [Cosmos-dev] Out of memory in identity mapper?
Date Thu, 06 Sep 2012 16:12:19 GMT
Protobuf involvement makes me more suspicious that this is possibly a
corruption or a serialization issue as well. Perhaps if you can share
some stack traces, people can help better. If it is reliably
reproducible, I'd also check the count of records read before the
failure occurs, and see whether the stack traces are always the same.

Serialization formats such as protobufs allocate objects based on read
sizes (for example, a string's size may be read before the string's
bytes, and a buffer of that length is pre-allocated for the bytes to be
read into). With corrupt data or bugs in the deserialization code, it
is quite easy for a badly read value to turn into a huge allocation
request. It's one possibility.
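
To illustrate the failure mode (this is a made-up length-prefixed
reader, not protobuf's actual wire code):

    import java.io.DataInputStream;
    import java.io.IOException;

    public class LengthPrefixedRead {
        // Reads the next length-prefixed record. With corrupt or shifted
        // input, the four length bytes can decode to a huge value, and the
        // allocation below becomes an OutOfMemoryError / GC-overhead storm.
        static byte[] readRecord(DataInputStream in) throws IOException {
            int len = in.readInt();     // a badly read value here...
            byte[] buf = new byte[len]; // ...turns into a giant alloc request
            in.readFully(buf);
            return buf;
        }
    }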

Is the input compressed too, btw? Can you seek out the input file that
the specific map fails on, and try to read it in isolation to validate
it? Or do all maps seem to fail?
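
Something like this standalone loop would do for the isolated read
(assuming size-delimited records; MyRecord is a stand-in for your
generated protobuf class):

    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.InputStream;

    public class ValidateInput {
        public static void main(String[] args) throws Exception {
            // Copy the suspect file out of HDFS first, e.g.:
            //   hadoop fs -copyToLocal /path/to/suspect-part ./suspect-part
            try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
                long count = 0;
                // parseDelimitedFrom returns null at a clean end of stream;
                // a corrupt record should fail loudly, with the count known.
                while (MyRecord.parseDelimitedFrom(in) != null) {
                    count++;
                }
                System.out.println("Parsed " + count + " records cleanly");
            }
        }
    }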

On Thu, Sep 6, 2012 at 9:01 PM, SEBASTIAN ORTEGA TORRES <sortega@tid.es> wrote:
> Input files are small fixed-size protobuf records and yes, it is
> reproducible (but it takes some time).
> In this case I cannot use combiners, since I need to process all the
> elements with the same key together.
>
> Thanks for the prompt response
>
> --
> Sebastián Ortega Torres
> Product Development & Innovation / Telefónica Digital
> C/ Don Ramón de la Cruz 82-84
> Madrid 28006
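
(That constraint is indeed the case where a combiner is unsound: a
combiner may be applied to any map-local subset of a key's values, any
number of times, so only operations that tolerate partial groups
qualify. A made-up example of the safe case, a sum whose partial
results recombine freely:)

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class SumCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
        // Safe as a combiner: summing a subset, then summing the sums,
        // gives the same answer. A median or a dedup over the full group
        // would not survive being applied to partial subsets.
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }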
>
> On 06/09/2012, at 17:13, Harsh J wrote:
>
> I can imagine a huge record size possibly causing this. Is this
> reliably reproducible? Do you also have combiners enabled, which may
> run the reducer logic on the map side itself?
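
(Worth noting: a combiner is opt-in, so it can be ruled out by checking
the driver for a call like the one in this hypothetical fragment;
Reducer.class here is only a stand-in for a real reducer class.)

    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Reducer;

    public class CombinerCheck {
        // If the driver never calls setCombinerClass, no reducer logic
        // runs on the map side; removing this call disables combining.
        static void enableCombiner(Job job) {
            job.setCombinerClass(Reducer.class); // stand-in class
        }
    }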
>
> On Thu, Sep 6, 2012 at 8:20 PM, JOAQUIN GUANTER GONZALBEZ <ximo@tid.es> wrote:
>
> Hello hadoopers!
>
> In a reduce-only Hadoop job, input files are handled by the identity
> mapper and sent to the reducers without modification. In one of my
> jobs I was surprised to see the job fail in the map phase with "Out of
> memory error" and "GC overhead limit exceeded".
>
> In my understanding, a memory leak in the identity mapper is out of
> the question. What can be the cause of such an error?
>
> Thanks,
> Ximo.
>
> P.S. The logs show no stack trace other than the messages I mentioned
> before.
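
(For reference, a reduce-only driver looks like the sketch below;
Mapper.class is the identity mapper in the new API, and the other class
names are placeholders. Even with identity map logic, the map side
still decodes every input record and buffers map output for the
shuffle, so it is not memory-free.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class ReduceOnlyDriver {
        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "reduce-only");
            job.setJarByClass(ReduceOnlyDriver.class);
            job.setMapperClass(Mapper.class);   // identity: emits input as-is
            job.setReducerClass(Reducer.class); // stand-in for the real reducer
            job.setOutputKeyClass(Text.class);            // placeholder types;
            job.setOutputValueClass(BytesWritable.class); // match your input
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }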
>
> --
> Harsh J
>
> _______________________________________________
> Cosmos-dev mailing list
> Cosmos-dev@tid.es
> https://listas.tid.es/mailman/listinfo/cosmos-dev



-- 
Harsh J
