hive-user mailing list archives

From Dean Wampler <>
Subject Re: Real-life experience of forcing smaller input splits?
Date Fri, 25 Jan 2013 13:39:08 GMT

> That will be my approach for now, or disabling compression altogether for
> these files. The only problem I have is that compression is so efficient
> that any operation in the mapper (so on the uncompressed data) just makes
> the mapper throw an OOM exception, no matter how much memory I give it.
> What partly works though, is setting a low mapred.max.split.size. In a
> directory containing 34 files, I get 33 mappers (???). When setting
> hive.merge.mapfiles to false (and leaving mapred.max.split.size at its fs
> blocksize default), it doesn't seem to have any effect and I get 20 mappers
> only.
You can still use compression if you use a splittable format, like bzip2
with block compression. Gzip isn't splittable.
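
As a rough sketch of what that could look like in a Hive session, the
settings below rewrite a gzip-backed table into bzip2-compressed output.
The table names (logs_gz, logs_bz2) are hypothetical, and whether bzip2
files are actually split across mappers depends on the Hadoop version in
use, so treat this as a starting point rather than a verified recipe.

```sql
-- Compress the final job output with the bzip2 codec.
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec;

-- Hypothetical tables: copy the gzip-backed data into a bzip2-backed table.
INSERT OVERWRITE TABLE logs_bz2
SELECT * FROM logs_gz;
```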

If you're running out of memory, you could also increase the heap size for
the client VMs. See the "Real-World Cluster Configurations" section of this
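
A minimal sketch of raising the heap from within a Hive session, assuming
the VMs in question are the per-task child JVMs on a Hadoop 1.x cluster.
The 2048m value is purely illustrative; pick one that fits the memory
available on the task nodes.

```sql
-- Assumption: the mapper/reducer child JVMs are the ones running out of heap.
-- The value is illustrative, not a recommendation.
SET mapred.child.java.opts=-Xmx2048m;
```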

By the way, you could also experiment with turning on intermediate
compression (compression of the data sent between the mapper and reducer
tasks), compression of the output, etc., as discussed here:
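
For illustration only (these settings are not taken from the linked
discussion), the options involved on a Hadoop 1.x cluster might look like
the following. The choice of SnappyCodec is an assumption and requires the
native Snappy library on the task nodes; any installed codec can be
substituted.

```sql
-- Compress data passed between the stages of a multi-job Hive query.
SET hive.exec.compress.intermediate=true;

-- Compress map output sent to the reducers.
SET mapred.compress.map.output=true;
-- Assumption: Snappy native libraries are installed on the cluster.
SET mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

-- Compress the final query output as well.
SET hive.exec.compress.output=true;
```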

>  ...
