crunch-dev mailing list archives

From Micah Whitacre <mkw...@gmail.com>
Subject Re: Processing splittable inputs
Date Fri, 26 Feb 2016 03:20:26 GMT
Ben,
  Are the text files you are processing compressed?  If so, and the
codec is not a splittable one (gzip, for example), that data wouldn't
be splittable.[1]

[1] http://www.grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-mapreduce-client-core/2.6.0/org/apache/hadoop/mapreduce/lib/input/TextInputFormat.java#57
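
For illustration, the check behind [1] boils down to "no codec, or a codec
that implements SplittableCompressionCodec". Below is a minimal,
dependency-free sketch of that rule, not the Hadoop code itself. The class
name SplitCheck and the extension table are hypothetical. The table assumes
the default file extensions of the stock Hadoop codecs, whereas the real
check resolves the codec through CompressionCodecFactory using whatever
codecs the job has configured.

```java
import java.util.Map;

// Hypothetical helper: approximates TextInputFormat's splittability rule
// by file extension only. Real Hadoop looks up the codec via
// CompressionCodecFactory and tests for SplittableCompressionCodec.
final class SplitCheck {

    // Assumed mapping: default extension -> codec is splittable?
    // Among the stock codecs only bzip2 supports splitting.
    private static final Map<String, Boolean> CODEC_SPLITTABLE = Map.of(
            ".gz", false,
            ".deflate", false,
            ".snappy", false,
            ".lz4", false,
            ".bz2", true);

    static boolean isSplittable(String fileName) {
        for (Map.Entry<String, Boolean> e : CODEC_SPLITTABLE.entrySet()) {
            if (fileName.endsWith(e.getKey())) {
                return e.getValue();
            }
        }
        // No known compression extension: treat as uncompressed text,
        // which can be split.
        return true;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"input.txt", "input.txt.gz", "input.txt.bz2"}) {
            System.out.println(name + " splittable: " + isSplittable(name));
        }
    }
}
```

If the inputs turn out to be gzip files, the split-size settings have no
effect on them, and each file goes to a single mapper regardless of size.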

On Thu, Feb 25, 2016 at 7:15 PM, Ben Juhn <benjijuhn@gmail.com> wrote:

> Hello there,
>
> I haven’t been able to get crunch to split inputs into multiple mappers.
> Currently it’s giving me one mapper per text file, even though they’re 1GB
> each.  I’ve tried supplying split.maxsize on the command line and in the
> DoFn implementation:
>
> @Override
> public void configure(Configuration conf) {
>     conf.set("crunch.combine.file.size", "67108864");
>     conf.set("mapreduce.input.fileinputformat.split.maxsize", "67108864");
>     conf.set("mapreduce.input.fileinputformat.split.minsize", "67108864");
> }
>
> Any suggestions?
>
> Thanks,
> Ben
>
>
