crunch-dev mailing list archives

From Micah Whitacre <mkw...@gmail.com>
Subject Re: Processing splittable inputs
Date Fri, 26 Feb 2016 23:37:51 GMT
Where are you trying to specify them?  Inside a DoFn?  Prior to
constructing the MRPipeline?

I'd suggest trying either:
1. Setting those values on the initial Configuration object you pass to the
MRPipeline
2. Setting them as source-specific properties [1] on the Source itself.

The latter approach might be better if you are reading a lot of different
sources into your pipeline and don't want to affect them all.
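
A rough, untested sketch of both options, in case it helps. The class name and the use of args[0]/args[1] as input and output paths are placeholders I made up. The property key and the 64 MB value are the ones from your message. In practice you would probably pick one option rather than both.

```java
import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.Source;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.io.From;
import org.apache.hadoop.conf.Configuration;

// Hypothetical driver class; the name is only for illustration.
public class SplitSizeExample {

  // Property key and value taken from the original message (64 MB).
  private static final String MAX_SPLIT_KEY =
      "mapreduce.input.fileinputformat.split.maxsize";
  private static final String MAX_SPLIT_BYTES = "67108864";

  public static void main(String[] args) {
    // Option 1: set the value on the Configuration before it is handed
    // to the MRPipeline, so it applies to every input in the pipeline.
    Configuration conf = new Configuration();
    conf.set(MAX_SPLIT_KEY, MAX_SPLIT_BYTES);
    Pipeline pipeline = new MRPipeline(SplitSizeExample.class, conf);

    // Option 2: attach the value to a single Source via inputConf [1],
    // so other sources in the same pipeline are left alone.
    Source<String> source =
        From.textFile(args[0]).inputConf(MAX_SPLIT_KEY, MAX_SPLIT_BYTES);

    PCollection<String> lines = pipeline.read(source);

    // Placeholder output so the pipeline has something to run.
    pipeline.writeTextFile(lines, args[1]);
    pipeline.done();
  }
}
```

After a run, checking the submitted job's configuration for the split.maxsize key should show whether the setting made it through.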

[1] http://crunch.apache.org/apidocs/0.12.0/org/apache/crunch/Source.html#inputConf(java.lang.String,%20java.lang.String)

On Fri, Feb 26, 2016 at 5:17 PM, Ben Juhn <benjijuhn@gmail.com> wrote:

> The data isn’t compressed.  The parameters aren’t showing up in the job
> configuration either.
>
>
> > On Feb 25, 2016, at 5:15 PM, Ben Juhn <benjijuhn@gmail.com> wrote:
> >
> > Hello there,
> >
> > I haven’t been able to get crunch to split inputs into multiple
> mappers.  Currently it’s giving me one mapper per text file, even though
> they’re 1GB each.  I’ve tried supplying split.maxsize on the command line
> and in the DoFn implementation:
> >
> > @Override
> > public void configure(Configuration conf) {
> > conf.set("crunch.combine.file.size", "67108864");
> > conf.set("mapreduce.input.fileinputformat.split.maxsize", "67108864");
> > conf.set("mapreduce.input.fileinputformat.split.minsize", "67108864");
> > }
> >
> > Any suggestions?
> >
> > Thanks,
> > Ben
> >
>
>
