crunch-user mailing list archives

From John Jensen <>
Subject Problem running job with large number of directories
Date Wed, 22 May 2013 23:01:53 GMT

I have a curious problem when running a Crunch job on (Avro) files in a fairly large set of
directories (just slightly less than 100).
After some fraction of the mappers have run, the remainder start failing with the exception
below. Things work fine with a smaller number of directories.

The magic 'zdHJpbmcifSx7Im5hbWUiOiJ2YWx1ZSIsInR5cGUiOiJzdHJpbmcifV19fSwiZGVmYXVsdCI' string
shows up in the 'crunch.inputs.dir' entry in the job config, so I assume the failure has
something to do with deserializing that value, but reading through the code I don't see any
obvious way for that to happen.
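For what it's worth, the "class name" in the exception looks like it could be a slice of base64-encoded JSON (an Avro-ish schema fragment) rather than a real class name. A minimal sketch below; dropping the leading character and appending a pad is only a guess at the base64 alignment, and java.util.Base64 is just used here for convenience:

```java
import java.util.Base64;

public class DecodeFragment {
    public static void main(String[] args) {
        // The bogus "class name" from the exception, with the first character
        // dropped so the remainder falls on a base64 group boundary (a guess),
        // and one '=' of padding appended to make the length a multiple of 4.
        String fragment =
            "dHJpbmcifSx7Im5hbWUiOiJ2YWx1ZSIsInR5cGUiOiJzdHJpbmcifV19fSwiZGVmYXVsdCI";
        byte[] decoded = Base64.getDecoder().decode(fragment + "=");
        System.out.println(new String(decoded));
        // Prints: tring"},{"name":"value","type":"string"}]}},"default"
    }
}
```

If that alignment is right, the mapper seems to be reading a chunk of the serialized input spec where it expected a split class name, which would fit a truncation/overrun somewhere in the (de)serialization path.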

Furthermore, the crunch.inputs.dir config entry is just under 1.5 MB, so it would not surprise
me if I'm running up against a Hadoop limit somewhere.
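In case it helps anyone reproduce, here's a rough way to see which job config entries are that large by scanning a saved job.xml. Pure JDK, no Hadoop on the classpath; the file path comes from the command line and the 100,000-character threshold is arbitrary:

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class JobConfSizes {
    public static void main(String[] args) throws Exception {
        // args[0]: path to the failing job's job.xml (an assumption about
        // where you grabbed the config from).
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(new File(args[0]));
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            String name = p.getElementsByTagName("name").item(0).getTextContent();
            String value = p.getElementsByTagName("value").item(0).getTextContent();
            // Flag anything suspiciously large (threshold is arbitrary).
            if (value.length() > 100000) {
                System.out.println(name + ": " + value.length() + " chars");
            }
        }
    }
}
```

Running this against the job.xml of the failing job should confirm whether crunch.inputs.dir is the entry dominating the config size.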

Has anybody else seen similar issues? (This is Crunch 0.5.0, btw.)

-- John

Split class zdHJpbmcifSx7Im5hbWUiOiJ2YWx1ZSIsInR5cGUiOiJzdHJpbmcifV19fSwiZGVmYXVsdCI not found
        at org.apache.hadoop.mapred.MapTask.getSplitDetails(
        at org.apache.hadoop.mapred.MapTask.runNewMapper(
        at org.apache.hadoop.mapred.Child$
        at Method)
        at org.apache.hadoop.mapred.Child.main(
Caused by: java.lang.ClassNotFoundException: Class zdHJpbmcifSx7Im5hbWUiOiJ2YWx1ZSIsInR5cGUiOiJzdHJpbmcifV19fSwiZGVmYXVsdCI not found
        at org.apache.hadoop.conf.Configuration.getClassByName(
        at org.apache.hadoop.mapred.MapTask.getSplitDetails(
        ... 7 more
