crunch-user mailing list archives

From Tomáš Čechal
Subject NumberFormatException when parsing dfs.block.size
Date Tue, 03 Nov 2015 09:14:00 GMT

Today I tried to run a new Crunch job on our production cluster and it
crashed with a NumberFormatException in CrunchCombineFileInputFormat,
line 38, where the configuration is queried for a key named "dfs.block.size".
The crash was caused by my using a non-long value, "128m", as the default
block size. I looked at the official Hadoop documentation and discovered
that such abbreviations should be supported.

Furthermore, the preferred name of the configuration key changed from
"dfs.block.size" to "dfs.blocksize". In more recent Hadoop versions the old
key is therefore not found in the conf, and the default value in the source
code hides that fact.

I am using Crunch 0.11, which comes with CDH 5.4.5. The problematic line
first appeared in Crunch 0.8 as a fix for issue CRUNCH-253.

The issue can be fixed by taking both config key names into account and
writing some logic to support block size abbreviations. Or, as a
workaround, file size abbreviations can be avoided on the client.

Do you think that this should be fixed or is the workaround enough? What
should I do if I want to submit a patch?

Tomas Cechal
