crunch-user mailing list archives

From Tomáš Čechal <tomas.cec...@gmail.com>
Subject NumberFormatException when parsing dfs.block.size
Date Tue, 03 Nov 2015 09:14:00 GMT
Hello,

today I tried to run a new Crunch job on our production cluster, and it
crashed with a NumberFormatException in CrunchCombineFileInputFormat
<https://github.com/apache/crunch/blob/apache-crunch-0.12/crunch-core/src/main/java/org/apache/crunch/impl/mr/run/CrunchCombineFileInputFormat.java>
at line 38, where the configuration is queried for the key "dfs.block.size".
The crash happened because I had set the default block size to "128m",
which is not a plain long value. I checked the official Hadoop
documentation
<https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml>
and found that such size abbreviations should be supported.
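To illustrate the failure and what abbreviation support could look like,
here is a minimal, self-contained sketch in plain Java (no Hadoop
dependency). It assumes the suffixes are the case-insensitive binary
multiples listed in hdfs-default.xml (k, m, g, t, p, e), so "128m" means
128 * 2^20 = 134217728 bytes. BlockSizeParser and parseSize are
hypothetical names, not Crunch or Hadoop API. As far as I know, Hadoop 2.x
already offers Configuration.getLongBytes, which accepts these suffixes, so
a real patch would probably call that instead of hand-rolled parsing.

```java
import java.util.Locale;

public class BlockSizeParser {

    /**
     * Hypothetical helper: parses "134217728", "128m", "1g", ... into bytes.
     * Suffixes are case-insensitive binary multiples (k = 2^10 ... e = 2^60),
     * following the suffix list in hdfs-default.xml.
     * Throws NumberFormatException on malformed input and
     * ArithmeticException if the result does not fit in a long.
     */
    static long parseSize(String value) {
        String s = value.trim().toLowerCase(Locale.ROOT);
        if (s.isEmpty()) {
            throw new NumberFormatException("empty size value");
        }
        final String suffixes = "kmgtpe";
        int index = suffixes.indexOf(s.charAt(s.length() - 1));
        if (index < 0) {
            // No suffix: a plain byte count, which is all Long.parseLong accepts.
            return Long.parseLong(s);
        }
        long number = Long.parseLong(s.substring(0, s.length() - 1));
        long multiplier = 1L << (10 * (index + 1)); // k -> 1024, m -> 1024^2, ...
        return Math.multiplyExact(number, multiplier);
    }

    public static void main(String[] args) {
        try {
            // This mirrors the failing call: a plain parse of an abbreviated size.
            Long.parseLong("128m");
        } catch (NumberFormatException e) {
            System.out.println("Long.parseLong rejected \"128m\": " + e.getMessage());
        }
        System.out.println(parseSize("128m"));      // 134217728
        System.out.println(parseSize("134217728")); // 134217728
    }
}
```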

Furthermore, the preferred name of the configuration key changed from
"dfs.block.size" to "dfs.blocksize" (see
https://issues.apache.org/jira/browse/HDFS-631). Because the code passes a
default value, it silently masks the fact that, in more recent Hadoop
versions, the queried key is not found in the configuration at all.
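Taking both key names into account could look roughly like the sketch
below. It is an assumption-laden illustration: a Map<String, String>
stands in for Hadoop's Configuration, and BlockSizeKeyLookup and
lookupBlockSize are hypothetical names. The newer "dfs.blocksize" is
checked first and "dfs.block.size" is only a fallback; returning null
(rather than a default) lets the caller tell "not configured" apart from
"configured to the default value".

```java
import java.util.HashMap;
import java.util.Map;

public class BlockSizeKeyLookup {

    // Key names from HDFS-631; the order below expresses preference.
    static final String PREFERRED_KEY = "dfs.blocksize";
    static final String DEPRECATED_KEY = "dfs.block.size";

    /**
     * Hypothetical helper: returns the raw block-size string, preferring the
     * current key and falling back to the deprecated one, or null if neither
     * key is set.
     */
    static String lookupBlockSize(Map<String, String> conf) {
        String value = conf.get(PREFERRED_KEY);
        if (value == null) {
            value = conf.get(DEPRECATED_KEY);
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(lookupBlockSize(conf)); // null: neither key set
        conf.put(DEPRECATED_KEY, "67108864");
        System.out.println(lookupBlockSize(conf)); // 67108864 (fallback key)
        conf.put(PREFERRED_KEY, "128m");
        System.out.println(lookupBlockSize(conf)); // 128m (preferred key wins)
    }
}
```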

I am using Crunch 0.11, which ships with CDH 5.4.5. The problematic line
first appeared in Crunch 0.8 as part of the fix for CRUNCH-253.

The issue could be fixed by taking both configuration key names into
account and adding logic to parse block size abbreviations. Alternatively,
as a workaround, clients can avoid size abbreviations and specify the
block size as a plain number of bytes.
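For the client-side workaround, the idea is simply to write the block
size out in bytes so that a plain Long.parseLong succeeds. The sketch
below again uses a Map<String, String> as a stand-in for the job
configuration (with Hadoop one would presumably call conf.setLong on the
real Configuration instead); PlainByteBlockSize and setBlockSizeBytes are
hypothetical names. Setting both key names is an assumption on my part,
meant to keep readers of the old and new key consistent.

```java
import java.util.HashMap;
import java.util.Map;

public class PlainByteBlockSize {

    /** Hypothetical helper: stores the block size as a plain byte count under both key names. */
    static void setBlockSizeBytes(Map<String, String> conf, long bytes) {
        String value = Long.toString(bytes);
        conf.put("dfs.blocksize", value);  // current key name
        conf.put("dfs.block.size", value); // deprecated name, still read by older code
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // 128 MiB written out in bytes instead of the abbreviation "128m".
        setBlockSizeBytes(conf, 128L * 1024 * 1024);
        // Code that calls Long.parseLong on the old key no longer fails.
        System.out.println(Long.parseLong(conf.get("dfs.block.size"))); // 134217728
    }
}
```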

Do you think this should be fixed, or is the workaround enough? What
should I do if I want to submit a patch?

Thanks,
Tomas Cechal
