crunch-user mailing list archives

From Gabriel Reid
Subject Re: NumberFormatException when parsing dfs.block.size
Date Tue, 03 Nov 2015 09:52:16 GMT
Hi Tomas,

Nice catch! And thanks for all the details.

That does look like something that should be fixed properly (both the
parsing of the config value and the use of the new config property).
It looks like dfs.blocksize has been the "correct" name of the
property for quite a while, so it's probably ok to just use
"dfs.blocksize" now.

About the parsing, the Configuration class actually includes a method
(getLongBytes) which does this parsing for you.
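
To make the two points above concrete, here is a minimal, self-contained
sketch of the lookup and parsing logic. It is not the Crunch or Hadoop code:
the class BlockSizeSketch and the helpers parseBytes and blockSize are
hypothetical names, and java.util.Properties stands in for Hadoop's
Configuration so the example runs without Hadoop on the classpath. In an
actual patch, a call along the lines of
conf.getLongBytes("dfs.blocksize", defaultValue) would presumably replace
the hand-written parsing shown here.

```java
import java.util.Locale;
import java.util.Properties;

public class BlockSizeSketch {

    // Hypothetical helper: turns "134217728", "128m", "1G", ... into a byte count.
    // Suffixes are treated as binary multiples (k = 2^10, m = 2^20, and so on).
    static long parseBytes(String raw) {
        String s = raw.trim().toLowerCase(Locale.ROOT);
        if (s.isEmpty()) {
            throw new NumberFormatException("empty size value");
        }
        int shift;
        switch (s.charAt(s.length() - 1)) {
            case 'k': shift = 10; break;
            case 'm': shift = 20; break;
            case 'g': shift = 30; break;
            case 't': shift = 40; break;
            case 'p': shift = 50; break;
            case 'e': shift = 60; break;
            default:  return Long.parseLong(s); // plain number, no suffix
        }
        long number = Long.parseLong(s.substring(0, s.length() - 1));
        return Math.multiplyExact(number, 1L << shift); // throws on overflow
    }

    // Hypothetical lookup: prefer the newer key, fall back to the older one,
    // and only then use the supplied default.
    static long blockSize(Properties conf, long defaultBytes) {
        String value = conf.getProperty("dfs.blocksize");
        if (value == null) {
            value = conf.getProperty("dfs.block.size");
        }
        return value == null ? defaultBytes : parseBytes(value);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("dfs.blocksize", "128m");
        System.out.println(blockSize(conf, 64L << 20)); // prints 134217728
    }
}
```

By contrast, a bare Long.parseLong("128m") throws NumberFormatException,
which matches the crash described in the original report.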

It would be great if you were up for submitting a patch. The
steps to follow are:
1. Log a JIRA ticket on the Crunch project here:
2. Create the patch
3. Upload the patch to your created JIRA ticket

There is more detail on this process here:

- Gabriel

On Tue, Nov 3, 2015 at 10:14 AM, Tomáš Čechal wrote:
> Hello,
> today I tried to run a new crunch job on our production cluster and it
> crashed with a NumberFormatException in file CrunchCombineFileInputFormat at
> line 38 where the configuration is queried for a key named "dfs.block.size".
> The crash I experienced was caused by the fact that I used a non-long value
> "128m" as the default block size. I looked at the official hadoop
> documentation and I discovered that such abbreviations should be supported.
> Furthermore, the preferred name of the configuration key changed from
> "dfs.block.size" to "dfs.blocksize" (see
> The default value in the
> source code hides the fact that the configuration key is not found in the
> conf in more recent hadoop versions.
> I am using crunch 0.11 that comes with CDH 5.4.5. The problematic line first
> appeared in crunch 0.8 as a fix to issue CRUNCH-253.
> The issue can be fixed by taking both config key names into account and
> writing some logic to support block size abbreviations. Or, as a workaround,
> file size abbreviations can be avoided on the client.
> Do you think that this should be fixed or is the workaround enough? What
> should I do if I want to submit a patch?
> Thanks,
> Tomas Cechal
