accumulo-notifications mailing list archives

From "Billie Rinaldi (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-507) Large amount of ranges can prevent job from kicking off
Date Fri, 19 Apr 2013 17:39:16 GMT


Billie Rinaldi commented on ACCUMULO-507:

I think this fix was reverted for the wrong reason.  ACCUMULO-826 found that a MapReduce job
would fail if the process that started the job was killed.  That was an issue because we were
writing the user's password to a file that was deleted on exit; every new map task needs to
read the password, so it was trying to read a nonexistent file.  But the ranges don't need to
be read by each map task; they only need to be accessed once, when getSplits is called, which
happens before the job is actually submitted.  Thus it shouldn't matter if the file containing
the ranges is deleted in the middle of a job: if the process exits before the job is actually
submitted, the job will fail, but that seems OK to me.

The other issue pointed out in ACCUMULO-826 is valid: the file was being written to the file
system, added to the distributed cache, and then read directly from the file system.  The
ranges file shouldn't have been added to the distributed cache at all, since it isn't needed
by the slave nodes.
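To make the mechanism concrete, here is a minimal, self-contained sketch of the ranges-file idea: write the ranges to a single file when the job is configured, read them back exactly once (as getSplits would, before the job is submitted), after which the file can safely be deleted. The class name, file format, and the (startRow, endRow) String-pair representation of a range are all illustrative assumptions, not Accumulo's actual API.

```java
import java.io.*;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: a "range" is a (startRow, endRow) pair of
// strings, and the file format is a simple count-prefixed list. None of
// these names come from Accumulo itself.
public class RangesFileSketch {

    // Written once when the job is configured.
    public static void writeRanges(Path file, List<String[]> ranges) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(ranges.size());
            for (String[] r : ranges) {
                out.writeUTF(r[0]);
                out.writeUTF(r[1]);
            }
        }
    }

    // Read exactly once, as getSplits would before the job is submitted.
    public static List<String[]> readRanges(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            int n = in.readInt();
            List<String[]> ranges = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                ranges.add(new String[] { in.readUTF(), in.readUTF() });
            }
            return ranges;
        }
    }

    // Write, read once, then delete: deleting after the single read is
    // harmless, which is the point being made above.
    public static List<String[]> roundTrip(List<String[]> ranges) {
        try {
            Path file = Files.createTempFile("ranges", ".bin");
            writeRanges(file, ranges);
            List<String[]> back = readRanges(file);
            Files.delete(file);
            return back;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Since the file is consumed before submission, a shutdown hook deleting it mid-job has nothing left to break, unlike the password file that every map task had to re-read.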

However, there may be little point in re-applying this fix if mapred.user.jobconf.limit
applies to the whole job submit directory.  Using the ranges file method might effectively
halve the size of the job submit directory, but you could still hit the limit if you had enough
ranges.  I'll try to verify whether that's the case.  Does anyone have opinions about this issue?
> Large amount of ranges can prevent job from kicking off
> -------------------------------------------------------
>                 Key: ACCUMULO-507
>                 URL:
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: client
>    Affects Versions: 1.3.5
>            Reporter: John Vines
>            Assignee: Billie Rinaldi
>            Priority: Minor
>              Labels: mapreduce
> We use the various ranges a user provides to create splits. Those get read when the job
> is submitted by the client. On the client side, those ranges are used to get all of the splits,
> and then the job is started. If the configuration is too large, the job will fail to submit
> (this size is configurable, but that's beside the point). We should look into clearing the
> ranges out of the jobconf if it's large to prevent this error, since at this point the ranges
> are no longer needed in the configuration.
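The suggestion in the quoted description could be sketched roughly like this: if the encoded ranges would push the configuration past the size limit, spill them to a file and clear them from the conf. The limit value, the property keys, and the Map standing in for a Hadoop Configuration are all hypothetical stand-ins, not the actual mapred.user.jobconf.limit handling or Accumulo's code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Toy sketch: keep small range sets in the conf, spill large ones to a
// file and remove them from the conf. All names and the 1024-byte limit
// are illustrative, not real Hadoop or Accumulo configuration.
public class SpillRangesSketch {
    static final int CONF_LIMIT_BYTES = 1024; // stand-in for mapred.user.jobconf.limit

    // Returns the spill file path, or null if the ranges stayed in the conf.
    public static Path configureRanges(Map<String, String> conf, byte[] serializedRanges) {
        String encoded = Base64.getEncoder().encodeToString(serializedRanges);
        if (encoded.length() <= CONF_LIMIT_BYTES) {
            conf.put("sketch.ranges", encoded); // small enough to live in the conf
            return null;
        }
        try {
            // Too large: write the raw bytes to a file and record only its path.
            Path file = Files.createTempFile("ranges", ".bin");
            Files.write(file, serializedRanges);
            conf.remove("sketch.ranges"); // ranges no longer needed in the conf
            conf.put("sketch.ranges.file", file.toString());
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        Path spill = configureRanges(conf, new byte[4096]); // too big for the conf
        System.out.println(spill != null && conf.containsKey("sketch.ranges.file"));
    }
}
```

Note this only shrinks job.xml; if the limit is enforced against the whole submit directory, the spilled file still counts toward it, which is the open question above.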

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see:
