accumulo-notifications mailing list archives

From "Christopher Tubbs (JIRA)" <>
Subject [jira] [Updated] (ACCUMULO-508) Multi-range input format
Date Tue, 18 Dec 2012 18:34:15 GMT


Christopher Tubbs updated ACCUMULO-508:

    Fix Version/s:     (was: 1.5.0)
         Assignee: John Vines
         Reporter: John Vines  (was: jv)
> Multi-range input format
> ------------------------
>                 Key: ACCUMULO-508
>                 URL:
>             Project: Accumulo
>          Issue Type: New Feature
>          Components: client
>            Reporter: John Vines
>            Assignee: John Vines
>              Labels: mapreduce, newbie
> Maybe for 1.4.1.
> Our current input format always applies one range (potentially split at tablet boundaries)
> per mapper. This is great for situations where you have a few larger ranges. However, there
> is a potential use case for many small ranges. Aside from the problem of a large job configuration
> (ACCUMULO-507), this will result in a LOT of mappers doing very little work. We should have
> an expanded input format that bundles ranges together into a single mapper, ideally while
> trying to maintain locality. This will optimize jobs with many ranges by reducing the
> amount of per-mapper overhead. I think very little will change in the RecordReader.
> The onus should still be on the end user to detect when a range change has occurred (via a
> Key change), so it will still emit Key/Value pairs, just like the regular input format.
> This could possibly be extended to the whole row input format as well.
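A minimal sketch of the bundling idea above, written in Java since that is Accumulo's language. It does not use Accumulo's actual API: `LocatedRange`, `BundledSplit`, `bundle`, and the `maxRangesPerSplit` cap are hypothetical names, and the sketch assumes the hosting tablet server of each range is already known. Ranges are grouped by location, then each group is cut into splits of bounded size, so one mapper reads several small ranges from a single host.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RangeBundler {

    // Hypothetical stand-in for a row range plus the tablet server hosting it.
    public record LocatedRange(String startRow, String endRow, String location) {}

    // Hypothetical stand-in for one mapper's input split: a host and its ranges.
    public record BundledSplit(String location, List<LocatedRange> ranges) {}

    /**
     * Groups ranges by hosting location (preserving first-seen order), then
     * cuts each group into splits of at most maxRangesPerSplit ranges, so a
     * single mapper handles many small ranges while staying on one host.
     */
    public static List<BundledSplit> bundle(List<LocatedRange> ranges, int maxRangesPerSplit) {
        if (maxRangesPerSplit < 1) {
            throw new IllegalArgumentException("maxRangesPerSplit must be >= 1");
        }
        Map<String, List<LocatedRange>> byLocation = new LinkedHashMap<>();
        for (LocatedRange r : ranges) {
            byLocation.computeIfAbsent(r.location(), k -> new ArrayList<>()).add(r);
        }
        List<BundledSplit> splits = new ArrayList<>();
        for (Map.Entry<String, List<LocatedRange>> e : byLocation.entrySet()) {
            List<LocatedRange> group = e.getValue();
            // Chunk this host's ranges into consecutive, bounded-size splits.
            for (int i = 0; i < group.size(); i += maxRangesPerSplit) {
                int end = Math.min(i + maxRangesPerSplit, group.size());
                splits.add(new BundledSplit(e.getKey(), new ArrayList<>(group.subList(i, end))));
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        List<LocatedRange> ranges = List.of(
            new LocatedRange("a", "b", "tserver1"),
            new LocatedRange("c", "d", "tserver2"),
            new LocatedRange("e", "f", "tserver1"),
            new LocatedRange("g", "h", "tserver1"),
            new LocatedRange("i", "j", "tserver2"));
        for (BundledSplit s : bundle(ranges, 2)) {
            System.out.println(s.location() + " -> " + s.ranges().size() + " ranges");
        }
    }
}
```

Running `main` prints `tserver1 -> 2 ranges`, `tserver1 -> 1 ranges`, and `tserver2 -> 2 ranges`. Capping the number of ranges per split is only one possible policy; an estimate of bytes or rows per range would be another. A RecordReader for such a split would scan its ranges in turn and, as the issue proposes, leave it to the user to notice range changes through Key changes.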

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see:
