accumulo-dev mailing list archives

From "John Vines (Created) (JIRA)"
Subject [jira] [Created] (ACCUMULO-508) Multi-range input format
Date Mon, 02 Apr 2012 14:49:23 GMT
Multi-range input format

                 Key: ACCUMULO-508
             Project: Accumulo
          Issue Type: New Feature
          Components: client
            Reporter: John Vines
             Fix For: 1.5.0

Maybe for 1.4.1.

Our current input format always gives each mapper a single range (or a piece of one, if the
range is split at tablet boundaries). This works well when a job has a few large ranges. However,
there is a potential use case for many small ranges. Aside from the problem of a large job
configuration (ACCUMULO-507), this results in a LOT of mappers, each doing very little work. We
should have an expanded input format that bundles multiple ranges into a single mapper, ideally
while preserving locality. This would optimize jobs with many ranges by reducing per-mapper
overhead. I think very little will change in the RecordReader. The onus should still be on the
end user to detect when a range change has occurred (via a change in the Key), so it will still
emit Key/Value pairs, just like the regular input format.
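To make the bundling idea concrete, here is a minimal sketch of how ranges might be grouped
into per-mapper splits while keeping locality: group ranges by the server hosting them, then cut
each group into bundles of at most N ranges. It is self-contained and does not use the Accumulo
API. `LocatedRange`, `bundle`, and `maxRangesPerSplit` are hypothetical names invented for
illustration, and the location strings are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RangeBundler {

    // Hypothetical stand-in for a range plus the tablet server hosting it.
    // This is NOT an Accumulo class; it exists only for this sketch.
    public record LocatedRange(String startRow, String endRow, String location) {}

    // Groups ranges by hosting location, then chops each group into bundles of
    // at most maxRangesPerSplit ranges. Each bundle would become one mapper's split,
    // so every range in a bundle shares the same preferred location.
    public static List<List<LocatedRange>> bundle(List<LocatedRange> ranges, int maxRangesPerSplit) {
        if (maxRangesPerSplit < 1) {
            throw new IllegalArgumentException("maxRangesPerSplit must be >= 1");
        }
        // LinkedHashMap keeps locations in first-seen order so the output is deterministic.
        Map<String, List<LocatedRange>> byLocation = new LinkedHashMap<>();
        for (LocatedRange r : ranges) {
            byLocation.computeIfAbsent(r.location(), k -> new ArrayList<>()).add(r);
        }
        List<List<LocatedRange>> splits = new ArrayList<>();
        for (List<LocatedRange> group : byLocation.values()) {
            for (int i = 0; i < group.size(); i += maxRangesPerSplit) {
                int end = Math.min(i + maxRangesPerSplit, group.size());
                splits.add(new ArrayList<>(group.subList(i, end)));
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        List<LocatedRange> ranges = List.of(
            new LocatedRange("a", "b", "tserver1:9997"),
            new LocatedRange("c", "d", "tserver2:9997"),
            new LocatedRange("e", "f", "tserver1:9997"),
            new LocatedRange("g", "h", "tserver1:9997"),
            new LocatedRange("i", "j", "tserver2:9997"));
        List<List<LocatedRange>> splits = bundle(ranges, 2);
        // tserver1 -> [a, e], [g]; tserver2 -> [c, i]
        System.out.println(splits.size() + " mappers instead of " + ranges.size());
        // prints "3 mappers instead of 5"
    }
}
```

This greedy grouping ignores how much data each range covers; a real implementation would
probably also want to balance bundles by size and handle ranges that span several tablets.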

This could possibly be extended to the whole row input format as well.
