accumulo-notifications mailing list archives

From "Christopher Tubbs (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-508) Multi-range input format
Date Wed, 23 Apr 2014 16:59:15 GMT


Christopher Tubbs commented on ACCUMULO-508:

Doesn't the AccumuloInputFormat already do this? I'm pretty sure there's an autoAdjust feature
that merges overlapping ranges, splits ranges on tablet boundaries, and then assigns them.
This is done by default. If this feature is turned off, the ranges are given to mappers exactly
as they were given to the job: 1 range per mapper.
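
To make the merge-then-split behavior described above concrete, here is a minimal sketch. It is not Accumulo's code: it models ranges as half-open integer intervals `(start, end)` and tablet boundaries as plain integers, and the function names (`merge_overlapping`, `split_on_boundaries`, `auto_adjust`) are hypothetical, chosen only for illustration.

```python
def merge_overlapping(ranges):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def split_on_boundaries(rng, boundaries):
    """Cut one (start, end) interval at every boundary strictly inside it."""
    start, end = rng
    pieces = []
    for boundary in sorted(boundaries):
        if start < boundary < end:
            pieces.append((start, boundary))
            start = boundary
    pieces.append((start, end))
    return pieces


def auto_adjust(ranges, boundaries):
    """Merge the ranges first, then split each merged range at the boundaries."""
    adjusted = []
    for rng in merge_overlapping(ranges):
        adjusted.extend(split_on_boundaries(rng, boundaries))
    return adjusted


# Hypothetical input: three ranges, two of which overlap, and two tablet boundaries.
adjusted = auto_adjust([(0, 5), (3, 12), (20, 25)], boundaries=[10, 22])
```

In this toy model each resulting piece would go to its own mapper, which matches the "one range per mapper" outcome the comment describes.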

> Multi-range input format
> ------------------------
>                 Key: ACCUMULO-508
>                 URL:
>             Project: Accumulo
>          Issue Type: New Feature
>          Components: client
>            Reporter: John Vines
>              Labels: mapreduce, newbie
> Maybe for 1.4.1.
> Our current input format will always apply one range (potentially split at tablet boundaries)
> per mapper. This is great for situations where you have a few larger ranges. However, there
> is a potential use case for many small ranges. Aside from the problem with a large job configuration
> (ACCUMULO-507), this will result in a LOT of mappers doing very little work. We should have
> an expanded input format which will bundle ranges together to a single mapper, ideally while
> trying to maintain locality. This will optimize jobs with a lot of ranges by reducing the
> amount of mapper overhead involved. I think very little will change with the RecordReader.
> The onus should still go to the end user to detect when a range change has been made (via
> Key change), so it will still emit Key/Value pairs, just like the regular input format.
> This could possibly be extended to the whole row input format as well.
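
The bundling idea in the issue description could look roughly like the sketch below. It is only an illustration under stated assumptions, not a proposed implementation: ranges are opaque values, `locate` is a hypothetical caller-supplied function that returns the location hosting a range, and `max_per_bundle` is a made-up tuning knob. Grouping by location first is one simple way to try to preserve locality.

```python
from collections import defaultdict


def bundle_ranges(ranges, locate, max_per_bundle):
    """Group ranges by location, then chunk each group into bundles.

    Returns a list of (location, [ranges]) pairs; each pair stands for
    the work that would be handed to a single mapper.
    """
    if max_per_bundle < 1:
        raise ValueError("max_per_bundle must be at least 1")

    by_location = defaultdict(list)
    for rng in ranges:
        by_location[locate(rng)].append(rng)

    bundles = []
    for location in sorted(by_location):
        group = by_location[location]
        # Slice the group so no bundle holds more than max_per_bundle ranges.
        for i in range(0, len(group), max_per_bundle):
            bundles.append((location, group[i:i + max_per_bundle]))
    return bundles


# Hypothetical layout: ranges starting below 10 live on "host-a", the rest on "host-b".
def example_locate(rng):
    return "host-a" if rng[0] < 10 else "host-b"


bundles = bundle_ranges(
    [(0, 1), (2, 3), (11, 12), (4, 5), (13, 14)],
    locate=example_locate,
    max_per_bundle=2,
)
```

With five small ranges and a bundle size of two, this yields three bundles instead of five single-range mappers. A record reader over such a bundle would simply iterate its ranges in turn and keep emitting Key/Value pairs, leaving range-change detection to the user as the description suggests.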

This message was sent by Atlassian JIRA
