accumulo-notifications mailing list archives

From "Corey J. Nolet (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-2553) AccumuloFileOutputFormat should be able to support output for multiple tables.
Date Fri, 28 Mar 2014 23:53:14 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951604#comment-13951604 ]

Corey J. Nolet commented on ACCUMULO-2553:
------------------------------------------

I've created a GroupedKeyRangePartitioner that lets the user specify multiple splits files,
each with an associated group. Currently, it expects the mapper to emit a GroupedKey object (a
Writable holding a String/Text group and an o.a.a.core.data.Key). It uses the group to look up
the corresponding splits file in the configuration and determines the partition from it. The
number of bins is based on the sum of all the split points, which guarantees that each file is
written by its own reducer.
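
To illustrate the partitioning idea described above, here is a minimal plain-Java sketch. It is
not the actual GroupedKeyRangePartitioner: the class name GroupedRangePartitionSketch and its
methods are hypothetical, plain Strings stand in for the group and the Key, and an in-memory map
stands in for the per-group splits files read from the configuration. It also assumes each group
needs (number of splits + 1) bins, which is one way to read "based on the sum of all the split
points".

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only; not the actual GroupedKeyRangePartitioner. */
final class GroupedRangePartitionSketch {
    // Sorted split points per group (hypothetical stand-in for the per-group splits files).
    private final Map<String, String[]> splitsByGroup = new LinkedHashMap<>();
    // Index of the first partition assigned to each group.
    private final Map<String, Integer> offsetByGroup = new LinkedHashMap<>();
    private int totalPartitions = 0;

    /** Registers a group. Assumption: a group with n splits needs n + 1 partitions. */
    void addGroup(String group, String... splits) {
        if (splitsByGroup.containsKey(group)) {
            throw new IllegalArgumentException("Group already registered: " + group);
        }
        String[] sorted = splits.clone();
        Arrays.sort(sorted);
        splitsByGroup.put(group, sorted);
        offsetByGroup.put(group, totalPartitions);
        totalPartitions += sorted.length + 1;
    }

    /** Total number of partitions (reducers) across all groups. */
    int numPartitions() {
        return totalPartitions;
    }

    /** Maps (group, row) to a partition that no other group shares. */
    int getPartition(String group, String row) {
        String[] splits = splitsByGroup.get(group);
        if (splits == null) {
            throw new IllegalArgumentException("Unknown group: " + group);
        }
        int idx = Arrays.binarySearch(splits, row);
        // A row equal to a split point falls in the range that ends at that split;
        // otherwise binarySearch returns -(insertionPoint) - 1.
        int local = idx >= 0 ? idx : -(idx + 1);
        return offsetByGroup.get(group) + local;
    }

    public static void main(String[] args) {
        GroupedRangePartitionSketch p = new GroupedRangePartitionSketch();
        p.addGroup("tableA", "g", "p"); // hypothetical group names and split points
        p.addGroup("tableB", "m");
        System.out.println("partitions: " + p.numPartitions());
        System.out.println("tableA/h -> " + p.getPartition("tableA", "h"));
        System.out.println("tableB/n -> " + p.getPartition("tableB", "n"));
    }
}
```

Because every group gets its own contiguous block of partition indexes, no reducer ever receives
keys from two groups, so each output file belongs to exactly one group.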

This paradigm seems in line with the MultipleOutputs class, where the group in the GroupedKey
can also be linked to the ultimate path for the output file. I am in the process of testing
the MultipleOutputs solution. I think it should be added to the examples when complete.

> AccumuloFileOutputFormat should be able to support output for multiple tables.
> ------------------------------------------------------------------------------
>
>                 Key: ACCUMULO-2553
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2553
>             Project: Accumulo
>          Issue Type: New Feature
>            Reporter: Corey J. Nolet
>            Priority: Minor
>
> This may not necessarily be something that would require changes in the AccumuloFileOutputFormat
> itself. Perhaps the ability to use it with Hadoop's MultipleOutputs is really the solution.
> It would be useful if the user could specify multiple directories where RFiles should
> be placed and have a mechanism for populating the RFiles in the necessary directories based
> on a table name or group name.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
