accumulo-notifications mailing list archives

From "Corey J. Nolet (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-2553) AccumuloFileOutputFormat should be able to support output for multiple tables.
Date Tue, 20 May 2014 01:52:40 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002705#comment-14002705 ]

Corey J. Nolet commented on ACCUMULO-2553:
------------------------------------------

[~kturner], I may also investigate letting a single reducer process ranges for multiple
groups. I suppose that could cut down on the number of reducers needed. The group is passed
into the reducer with the key (I now have a GroupedKey class that encapsulates the group name
and the key), so I know which folder to write the file to. I'm also wondering whether it would
be worth making the number of sub-bins independent as well, to help cut down on hotspots.
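
To make the idea concrete, here is a minimal sketch of what such a composite key could look
like. It is not the actual GroupedKey class from the patch: the field types, the outputDir
helper, and the "/bulk" base path are all assumptions made for illustration, and the key is
simplified to a plain String rather than an Accumulo Key. It only shows how carrying the group
name alongside the key lets a reducer pick the output folder.

```java
import java.util.Objects;

/**
 * Hypothetical stand-in for the GroupedKey described above (not the real class).
 * It pairs a group (or table) name with a key so the reducer can tell which
 * folder a record belongs to.
 */
final class GroupedKey implements Comparable<GroupedKey> {
    private final String group; // group or table name
    private final String key;   // simplified: a real Accumulo Key has more fields

    GroupedKey(String group, String key) {
        this.group = Objects.requireNonNull(group, "group");
        this.key = Objects.requireNonNull(key, "key");
    }

    String group() { return group; }

    String key() { return key; }

    /** Orders by group first, then by key, so each group's keys stay contiguous and sorted. */
    @Override
    public int compareTo(GroupedKey other) {
        int byGroup = group.compareTo(other.group);
        return byGroup != 0 ? byGroup : key.compareTo(other.key);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof GroupedKey)) return false;
        GroupedKey other = (GroupedKey) o;
        return group.equals(other.group) && key.equals(other.key);
    }

    @Override
    public int hashCode() {
        return Objects.hash(group, key);
    }

    /** Assumed layout: one sub-directory per group under a shared base directory. */
    String outputDir(String baseDir) {
        return baseDir + "/" + group;
    }
}
```

With this layout, new GroupedKey("tableA", "row1").outputDir("/bulk") resolves to
"/bulk/tableA". Sorting by group before key means a reducer that handles several groups sees
each group's keys as one sorted run, which is the order in which a file for that group would
need to be written.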

> AccumuloFileOutputFormat should be able to support output for multiple tables.
> ------------------------------------------------------------------------------
>
>                 Key: ACCUMULO-2553
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2553
>             Project: Accumulo
>          Issue Type: New Feature
>            Reporter: Corey J. Nolet
>            Assignee: Corey J. Nolet
>            Priority: Minor
>
> This may not necessarily be something that would require changes in the AccumuloFileOutputFormat
> itself. Perhaps the ability to use it with Hadoop's MultipleOutputs is really the solution.
> It would be useful if the user could specify multiple directories where RFiles should
> be placed and have a mechanism for populating the RFiles in the necessary directories based
> on a table name or group name.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
