accumulo-dev mailing list archives

From Aaron Cordova <aa...@cordovas.org>
Subject Re: [jira] [Created] (ACCUMULO-454) RFile Input Format
Date Fri, 09 Mar 2012 17:31:32 GMT
Does Keith's input format apply the necessary Accumulo iterators to provide a sane view of
the data to MapReduce?

And what you're proposing is an input format that works over RFiles where perhaps multiple
versions of the same row/column don't exist in multiple files and where there are no delete
markers, etc?
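
To make the concern concrete, here is a minimal sketch in plain Python (not Accumulo code) of the kind of view the iterators would need to provide when several files hold entries for the same row/column. The `Entry` type, `sort_key`, and `merged_view` are hypothetical names invented for this illustration. It assumes each file is already sorted by row, column, and descending timestamp, with a delete marker sorting ahead of a put at the same timestamp.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple, Optional


class Entry(NamedTuple):
    """Hypothetical stand-in for one key/value pair read from a file."""
    row: str
    column: str
    timestamp: int
    deleted: bool          # True if this entry is a delete marker
    value: Optional[str]


def sort_key(e: Entry):
    # Newest first within a row/column; at equal timestamps the delete
    # marker comes first (assumed ordering for this sketch).
    return (e.row, e.column, -e.timestamp, not e.deleted)


def merged_view(files: Iterable[Iterable[Entry]]) -> Iterator[Entry]:
    """Merge sorted files, keep the newest version of each cell,
    and hide any cell whose newest entry is a delete marker."""
    last_cell = None
    for e in heapq.merge(*files, key=sort_key):
        cell = (e.row, e.column)
        if cell == last_cell:
            continue       # older version, or data shadowed by a delete
        last_cell = cell
        if not e.deleted:
            yield e
```

Reading any single file on its own skips this merge, so a job could see a stale version or a value that a delete marker in another file was meant to hide.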

On Mar 9, 2012, at 12:10 PM, John Vines (Created) (JIRA) wrote:

> RFile Input Format
> ------------------
> 
>                 Key: ACCUMULO-454
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-454
>             Project: Accumulo
>          Issue Type: New Feature
>          Components: client
>            Reporter: John Vines
>            Assignee: Billie Rinaldi
>             Fix For: 1.4.1
> 
> 
> We currently provide InputFormats for reading from Accumulo and output formats for both
> direct input as well as outputting RFiles. But we provide no mechanism for doing a mapreduce
> over existing RFiles, which may be useful for optimizing data flow. We already have input
> formats which use RFiles directly for input (The offline input format Keith just finished),
> but that still relies on the Accumulo structure. We should go ahead and also create an input
> format that just hits RFiles like the other standard file input formats.
> 