accumulo-notifications mailing list archives

From "Christopher Tubbs (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-4165) Create a user level API for RFile
Date Tue, 15 Mar 2016 20:28:33 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196174#comment-15196174 ]

Christopher Tubbs commented on ACCUMULO-4165:
---------------------------------------------

It looks like the API you're envisioning will require some understanding of how locality groups
are stored in RFiles. Have you considered omitting locality group support entirely, or just
using a single default locality group if none are started before the first key is appended?
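
The second option could be sketched roughly as follows. This is only an illustration: LocalityGroupWriter is a hypothetical stand-in for the proposed writer (it is not an existing Accumulo interface), String keys and values replace Key and Value to keep the example self-contained, and RecordingWriter exists only to show the call order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the proposed writer; not Accumulo's actual API.
interface LocalityGroupWriter {
  void startDefaultLocalityGroup();

  void append(String key, String value);
}

// Decorator that starts the default locality group on the first append
// when the caller has not started a group explicitly.
class LazyDefaultGroupWriter implements LocalityGroupWriter {
  private final LocalityGroupWriter delegate;
  private boolean groupStarted = false;

  LazyDefaultGroupWriter(LocalityGroupWriter delegate) {
    this.delegate = delegate;
  }

  @Override
  public void startDefaultLocalityGroup() {
    delegate.startDefaultLocalityGroup();
    groupStarted = true;
  }

  @Override
  public void append(String key, String value) {
    if (!groupStarted) {
      // No group was started, so fall back to the default one.
      startDefaultLocalityGroup();
    }
    delegate.append(key, value);
  }
}

// In-memory writer that records the calls it receives (demonstration only).
class RecordingWriter implements LocalityGroupWriter {
  final List<String> events = new ArrayList<>();

  @Override
  public void startDefaultLocalityGroup() {
    events.add("startDefault");
  }

  @Override
  public void append(String key, String value) {
    events.add("append:" + key);
  }
}
```

With a wrapper like this, a caller who never thinks about locality groups can simply call append, while a caller who starts a group explicitly sees no change in behavior.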

For the factories, I'd assume the minimum information to provide is the filename? If so, should
the scheme default to "file://" if the filename begins with a "/"?
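
As a rough sketch of that defaulting rule, assuming a hypothetical helper named RFilePaths.toUri (not part of any existing API) and using only java.net.URI:

```java
import java.net.URI;

// Hypothetical helper illustrating the suggested default; not an Accumulo class.
class RFilePaths {
  // Treats a name that begins with "/" as a local file and prefixes "file://".
  // Any other name is parsed as-is, so an explicit scheme such as "hdfs://" is kept.
  // URI.create rejects characters that are illegal in a URI (for example spaces),
  // so a real implementation would need to escape the name first.
  static URI toUri(String fileName) {
    if (fileName.startsWith("/")) {
      return URI.create("file://" + fileName);
    }
    return URI.create(fileName);
  }
}
```

For example, RFilePaths.toUri("/tmp/test100M.rf") yields file:///tmp/test100M.rf. One point to weigh: Hadoop normally resolves a scheme-less path against the configured default file system, so a "file://" default would behave differently from what Hadoop users may expect.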

For the parameters which take sizes, it'd be useful to be able to specify a string format,
like "20M" instead of 1024*1024*20 bytes.
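
A minimal sketch of such parsing, assuming a hypothetical SizeParser.parseBytes helper and assuming that the suffixes K, M, and G denote 1024-based multiples (the exact format is an assumption, not something settled in this issue):

```java
import java.util.Locale;

// Hypothetical helper illustrating string sizes such as "20M"; not an Accumulo class.
class SizeParser {
  // Converts "20M", "512k", "2G", or a plain byte count such as "1024" into bytes.
  // Suffixes are treated as binary multiples: K = 2^10, M = 2^20, G = 2^30.
  static long parseBytes(String text) {
    String s = text.trim().toUpperCase(Locale.ROOT);
    if (s.isEmpty()) {
      throw new IllegalArgumentException("empty size");
    }
    long multiplier = 1L;
    char last = s.charAt(s.length() - 1);
    switch (last) {
      case 'K':
        multiplier = 1L << 10;
        break;
      case 'M':
        multiplier = 1L << 20;
        break;
      case 'G':
        multiplier = 1L << 30;
        break;
      default:
        if (!Character.isDigit(last)) {
          throw new IllegalArgumentException("unknown size suffix: " + text);
        }
    }
    // Strip the suffix, if any, and parse the remaining digits.
    String digits = multiplier == 1L ? s : s.substring(0, s.length() - 1).trim();
    long value = Long.parseLong(digits);
    if (value < 0) {
      throw new IllegalArgumentException("negative size: " + text);
    }
    // multiplyExact throws ArithmeticException instead of silently overflowing.
    return Math.multiplyExact(value, multiplier);
  }
}
```

Under these assumptions, SizeParser.parseBytes("20M") returns the same value as 1024*1024*20, so a builder method could accept either a long or a string and delegate to one code path.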

Will this API be something that could be used internally to clean up some of our code which
uses RFiles? (I hope so.)

> Create a user level API for RFile
> ---------------------------------
>
>                 Key: ACCUMULO-4165
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4165
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Keith Turner
>            Assignee: Keith Turner
>             Fix For: 1.8.0
>
>
> Users can bulk import RFiles.  Currently the only way users can create RFiles using Accumulo's
> public API is via AccumuloFileOutputFormat.  There is no way to read RFiles in the public
> API.  Also, the internal APIs for reading and writing RFiles are cumbersome to use.
> I am experimenting with a simple RFile API like the following.  Below is an example of
> writing data.
> {code:java}
>     LocalFileSystem localFs = FileSystem.getLocal(new Configuration());
>     RFileWriter writer = RFileFactory.newWriter()
>                                        .withFileName("/tmp/test100M.rf")
>                                        .withFileSystem(localFs).build();
>     writer.startDefaultLocalityGroup();
>     for (int r = 0; r < 10000000; r++) {
>       for (int cq = 0; cq < 10; cq++) {
>         writer.append(genKey(r, cq), genVal(r, cq));
>       }
>     }
>     writer.close();
> {code}
> Below is an example of reading data.
> {code:java}
>     LocalFileSystem localFs = FileSystem.getLocal(new Configuration());
>     Scanner scanner = RFileFactory.newScanner()
>                                           .withFileName("/tmp/test100M.rf")
>                                           .withFileSystem(localFs)
>                                           .withDataCache(250000000)
>                                           .withIndexCache(1000000).build();
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
