accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-4165) Create a user level API for RFile
Date Sat, 19 Mar 2016 19:04:33 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202920#comment-15202920 ]

Josh Elser commented on ACCUMULO-4165:
--------------------------------------

bq. For the factories, I'd assume the minimum information to provide is the filename? If so,
should it default to "file://" if it begins with a "/"?

If the default FileSystem is the local filesystem, it seems reasonable to always create a
Path using that FileSystem (deferring to FileSystem to "localize" it, or to tell us if the
path already has the wrong scheme).
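To make the defaulting rule concrete, here is a minimal sketch using only java.net.URI. It is
not Accumulo or Hadoop code: the class PathDefaults, the method qualify, and the host
"namenode:8020" are hypothetical names chosen for illustration. In Hadoop itself,
FileSystem.makeQualified(Path) plays a similar role. The idea is that a name with no scheme is
resolved against the default filesystem, while a name that already carries a scheme is left
untouched.

```java
import java.net.URI;

final class PathDefaults {
    private PathDefaults() {}

    /**
     * Qualifies a file name against a default filesystem URI.
     * A name that already has a scheme is returned unchanged.
     * (Hypothetical helper, not part of any Accumulo or Hadoop API.)
     */
    static URI qualify(String fileName, URI defaultFs) {
        URI uri = URI.create(fileName);
        if (uri.getScheme() != null) {
            return uri; // the caller was explicit, so do not second-guess it
        }
        // No scheme given: inherit scheme and authority from the default.
        return defaultFs.resolve(uri);
    }

    public static void main(String[] args) {
        URI localDefault = URI.create("file:///");
        // Scheme-less absolute path picks up the "file" scheme.
        System.out.println(qualify("/tmp/test100M.rf", localDefault));
        // Explicit scheme is preserved ("namenode:8020" is a made-up host).
        System.out.println(qualify("hdfs://namenode:8020/tmp/test100M.rf", localDefault));
    }
}
```

With this shape, the factory would not need a special case for names beginning with "/": the
default filesystem decides the scheme.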

bq. For the parameters which take sizes, it'd be useful to be able to specify a string format,
like "20M" instead of 1024*1024*20 bytes.

IMO, accepting a long representing bytes is fine, but :shrug:. It's pretty easy for someone
to define a constant in their code for certain numbers.
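For comparison, a minimal sketch of both options in plain Java. The class SizeStrings, the
method parseSize, and the constant names are hypothetical and not part of any Accumulo API;
the suffixes K, M, and G are assumed to mean powers of 1024.

```java
import java.util.Locale;

final class SizeStrings {
    private SizeStrings() {}

    // Option 1: a named constant in user code, passed to the API as a long.
    static final long MIB = 1024L * 1024L;
    static final long DATA_CACHE_BYTES = 20 * MIB;

    /**
     * Option 2: parse a size such as "20M" into bytes.
     * Accepts an optional K, M, or G suffix (case-insensitive, powers of 1024).
     * Throws IllegalArgumentException on malformed or negative input and
     * ArithmeticException if the result overflows a long.
     */
    static long parseSize(String text) {
        String s = text.trim().toUpperCase(Locale.ROOT);
        if (s.isEmpty()) {
            throw new IllegalArgumentException("empty size");
        }
        long multiplier;
        switch (s.charAt(s.length() - 1)) {
            case 'K': multiplier = 1024L; break;
            case 'M': multiplier = 1024L * 1024L; break;
            case 'G': multiplier = 1024L * 1024L * 1024L; break;
            default:  multiplier = 1L; break; // no suffix: plain bytes
        }
        String digits = (multiplier == 1L) ? s : s.substring(0, s.length() - 1).trim();
        long value = Long.parseLong(digits); // NumberFormatException is an IllegalArgumentException
        if (value < 0) {
            throw new IllegalArgumentException("negative size: " + text);
        }
        return Math.multiplyExact(value, multiplier);
    }

    public static void main(String[] args) {
        System.out.println(parseSize("20M"));
        System.out.println(DATA_CACHE_BYTES);
    }
}
```

The constant costs the user one line and keeps the API surface to a single long parameter; the
parser is friendlier to read in configuration but adds a format that has to be documented and
validated.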

> Create a user level API for RFile
> ---------------------------------
>
>                 Key: ACCUMULO-4165
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4165
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Keith Turner
>            Assignee: Keith Turner
>             Fix For: 1.8.0
>
>
> Users can bulk import RFiles. Currently the only way users can create RFiles using Accumulo's
> public API is via AccumuloFileOutputFormat. There is no way to read RFiles in the public
> API. Also, the internal APIs for reading and writing RFiles are cumbersome to use.
> I am experimenting with a simple RFile API like the following. Below is an example of
> writing data.
> {code:java}
>     LocalFileSystem localFs = FileSystem.getLocal(new Configuration());
>     RFileWriter writer = RFileFactory.newWriter()
>                                        .withFileName("/tmp/test100M.rf")
>                                        .withFileSystem(localFs).build();
>     writer.startDefaultLocalityGroup();
>     for (int r = 0; r < 10000000; r++) {
>       for (int cq = 0; cq < 10; cq++) {
>         writer.append(genKey(r, cq), genVal(r, cq));
>       }
>     }
>     writer.close();
> {code}
> Below is an example of reading data.
> {code:java}
>     LocalFileSystem localFs = FileSystem.getLocal(new Configuration());
>     Scanner scanner = RFileFactory.newScanner()
>                                           .withFileName("/tmp/test100M.rf")
>                                           .withFileSystem(localFs)
>                                           .withDataCache(250000000)
>                                           .withIndexCache(1000000).build();
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
