accumulo-user mailing list archives

From William Slacum <wilhelm.von.cl...@accumulo.net>
Subject Re: Set AccumuloFileOutputFormat to save data to HDFS files instead of writing to Accumulo directly
Date Wed, 18 Jun 2014 14:32:18 GMT
It extends FileOutputFormat, which provides that method (I haven't fully
investigated Java 7 or 8; has their handling of parent-class static methods
changed?):

http://hadoop.apache.org/docs/r2.3.0/api/org/apache/hadoop/mapred/FileOutputFormat.html
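On the parenthetical question: static methods are still inherited in Java 7 and 8, so calling setOutputPath through the AccumuloFileOutputFormat subclass name resolves to the parent FileOutputFormat declaration. A minimal, self-contained sketch (the class names below are stand-ins for illustration, not the real Hadoop classes):

```java
// Sketch: a static method declared on a parent class is callable
// through a subclass name, which is why
// AccumuloFileOutputFormat.setOutputPath(...) works.
class ParentFormat {
    // stands in for FileOutputFormat.setOutputPath(job, path)
    static String setOutputPath(String path) {
        return "output=" + path;
    }
}

class ChildFormat extends ParentFormat {
    // declares nothing; the parent's static method is still reachable
}

public class StaticInheritanceDemo {
    public static void main(String[] args) {
        // Resolves to ParentFormat.setOutputPath at compile time.
        System.out.println(ChildFormat.setOutputPath("/tmp/files"));
    }
}
```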

Data written to RFiles has to be sorted by Key.

For partitioning, as you grow in scale it's a good idea to use the table's
splits (see TableOperations#listSplits) to determine how to partition the
output among files.

Note that you don't write a <Text, Mutation> pair, but a <Key, Value> pair
to the files.
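Putting it together, the job setup might look like the following. This is a hedged sketch along the lines of the BulkIngestExample, assuming the 1.6.0 mapreduce API (org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat) and an existing Job and workDir; adjust to your classpath.

```java
// Sketch: configuring a bulk-ingest job to emit sorted <Key, Value>
// pairs as RFiles (fragment, assumes job/conf/workDir already exist)
job.setOutputFormatClass(AccumuloFileOutputFormat.class);
job.setOutputKeyClass(Key.class);      // not Text
job.setOutputValueClass(Value.class);  // not Mutation
// setOutputPath is inherited from FileOutputFormat
AccumuloFileOutputFormat.setOutputPath(job, new Path(workDir + "/files"));
```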


On Wed, Jun 18, 2014 at 4:21 AM, Jianshi Huang <jianshi.huang@gmail.com>
wrote:

> Hi all,
>
> I saw this line in
> accumulo-1.6.0/examples/simple/src/main/java/org/apache/accumulo/examples/simple/mapreduce/bulk/BulkIngestExample.java
>
>   AccumuloFileOutputFormat.setOutputPath(job, new Path(opts.workDir +
> "/files"));
>
> However, it seems setOutputPath is not in the 1.6.0 Javadoc:
>
>
> http://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/client/mapreduce/AccumuloOutputFormat.html
>
>
> So how can I write the mutations to HDFS files? I think it might be faster
> to import them using the importdirectory command.
>
> BTW, I think importing is fastest if the mutation files are already
> (partially) sorted and partitioned; does that make sense?
>
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/
>
