accumulo-user mailing list archives

From Corey Nolet <cno...@texeltek.com>
Subject Re: Difference between InsertWithBatchWriter and InsertWithOutputFormat
Date Wed, 17 Oct 2012 03:55:56 GMT
Huanchen,

The AccumuloOutputFormat just passes along the connection information (i.e. username, password,
instance, zookeepers) so that an Accumulo connector can be created in each output worker (that
is, each mapper or reducer). You could do this on your own by passing the connection information
around in the job Configuration, creating the BatchWriter in the mappers (for a map-only job) or
in the reducers, and then using your HDFS output format to emit the data elsewhere.
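
The pattern described above can be sketched in plain Java. This is only an illustration under stated assumptions: `MutationSink` and `HdfsSink` are hypothetical stand-ins for an Accumulo BatchWriter and for `context.write`, a `Map` stands in for the Hadoop Configuration, and the configuration key names are made up. It does not use the real Hadoop or Accumulo classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an Accumulo BatchWriter (not the Accumulo API).
interface MutationSink {
    void addMutation(String row, String value);
    void close();
}

// Hypothetical stand-in for context.write(...) backed by an HDFS output format.
interface HdfsSink {
    void write(String key, String value);
}

// In-memory sink that only records what it receives, so the sketch is runnable.
class InMemoryMutationSink implements MutationSink {
    final String user;
    final String instance;
    final List<String> rows = new ArrayList<>();
    boolean closed = false;

    InMemoryMutationSink(String user, String instance) {
        this.user = user;
        this.instance = instance;
    }

    @Override
    public void addMutation(String row, String value) {
        rows.add(row + "=" + value);
    }

    @Override
    public void close() {
        closed = true;
    }
}

// Models one mapper (or reducer) that writes every record to both endpoints.
class DualWriteWorker {
    // Made-up configuration keys, used only in this sketch.
    static final String USER = "example.accumulo.user";
    static final String INSTANCE = "example.accumulo.instance";

    MutationSink accumulo;
    private final HdfsSink hdfs;

    DualWriteWorker(HdfsSink hdfs) {
        this.hdfs = hdfs;
    }

    // Counterpart of Mapper.setup(): read the connection info once per worker
    // and open the writer there, rather than once per record.
    void setup(Map<String, String> conf) {
        accumulo = new InMemoryMutationSink(conf.get(USER), conf.get(INSTANCE));
    }

    // Counterpart of map(): send the record to both endpoints.
    void map(String key, String value) {
        accumulo.addMutation(key, value);
        hdfs.write(key, value);
    }

    // Counterpart of Mapper.cleanup(): close the writer so buffered data is flushed.
    void cleanup() {
        accumulo.close();
    }
}
```

In a real job the writer would be opened in the task's setup step and closed in its cleanup step, since a BatchWriter buffers mutations until it is flushed or closed.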

I have not looked at those examples, but I assume they do the same thing. Though I haven't
tried this myself, I can't see why it wouldn't work. With two output endpoints, you will most
likely want a strategy for the case where the Accumulo write succeeds but the HDFS write
fails, if data consistency is something you need to guarantee.
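
One possible strategy for the partial-failure case above, sketched with stand-ins: write to the first endpoint, attempt the second, and keep a list of keys whose second write failed so they can be retried or reconciled later. `RecordSink` and `GuardedDualWriter` are hypothetical names invented for this sketch, and tracking failed keys is just one option, not a recommendation from the thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for either endpoint (Accumulo writer or HDFS writer).
interface RecordSink {
    void write(String key, String value);
}

class GuardedDualWriter {
    private final RecordSink accumulo;
    private final RecordSink hdfs;

    // Keys that reached the first endpoint but not the second.
    final List<String> needsReconciliation = new ArrayList<>();

    GuardedDualWriter(RecordSink accumulo, RecordSink hdfs) {
        this.accumulo = accumulo;
        this.hdfs = hdfs;
    }

    // Returns true only if both writes succeeded.
    boolean write(String key, String value) {
        // If this throws, the exception propagates and the second write
        // is never attempted.
        accumulo.write(key, value);
        try {
            hdfs.write(key, value);
            return true;
        } catch (RuntimeException e) {
            // First write succeeded, second failed: remember the key so the
            // record can be retried or the first write compensated later.
            needsReconciliation.add(key);
            return false;
        }
    }
}
```

Hadoop can also re-run failed tasks, so whatever strategy is chosen should tolerate the same record being written more than once.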


Corey

On Oct 16, 2012, at 10:48 PM, Huanchen Zhang wrote:

> Hello, Corey
> 
> Thank you for your answer.
> 
> Can I use InsertWithBatchWriter for this task? I mean, use context.write to write to
> HDFS, and use batchwriter.addMutation to write to Accumulo.
> 
> Huanchen
> 
> On Oct 16, 2012, at 10:25 PM, Corey Nolet wrote:
> 
>> You can extend the output format to write to both, and have the underlying record writer
>> send each item to the correct endpoint depending on what the job submits.
>> 
>> On Oct 16, 2012, at 10:16 PM, Huanchen Zhang wrote:
>> 
>>> Hello,
>>> 
>>> Here I have a MapReduce job which needs to write to Accumulo. I checked the examples.
>>> It seems there are two different ways to write to Accumulo: one is InsertWithBatchWriter,
>>> the other is InsertWithOutputFormat.
>>> 
>>> So, what is the difference between them? Which one should I choose?
>>> 
>>> I actually need to write to Accumulo and HDFS in the same job. It seems InsertWithOutputFormat
>>> cannot do this, because it needs to set the output format to "AccumuloOutputFormat.class",
>>> and so can only write to Accumulo in one job, right?
>>> 
>>> Thank you.
>>> 
>>> Best,
>>> Huanchen
>> 
> 

