accumulo-user mailing list archives

From Huanchen Zhang <iamzhan...@gmail.com>
Subject Re: Difference between InsertWithBatchWriter and InsertWithOutputFormat
Date Thu, 18 Oct 2012 20:35:09 GMT
Hello, Corey

Neither of the two examples uses BatchWriter and context.write in the same job.

Data consistency is a good point. I need to rethink my task.

Thank you! It really helps.

Best,
Huanchen

On Oct 16, 2012, at 11:55 PM, Corey Nolet wrote:

> Huanchen,
> 
> The AccumuloOutputFormat just passes along the connection information (i.e. username,
> password, instance, zookeepers) so that an Accumulo connector can be created in each output
> worker (that is, each mapper or reducer). You could do this on your own by passing the connection
> information around in the Configuration() and creating the BatchWriter in the mappers (map-only
> job) or the reducer and then use your HDFS output format to emit the data elsewhere.
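
A minimal sketch of that do-it-yourself approach, assuming the 1.4-era client API that was current at the time; the configuration property names, column family/qualifier, and mapper key/value types below are illustrative placeholders, not anything taken from the bundled examples:

import java.io.IOException;

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Instance;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class DualWriteMapper extends Mapper<LongWritable, Text, Text, Text> {

  private BatchWriter writer;

  @Override
  protected void setup(Context context) throws IOException {
    Configuration conf = context.getConfiguration();
    // Connection details the driver stashed in the Configuration; these property names are made up.
    Instance instance = new ZooKeeperInstance(conf.get("accumulo.instance"),
        conf.get("accumulo.zookeepers"));
    try {
      Connector connector = instance.getConnector(conf.get("accumulo.user"),
          conf.get("accumulo.password").getBytes());
      // 1.4-style signature: table, max memory (bytes), max latency (ms), write threads.
      writer = connector.createBatchWriter(conf.get("accumulo.table"), 1000000L, 60000L, 2);
    } catch (Exception e) {
      throw new IOException(e);
    }
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // One copy goes to Accumulo through the BatchWriter...
    Mutation m = new Mutation(value);
    m.put(new Text("cf"), new Text("cq"), new Value(value.toString().getBytes()));
    try {
      writer.addMutation(m);
    } catch (Exception e) {
      throw new IOException(e);
    }
    // ...and the same record is emitted through context.write to the job's output format (e.g. HDFS).
    context.write(value, value);
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    try {
      writer.close(); // flushes any buffered mutations
    } catch (Exception e) {
      throw new IOException(e);
    }
  }
}

The point of the pattern is that the BatchWriter is created once per task in setup() and closed in cleanup(), rather than once per record, while context.write still feeds whatever output format the job was configured with.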
> 
> I have not looked at these examples but I'm assuming they are doing the same thing? Though
> I haven't tried this myself, I can't see why it wouldn't work. When having 2 output endpoints,
> you will most likely want to think about a strategy to deal with a successful Accumulo write
> but a failure in writing to HDFS, if data consistency is something you need to guarantee.
> 
> 
> Corey
> 
> On Oct 16, 2012, at 10:48 PM, Huanchen Zhang wrote:
> 
>> Hello, Corey
>> 
>> Thank you for your answer.
>> 
>> Can I use InsertWithBatchWriter for this task? I mean, use context.write to write
>> to HDFS, and batchWriter.addMutation to write to Accumulo.
>> 
>> Huanchen
>> 
>> On Oct 16, 2012, at 10:25 PM, Corey Nolet wrote:
>> 
>>> You can extend the output format to write to both and have the resulting record
>>> writer underneath write to the correct endpoint depending on the items submitted from the
>>> job.
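
One way that suggestion could be sketched out. The class name, the choice of TextOutputFormat for the HDFS side, and the type-based routing are all assumptions here, and the committer handling is deliberately naive:

import java.io.IOException;

import org.apache.accumulo.core.client.mapreduce.AccumuloOutputFormat;
import org.apache.accumulo.core.data.Mutation;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.OutputCommitter;
import org.apache.hadoop.mapreduce.OutputFormat;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class DualOutputFormat extends OutputFormat<Text, Writable> {

  private final AccumuloOutputFormat accumulo = new AccumuloOutputFormat();
  private final TextOutputFormat<Text, Text> hdfs = new TextOutputFormat<Text, Text>();

  @Override
  public RecordWriter<Text, Writable> getRecordWriter(TaskAttemptContext context)
      throws IOException, InterruptedException {
    final RecordWriter<Text, Mutation> accumuloWriter = accumulo.getRecordWriter(context);
    final RecordWriter<Text, Text> hdfsWriter = hdfs.getRecordWriter(context);
    return new RecordWriter<Text, Writable>() {
      @Override
      public void write(Text key, Writable value) throws IOException, InterruptedException {
        if (value instanceof Mutation) {
          accumuloWriter.write(key, (Mutation) value); // key is the table name for Accumulo
        } else {
          hdfsWriter.write(key, (Text) value);         // everything else goes to HDFS
        }
      }

      @Override
      public void close(TaskAttemptContext ctx) throws IOException, InterruptedException {
        accumuloWriter.close(ctx);
        hdfsWriter.close(ctx);
      }
    };
  }

  @Override
  public void checkOutputSpecs(JobContext context) throws IOException, InterruptedException {
    accumulo.checkOutputSpecs(context);
    hdfs.checkOutputSpecs(context);
  }

  @Override
  public OutputCommitter getOutputCommitter(TaskAttemptContext context)
      throws IOException, InterruptedException {
    // Only the file side gets commit/abort semantics; Accumulo writes are live as they happen.
    return hdfs.getOutputCommitter(context);
  }
}

Routing on the value type is just one option; you could equally tag records with a marker in the key and split on that instead.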
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Oct 16, 2012, at 10:16 PM, Huanchen Zhang wrote:
>>> 
>>>> Hello,
>>>> 
>>>> Here I have a MapReduce job which needs to write to Accumulo. I checked the
>>>> examples. It seems there are two different ways to write to Accumulo: one is InsertWithBatchWriter,
>>>> the other is InsertWithOutputFormat.
>>>> 
>>>> So, what is the difference between them? Which one should I choose?
>>>> 
>>>> I actually need to write to Accumulo and HDFS in the same job. It seems InsertWithOutputFormat
>>>> cannot do this, because it needs to set the output format as "AccumuloOutputFormat.class",
>>>> and can only write to Accumulo in one job, right?
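
For reference, the OutputFormat route usually looks roughly like the sketch below: a map-only job whose records are (table name, Mutation) pairs. The static configuration calls are from memory of the 1.4-era AccumuloOutputFormat and their names/signatures have changed between releases, and the instance, credential, and table values are placeholders:

import java.io.IOException;

import org.apache.accumulo.core.client.mapreduce.AccumuloOutputFormat;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class InsertWithOutputFormatSketch {

  // Map-only job: each input line becomes one Mutation, emitted as (table name, mutation).
  public static class InsertMapper extends Mapper<LongWritable, Text, Text, Mutation> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      Mutation m = new Mutation(value);
      m.put(new Text("cf"), new Text("cq"), new Value(value.toString().getBytes()));
      context.write(new Text("exampleTable"), m);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "insert-with-output-format");
    job.setJarByClass(InsertWithOutputFormatSketch.class);
    job.setInputFormatClass(TextInputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    job.setMapperClass(InsertMapper.class);
    job.setNumReduceTasks(0);
    // The whole job's output goes through AccumuloOutputFormat, which is the limitation
    // being discussed in this thread.
    job.setOutputFormatClass(AccumuloOutputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Mutation.class);
    AccumuloOutputFormat.setZooKeeperInstance(job.getConfiguration(), "instanceName",
        "zkhost1:2181,zkhost2:2181");
    AccumuloOutputFormat.setOutputInfo(job.getConfiguration(), "user", "password".getBytes(),
        true /* create tables */, "exampleTable" /* default table */);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}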
>>>> 
>>>> Thank you.
>>>> 
>>>> Best,
>>>> Huanchen
>>> 
>> 
> 

