accumulo-dev mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: using combiner vs. building stats cache
Date Fri, 28 Aug 2015 16:37:17 GMT
> Thanks for the tips Josh. We are using BatchWriter, so it should give
> better throughput. But I just looked at the code, and it looks like we call
> batchWriter.flush() after each addMutation call. This doesn't seem like a good
> use of the batch writer...
> I am curious how people normally batch inserts/updates? The process may
> crash, and we'll lose those changes unfortunately :-(

Yes, any mutations sent before flush() (or close()) successfully returns 
might not be durable. You would need some logic in your application to 
work with this constraint. It's hard to give recommendations on how to 
handle it without knowing your workflow. Using a combiner makes this 
slightly more difficult, because sending the same mutation multiple 
times will make your stats incorrect.
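
For reference, here is roughly what batched writes usually look like. The 
table name ("stats"), column names, and config numbers below are made up for 
illustration; the point is that addMutation() only buffers the mutation on 
the client, and a single flush()/close() (plus the BatchWriter's own 
background flushing) pushes everything to the tablet servers in bulk:

  import java.nio.charset.StandardCharsets;
  import java.util.Map;
  import java.util.concurrent.TimeUnit;
  import org.apache.accumulo.core.client.BatchWriter;
  import org.apache.accumulo.core.client.BatchWriterConfig;
  import org.apache.accumulo.core.client.Connector;
  import org.apache.accumulo.core.client.MutationsRejectedException;
  import org.apache.accumulo.core.client.TableNotFoundException;
  import org.apache.accumulo.core.data.Mutation;
  import org.apache.accumulo.core.data.Value;
  import org.apache.hadoop.io.Text;

  // Write one count per row to a hypothetical "stats" table in a single batch.
  void writeStats(Connector connector, Map<String, Long> counts)
      throws TableNotFoundException, MutationsRejectedException {
    BatchWriterConfig config = new BatchWriterConfig();
    config.setMaxMemory(10 * 1024 * 1024);       // buffer up to ~10MB of mutations
    config.setMaxLatency(30, TimeUnit.SECONDS);  // background flush at least every 30s
    config.setMaxWriteThreads(4);

    BatchWriter writer = connector.createBatchWriter("stats", config);
    try {
      for (Map.Entry<String, Long> e : counts.entrySet()) {
        Mutation m = new Mutation(new Text(e.getKey()));
        m.put(new Text("stat"), new Text("count"),
            new Value(e.getValue().toString().getBytes(StandardCharsets.UTF_8)));
        writer.addMutation(m);                   // buffered client-side, no RPC per call
      }
    } finally {
      // one close() at the end flushes everything still buffered;
      // it throws MutationsRejectedException if any mutation failed
      writer.close();
    }
  }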

Our failure handling could be nicer in this case (ideally, providing 
you the mutations that weren't applied), but that's something that would 
have to be implemented on our side (and, to my knowledge, no one is 
working on it at present). I'm not sure if there's something easy we 
could do that would make handling these failures easier for you.
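
For what it's worth, one thing applications can do today is keep their own 
list of mutations that haven't been confirmed by a successful flush(), and 
resend that list with a fresh BatchWriter if a flush fails (once a 
MutationsRejectedException is thrown, the old writer can't be reused). The 
class below is only a sketch of that idea; the table name and checkpoint 
size are made up, and the combiner caveat above applies: a blind retry can 
double-count updates that actually made it in before the failure.

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.accumulo.core.client.BatchWriter;
  import org.apache.accumulo.core.client.BatchWriterConfig;
  import org.apache.accumulo.core.client.Connector;
  import org.apache.accumulo.core.client.MutationsRejectedException;
  import org.apache.accumulo.core.client.TableNotFoundException;
  import org.apache.accumulo.core.data.Mutation;

  // Keeps a client-side copy of mutations not yet confirmed by a successful
  // flush(), so they can be resent if a flush fails. Only sensible if the
  // updates are idempotent or the affected stats can be rebuilt from source
  // data, since a retry may re-apply updates a combiner already counted.
  class CheckpointingWriter {
    private final Connector connector;
    private final List<Mutation> pending = new ArrayList<>();
    private BatchWriter writer;

    CheckpointingWriter(Connector connector) throws TableNotFoundException {
      this.connector = connector;
      this.writer = connector.createBatchWriter("stats", new BatchWriterConfig());
    }

    void add(Mutation m) throws TableNotFoundException, MutationsRejectedException {
      pending.add(m);                // remember it until a flush succeeds
      writer.addMutation(m);
      if (pending.size() >= 1000)    // checkpoint interval is arbitrary
        checkpoint();
    }

    void checkpoint() throws TableNotFoundException, MutationsRejectedException {
      try {
        writer.flush();              // everything added so far is now durable
      } catch (MutationsRejectedException e) {
        // a failed BatchWriter can't be reused; recreate it and resend the batch
        try { writer.close(); } catch (MutationsRejectedException ignored) {}
        writer = connector.createBatchWriter("stats", new BatchWriterConfig());
        for (Mutation m : pending)
          writer.addMutation(m);
        writer.flush();              // if this fails too, give up and propagate
      }
      pending.clear();
    }
  }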
