accumulo-user mailing list archives

From Keith Turner <>
Subject Re: pre-sorting row keys vs not pre-sorting row keys
Date Thu, 29 Oct 2015 21:45:02 GMT
I think the batch writer does sort mutations to bin them by tablet.
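To make the binning idea concrete, here is a minimal Java sketch. It assumes tablets are described only by a sorted list of inclusive end rows, and that mutations are reduced to their row keys. `TabletBinning`, `binByTablet` and the `<last>` label are hypothetical names chosen for illustration. This is not the Accumulo client's actual code path, which also has to locate tablet servers, so treat it as an approximation of the idea rather than the real implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TabletBinning {
    // Hypothetical label for the last tablet, which has no end row.
    static final String LAST_TABLET = "<last>";

    /**
     * Groups row keys by the tablet that would host them.
     * splits: tablet end rows in sorted order, each inclusive,
     * i.e. a tablet covers the range (previous end row, end row].
     */
    static Map<String, List<String>> binByTablet(List<String> rows, List<String> splits) {
        List<String> sorted = new ArrayList<>(rows);
        Collections.sort(sorted); // after sorting, each tablet's rows are contiguous
        Map<String, List<String>> bins = new LinkedHashMap<>();
        int s = 0;
        for (String row : sorted) {
            // Advance to the first end row that is >= this row.
            while (s < splits.size() && splits.get(s).compareTo(row) < 0) {
                s++;
            }
            String tablet = s < splits.size() ? splits.get(s) : LAST_TABLET;
            bins.computeIfAbsent(tablet, k -> new ArrayList<>()).add(row);
        }
        return bins;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("row7", "row1", "row4", "row9", "row2");
        List<String> splits = List.of("row3", "row6");
        System.out.println(binByTablet(rows, splits));
    }
}
```

With split points `row3` and `row6`, `main` prints `{row3=[row1, row2], row6=[row4], <last>=[row7, row9]}`.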

Did you account for JIT compilation in your testing?  If one part of the test
ran after the JIT had warmed up, it would be much faster because of that alone.
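A rough way to take JIT warm-up out of the comparison is to run each variant several times untimed before measuring, then report a median over repeated runs. The sketch below assumes a hypothetical `simulatedInsert` stand-in (inserting into a `java.util.TreeMap`) in place of the real BatchWriter ingest; the real insert code would be swapped in. It is only a sketch; a dedicated harness such as JMH handles warm-up and dead-code elimination more rigorously.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.TreeMap;

public class WarmupBenchmark {
    // Written by each run so the JIT cannot discard the work as unused.
    static volatile int sink;

    // Hypothetical stand-in for the real ingest path; NOT Accumulo code.
    static int simulatedInsert(List<String> rows) {
        TreeMap<String, Integer> map = new TreeMap<>();
        for (String row : rows) {
            map.put(row, row.length());
        }
        return map.size();
    }

    /** Runs workload `warmups` times untimed, then returns the median of `trials` timed runs in ns. */
    static long medianNanos(Runnable workload, int warmups, int trials) {
        for (int i = 0; i < warmups; i++) {
            workload.run(); // untimed: lets the JIT compile the hot code first
        }
        long[] times = new long[trials];
        for (int i = 0; i < trials; i++) {
            long start = System.nanoTime();
            workload.run();
            times[i] = System.nanoTime() - start;
        }
        Arrays.sort(times);
        return times[trials / 2]; // median when trials is odd
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            rows.add(String.format("row%08d", rnd.nextInt(100_000_000)));
        }
        List<String> sorted = new ArrayList<>(rows);
        Collections.sort(sorted);

        // Warm up both variants before timing either one.
        for (int i = 0; i < 20; i++) {
            sink = simulatedInsert(rows);
            sink = simulatedInsert(sorted);
        }
        long unsortedNs = medianNanos(() -> sink = simulatedInsert(rows), 5, 31);
        long sortedNs = medianNanos(() -> sink = simulatedInsert(sorted), 5, 31);
        System.out.printf("unsorted: %d us, sorted: %d us%n", unsortedNs / 1_000, sortedNs / 1_000);
    }
}
```

Warming up both variants first means the one timed second no longer gets a head start from compilation triggered by the one timed first.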

Also, are you measuring the sort time and adding it to the test where you
pass sorted data?
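A fair comparison charges the sorted variant for its own sort. Here is a minimal sketch, assuming the insert step can be passed in as a `Consumer<List<String>>`; the `insert` lambda in `main` is a hypothetical placeholder that just prints the batch, and `SortCostTiming` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

public class SortCostTiming {
    /** Times only the insert of the rows in the order given. */
    static long timeUnsorted(List<String> rows, Consumer<List<String>> insert) {
        long start = System.nanoTime();
        insert.accept(rows);
        return System.nanoTime() - start;
    }

    /** Times the sort AND the insert, so the sorted variant pays for its own sort. */
    static long timeSortedIncludingSort(List<String> rows, Consumer<List<String>> insert) {
        long start = System.nanoTime();
        List<String> sorted = new ArrayList<>(rows); // copy so the caller's list is untouched
        Collections.sort(sorted);
        insert.accept(sorted);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("row3", "row1", "row2");
        // Hypothetical insert step; the real ingest code would go here.
        Consumer<List<String>> insert = batch -> System.out.println("inserting " + batch);
        System.out.println("unsorted ns: " + timeUnsorted(rows, insert));
        System.out.println("sort + insert ns: " + timeSortedIncludingSort(rows, insert));
    }
}
```

A single timed run like this is still subject to the JIT effect mentioned above, so in practice both measurements would be repeated after warm-up.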

On Thu, Oct 29, 2015 at 3:30 PM, Ara Ebrahimi <> wrote:

> Hi,
> We just did a simple test:
> - insert 10k batches of columns
> - sort the same 10k batch based on row keys and insert
> So basically, the batch writer in the first test receives items in unsorted
> order, and in the second one in sorted order. We noticed 50% better
> performance with the sorted version! Why is that? Is this something
> we need to consider doing for live ingest scenarios?
> Thanks,
> Ara.
