accumulo-dev mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: 1.6 to 1.7 performance regression
Date Wed, 07 Jun 2017 14:53:46 GMT
Great, 30% is definitely in the ballpark of what we'd expect.

No worries on finding time, of course. Thanks for the reply.

On 6/7/17 1:51 AM, Sean Busbey wrote:
> I read through all the internal notes I could find from back in that
> testing, and I don't see any mention of changing the durability
> settings on meta nor root.
> 
> So that's a plausible source for the perf hit. I don't know when I'll
> have time to run through some tests to verify.
> 
> On Tue, Jun 6, 2017 at 2:43 PM, Josh Elser <josh.elser@gmail.com> wrote:
>> (spinning off from the other thread)
>>
>> The backstory on Sean's testing can be found in [1]. Essentially, in his
>> testing, he observed some cases where there was an unexplained ~30%
>> performance impact.
>>
>> <quote>
>> Batch write performance for Accumulo 1.7.2‐cdh5.5.0 shows a regression of up
>> to approximately 30 percent, depending on table shape, when compared to
>> Accumulo 1.6.0‐cdh5.1.4. The performance decrease is more severe for
>> exceptionally large cells (100k and larger) or exceptionally wide rows (10k
>> columns). Carefully consider the performance impact for your environment
>> when deciding to upgrade to Accumulo 1.7.2‐cdh5.5.0.
>> </quote>
>>
>> Since it came up again, I was hoping we could put this concern to rest,
>> chalking it up to the WAL flush/sync calls that changed between 1.6 and 1.7
>> as documented by our Keith[2]. Hopefully, Sean's notes are sufficient for us
>> to reconstruct his environment :)
>>
>> - Josh
>>
>> [1]
>> https://www.cloudera.com/documentation/other/accumulo/latest/PDF/Apache-Accumulo-Installation-Guide-1-7-2.pdf
>> [2] https://accumulo.apache.org/blog/2016/11/02/durability-performance.html
>>
>>
>> -------- Forwarded Message --------
>> Subject: Re: [DISCUSS] Question about 1.7 bugfix releases
>> Date: Tue, 6 Jun 2017 14:20:27 -0400
>> From: Josh Elser <josh.elser@gmail.com>
>> To: dev@accumulo.apache.org
>>
>> On 6/6/17 2:13 PM, Sean Busbey wrote:
>>>
>>> On Tue, Jun 6, 2017 at 12:07 PM, Josh Elser <josh.elser@gmail.com> wrote:
>>>>
>>>> On 6/6/17 12:39 PM, Sean Busbey wrote:
>>>>>
>>>>>
>>>>> For example, has anyone done perf comparisons between 1.7 and 1.8.z?
>>>>>
>>>>> When it came time for me to start telling folks that it was "safe" to
>>>>> upgrade to 1.7.z I ran into something like a 40-60% perf degradation
>>>>> on writes compared to 1.6 across the board. A little bit of this was
>>>>> already fixed in 1.8 at the time, but a substantial amount required a
>>>>> non-trivial refactoring because just no one had looked[1]. Even after
>>>>> all of that, I still had to caveat things because I still saw a
>>>>> ~15-30% perf drop on random writes in the presence of lots of columns.
>>>>
>>>>
>>>>
>>>> At a risk of de-railing otherwise good discussion on releases: do you recall
>>>> if you had accounted for the following, Sean? (notably, the last code snippet)
>>>>
>>>> https://accumulo.apache.org/blog/2016/11/02/durability-performance.html
>>>
>>>
>>> I know that "set durability to flush and not sync" was one of the
>>> parameters for the comparison, but I don't remember what was done
>>> specifically during the testing back in September, tbh.
>>>
>>> I can probably dig it out if you'd like; I think we were pretty good
>>> at keeping notes. Probably something for a different thread?
>>>
>>
>> Agreed. Just wanted to ask before I forgot again. Saw some relevance in the
>> worry of perf regressions 1.7->1.8 based on the existence of those you saw
>> 1.6->1.7, but def don't want to derail further here.
>>
>> If you have the time and the notes, would be happy to review.
> 
> 
> 
