couchdb-dev mailing list archives

From Jan Lehnardt <...@apache.org>
Subject Re: [VOTE] Apache CouchDB 1.2.0 release, second round
Date Sun, 26 Feb 2012 21:07:48 GMT
Bob,

Thanks for your reply.

I wasn't implying we should try to explain anything away. All of these are valid concerns;
I just wanted to get a better understanding of where the bit flips from +0 to -1 and, subsequently,
how to address that boundary. Ideally we can just fix all of the things you mention, but I
think it is important to understand them in detail, which is why I was going into them. Ultimately,
I want to understand what we need to do to ship 1.2.0.

On Feb 26, 2012, at 21:22 , Bob Dionne wrote:

> Jan,
> 
> I'm -1 based on all of my evaluation. I've spent a few hours on this release now yesterday
> and today. It doesn't really pass what I would call the "smoke test". Almost everything I've
> run into has an explanation:
> 
> 1. crashes out of the box - that's R15B, you need to recompile SSL and Erlang (we'll
> note on release notes)

Have we spent any time on figuring out what the trouble here is?


> 2. etaps hang running make check. Known issue. Our etap code is out of date, recent versions
> of etap don't even run their own unit tests

I have seen the etap hang as well. I wasn't diligent enough to report it in JIRA earlier, but I have
done so now (COUCHDB-1424).


> 3. Futon tests fail. Some are known bugs (attachment ranges in Chrome). Both Chrome
> and Safari also hang

Do you have more details on where Chrome and Safari hang? Can you try their private browsing
features, double/triple check that caches are empty? Can you get to a situation where you
get all tests succeeding across all browsers, even if individual ones fail on one or two others?


> 4. standalone JS tests fail. Again most of these run when run by themselves

Which ones?


> 5. performance. I used real production data *because* Stefan on user reported performance
> degradation on his data set. Any numbers are meaningless for a single test. I also ran scripts
> that BobN and Jason Smith posted that show a difference between 1.1.x and 1.2.x

You are conflating an IRC discussion we've had into this thread. The performance regression
reported is a good reason to look into other scenarios where we can show slowdowns. But we
need to understand what's happening. Just from looking at dev@ all I see is some handwaving
about some reports some people have done (Not to discourage any work that has been done on
IRC and user@, but for the sake of a release vote thread, this related information needs to
be on this mailing list).

As I said on IRC, I'm happy to get my hands dirty to understand the regression at hand. But
we need to know where we'd draw a line and say this isn't acceptable for a 1.2.0.


> 6. Reviewed patch pointed to by Jason that may be the cause but it's hard to say without
> knowing the code analysis that went into the changes. You can see obvious local optimizations
> that make good sense but those are often the ones that get you, without knowing the call counts.

That is a point that wasn't included in your previous mail. It's great that there is progress,
thanks for looking into this!


> Many of these issues can be explained away, but I think end users will be less forgiving.
> I think we already struggle with view performance. I'm interested to see how others evaluate
> this regression.
> I'll try this seatoncouch tool you mention later to see if I can construct some more
> definitive tests.

Again, I'm not trying to explain anything away. I want to get a shared understanding of the
issues you raised and where we stand on solving them squared against the ongoing 1.2.0 release.

And again: Thanks for doing this thorough review and looking into the performance issue. I hope
with your help we can understand all these things a lot better very soon :)

Cheers
Jan
-- 


> 
> Best,
> 
> Bob
> On Feb 26, 2012, at 2:29 PM, Jan Lehnardt wrote:
> 
>> 
>> On Feb 26, 2012, at 13:58 , Bob Dionne wrote:
>> 
>>> -1
>>> 
>>> R15B on OS X Lion
>>> 
>>> I rebuilt OTP with an older SSL and that gets past all the crashes (thanks Filipe).
>>> I still see hangs when running make check, though any particular etap that hangs will run
>>> ok by itself. The Futon tests never run to completion in Chrome without hanging and the standalone
>>> JS tests also have fails.
>> 
>> What part of this do you consider the -1? Can you try running the JS tests in Firefox
>> and/or Safari? Can you get all tests to pass at least once across all browsers? The cli JS suite
>> isn't supposed to work, so that isn't a criterion. I've seen the hang in make check for R15B
>> while individual tests run as well, but I don't consider this blocking. While I understand
>> and support the notion that tests shouldn't fail, period, we gotta work with what we have
>> and master already has significant improvements. What would you like to see changed to not
>> -1 this release?
>> 
>>> I tested the performance of view indexing, using a modest 200K doc db with a
>>> large complex view and there's a clear regression between 1.1.x and 1.2.x. Others report similar
>>> results.
>> 
>> What is a large complex view? The complexity of the map/reduce functions is rarely
>> an indicator of performance; it's usually input doc size and output/emit()/reduce data size.
>> How big are the docs in your test and how big is the returned data? I understand the changes
>> for 1.2.x will improve larger-data scenarios more significantly.
>> 
>> Cheers
>> Jan
>> -- 
>> 
>> 
>> 
>> 
>>> 
>>> On Feb 23, 2012, at 5:25 PM, Bob Dionne wrote:
>>> 
>>>> sorry Noah, I'm in debug mode today so I don't care to start mucking with
>>>> my stack, recompiling erlang, etc...
>>>> 
>>>> I did try using that build repeatedly and it crashes all the time. I find
>>>> it very odd and I had seen those before as I said on my older macbook.
>>>> 
>>>> I do see the hangs Jan describes in the etaps, they have been there right
>>>> along, so I'm confident this is just the SSL issue. Why it only happens on the build is puzzling,
>>>> any source build of any branch works just peachy.
>>>> 
>>>> So I'd say I'm +1 based on my use of the 1.2.x branch but I'd like to hear
>>>> from Stefan, who reported the severe performance regression. BobN seems to think we can ignore
>>>> that, it's something flaky in that fellow's environment. I tend to agree but I'm conservative.
>>>> 
>>>> On Feb 23, 2012, at 1:23 PM, Noah Slater wrote:
>>>> 
>>>>> Can someone convince me this bus error stuff and segfaults is not a
>>>>> blocking issue.
>>>>> 
>>>>> Bob tells me that he's followed the steps above and he's still experiencing
>>>>> the issues.
>>>>> 
>>>>> Bob, you did follow the steps to install your own SSL right?
>>>>> 
>>>>> On Thu, Feb 23, 2012 at 5:09 PM, Jan Lehnardt <jan@apache.org> wrote:
>>>>> 
>>>>>> 
>>>>>> On Feb 23, 2012, at 00:28 , Noah Slater wrote:
>>>>>> 
>>>>>>> Hello,
>>>>>>> 
>>>>>>> I would like to call a vote for the Apache CouchDB 1.2.0 release,
>>>>>>> second round.
>>>>>>> 
>>>>>>> We encourage the whole community to download and test these
>>>>>>> release artifacts so that any critical issues can be resolved before the
>>>>>>> release is made. Everyone is free to vote on this release, so get stuck in!
>>>>>>> 
>>>>>>> We are voting on the following release artifacts:
>>>>>>> 
>>>>>>> http://people.apache.org/~nslater/dist/1.2.0/
>>>>>>> 
>>>>>>> 
>>>>>>> These artifacts have been built from the following tree-ish in Git:
>>>>>>> 
>>>>>>> 4cd60f3d1683a3445c3248f48ae064fb573db2a1
>>>>>>> 
>>>>>>> 
>>>>>>> Please follow the test procedure before voting:
>>>>>>> 
>>>>>>> http://wiki.apache.org/couchdb/Test_procedure
>>>>>>> 
>>>>>>> 
>>>>>>> Thank you.
>>>>>>> 
>>>>>>> Happy voting,
>>>>>> 
>>>>>> Signature and hashes check out.
>>>>>> 
>>>>>> Mac OS X 10.7.3, 64bit, SpiderMonkey 1.8.0, Erlang R14B04: make check
>>>>>> works fine, browser tests in Safari work fine.
>>>>>> 
>>>>>> Mac OS X 10.7.3, 64bit, SpiderMonkey 1.8.5, Erlang R14B04: make check
>>>>>> works fine, browser tests in Safari work fine.
>>>>>> 
>>>>>> FreeBSD 9.0, 64bit, SpiderMonkey 1.7.0, Erlang R14B04: make check works
>>>>>> fine, browser tests in Safari work fine.
>>>>>> 
>>>>>> CentOS 6.2, 64bit, SpiderMonkey 1.8.5, Erlang R14B04: make check works
>>>>>> fine, browser tests in Firefox work fine.
>>>>>> 
>>>>>> Ubuntu 11.4, 64bit, SpiderMonkey 1.8.5, Erlang R14B02: make check works
>>>>>> fine, browser tests in Firefox work fine.
>>>>>> 
>>>>>> Ubuntu 10.4, 32bit, SpiderMonkey 1.8.0, Erlang R13B03: make check fails in
>>>>>> - 076-file-compression.t: https://gist.github.com/1893373
>>>>>> - 220-compaction-daemon.t: https://gist.github.com/1893387
>>>>>> This one runs in a VM and is 32bit, so I don't know if there's anything in
>>>>>> the tests that rely on 64bittyness or the R14B03. Filipe, I think you
>>>>>> worked on both features, do you have an idea?
>>>>>> 
>>>>>> I tried running it all through Erlang R15B on Mac OS X 10.7.3, but a good
>>>>>> way into `make check` the tests would just stop and hang. The last time,
>>>>>> repeatedly in 160-vhosts.t, but when run alone, that test finished in under
>>>>>> five seconds. I'm not sure what the issue is here.
>>>>>> 
>>>>>> Despite the things above, I'm happy to give this a +1 if we put a warning
>>>>>> about R15B on the download page.
>>>>>> 
>>>>>> Great work all!
>>>>>> 
>>>>>> Cheers
>>>>>> Jan
>>>>>> --
>>>>>> 
>>>>>> 
>>>> 
>>> 
>> 
> 

