From: Jan Lehnardt
To: dev@couchdb.apache.org
Subject: Re: [VOTE] Apache CouchDB 1.2.0 release, second round
Date: Wed, 29 Feb 2012 17:23:40 +0100

Hi Bob,

thanks for your reply. I think this has become a bit of a convoluted situation and I want to apologise for anything that might have upset you or anyone here. I didn't mean to be insulting.

I think the 1.2.0 release is very important for this community and the project, so I am very interested in getting it out. I'm not interested in getting it out at all costs, but I do want to clarify what's needed to do the release.

In the process, your -1 turned out to trigger a hold on the 1.2.0 release, even though you write you didn't intend that. Also in the process, I mistook your -1 for a request for that hold, and as a result I got annoyed that you didn't provide more information. I also asked for clarification on the other issues you raised (e.g. running the test suite in Firefox/across multiple browsers), which you haven't responded to yet (I understand time constraints, but I'd have appreciated a quick "gonna work on it" sort of response).

I'm sorry I got annoyed and took it out on you.

All I want is to understand what it takes to release 1.2.0, and we need all your (and everybody's) support to get there. So thanks for any contributions to this discussion.

Cheers
Jan
--

On Feb 29, 2012, at 16:27, Bob Dionne wrote:

> Jan,
>
> Sorry it took me a while to respond to this. As I said, the db I was testing with has about 200K docs, with one large ddoc and about 10 views.
> It takes a considerably long time to index on both 1.1.x and 1.2.x, but after 45 minutes the 1.2.x run had not made the progress that the 1.1.x run made in 20 minutes, so I assumed that something has slowed down. Maybe 40%, but that's likely a number I pulled out of my hat from the chat and the reports of others.
>
> I understand this is hand-wavy, which is why I was reluctant to jump to conclusions. Please note that I did not state we should stop this release, nor did I state that any of the issues I enumerated were showstoppers. I stated, with a -1 vote, that the overall experience led me to conclude the release was not sound. It's one vote and I'm always happy to go with the majority.
>
> I thought we were off to a good start on Sunday when I was digging in here. Yes, it's off topic, as you and Noah agreed, but once you've pushed this out you might want to revisit the release process. This slow_couchdb is definitely a good smoke test, I like it, but beyond that it's sort of an instance of "You're doing it all wrong" when it comes to performance. I'll be happy to elaborate on this when we get to that.
>
> Best Regards,
>
> Bob
>
>
> On Feb 27, 2012, at 8:15 AM, Jan Lehnardt wrote:
>
>> On Feb 27, 2012, at 12:58, Bob Dionne wrote:
>>
>>> Thanks for the clarification. I hope I'm not conflating things by continuing the discussion here; I thought that's what you requested?
>>
>> The discussion we had on IRC was about collecting more data points on the performance regression before we start to draw conclusions.
>>
>> My intention here is to understand what needs doing before we can release 1.2.0.
>>
>> I'll reply inline for the other issues.
>>
>>> I just downloaded the release candidate again to start fresh.
"make = distcheck" hangs on this step: >>>=20 >>> = /Users/bitdiddle/Downloads/apache-couchdb-1.2.0/apache-couchdb-1.2.0/_buil= d/../test/etap/150-invalid-view-seq.t ......... 6/?=20 >>>=20 >>> Just stops completely. This is on R15B which has been rebuilt to use = the recommended older SSL version. I haven't looked into this crashing = too closely but I'm suspicious that I only see it with couchdb and never = with bigcouch and never using the 1.2.x branch from source or any branch = for that matter >>=20 >> =46rom the release you should run `make check`, not make distcheck. = But I assume you see a hang there too, as I have and others (yet not = everybody), too. I can't comment on BigCouch and what is different = there. It is interesting that 1.2.x won't hang. For me, `make check` in = 1.2.x on R15B hangs sometimes, in different places. I'm currently trying = to gather more information about this. >>=20 >> The question here is whether `make check` passing in R15B is a = release requirement. In my vote I considered no, but I am happy to go = with a community decision if it emerges. What is your take here? >>=20 >> In addition, this just shouldn't be a question, so we should = investigate why this happens at all and address the issue, hence = COUCHDB-1424. Any insight here would be appreciated as well. >>=20 >>=20 >>> In the command line tests, 2,7, 27, and 32 fail. but it differs from = run to run. >>=20 >> I assume you mean the JS tests. Again, this isn't supposed to work in = 1.2.x. I'm happy to backport my changes from master to 1.2.x to make = that work, but I refrained from that because I didn't want to bring too = much change to a release branch. I'm happy to reconsider, but I don't = think a release vote is a good place to discuss feature backports. >>=20 >>=20 >>> On Chrome attachment_ranges fails and it hangs on replicator_db >>=20 >> This one is an "explaining away", but I think it is warranted. Chrome = is broken for attachment_ranges. 
>> I don't know if we reported this upstream (Robert N?), but this isn't a release blocker. For the replicator_db test, can you try running it in other browsers? I understand it is not the best of situations (hence the move to the CLI test suite for master), but if you get this test to pass in at least one other browser, this isn't a problem that holds up 1.2.x.
>>
>>> With respect to performance, I think comparisons with 1.1.x are important. I think almost any use case, contrived or otherwise, should not be dismissed as pathological or an edge case. Bob's script is as simple as it gets and to me is a great smoke test. We need to figure out the reason 1.2 is clearly slower in this case. If there are specific scenarios that 1.2.x is optimized for, then we should document that and provide reasons for the trade-offs.
>>
>> I want to make absolutely clear that I take any report of a performance regression very seriously. But I'm rather annoyed that no information about this ends up on dev@. I understand that on IRC there's some shared understanding of a few scenarios where performance regressions can be shown. I have asked three times now that these be posted to this mailing list. I'm not asking for a comprehensive report, but anything, really. I found Robert Newson's simple test script on IRC and ran it to test a suspicion of mine, which I posted in an earlier mail (tiny docs -> slower, bigger docs -> faster). Nobody else bothered to post this here. I see no discussion about what is observed, what is expected, and what would or would not be acceptable for a release of 1.2.0 as is.
>>
>> As far as this list is concerned, all we know is that a few people claimed that things are slower, that it's very real, and that we should hold the 1.2.0 release for it. I'm more than happy to hold the release until we have figured out the things I asked for above, and to help figure it all out. But we need something to work with here.
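Robert Newson's script is only referenced in the thread, never reproduced on-list. For anyone who wants to contribute numbers, a harness along the following lines would do. This is a hedged sketch, not his script: the database name, design doc and view names, doc counts, and padding sizes are placeholder assumptions, and it assumes a local CouchDB listening on port 5984.

```python
import json
import time
import urllib.request

COUCH = "http://127.0.0.1:5984"
DB = "perf_test"  # placeholder database name


def make_batch(start, count, doc_size):
    """Build a _bulk_docs payload of `count` docs, each padded to ~doc_size bytes."""
    pad = "x" * doc_size
    return {"docs": [{"_id": "doc-%08d" % (start + i), "pad": pad}
                     for i in range(count)]}


def post(path, payload):
    """POST a JSON payload to the CouchDB under test."""
    req = urllib.request.Request(
        COUCH + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"})
    return urllib.request.urlopen(req).read()


def run(n_docs=200000, batch=1000, doc_size=100):
    """Load n_docs docs in batches, then time a blocking view build."""
    for start in range(0, n_docs, batch):
        post("/%s/_bulk_docs" % DB, make_batch(start, batch, doc_size))
    t0 = time.time()
    # Querying the view blocks until the index is built.
    # "_design/perf" and "by_id" stand in for a real design doc.
    urllib.request.urlopen(
        "%s/%s/_design/perf/_view/by_id?limit=1" % (COUCH, DB)).read()
    print("view build took %.1fs" % (time.time() - t0))
```

Running this once per doc size (say 100 bytes versus 10 KB) against both a 1.1.x and a 1.2.x build would turn the "tiny docs slower, bigger docs faster" suspicion into numbers that can be posted to dev@.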
>>
>> I also understand that this is a voluntary project and people don't have infinite time to spend, but at least a message of "we're collecting things, will report when done" would be *great* to start with. So far we only have a "hold the horses, there might be something going on".
>>
>> Please let me know if this request is unreasonable or whether I am overreacting.
>>
>> Sorry for the rant.
>>
>> To anyone who has been looking into the performance regression: can you please send any info you have to this list? If you have a comprehensive analysis, awesome; if you just ran some script on a machine, just send us that. Let's collect all the data to get this situation solved! We need your help.
>>
>> tl;dr:
>>
>> There are three issues at hand:
>>
>> - Robert D -1'd a release artefact. We want to understand what needs to happen to make a release. This includes assessing the issues he raises and squaring them against the release vote.
>>
>> - There's a vague (as far as dev@ is concerned) report about a performance regression. We need to get behind that.
>>
>> - There's been a non-dev@ discussion about the performance regression, and it is being referenced to influence a dev@ decision. We need that discussion's information on dev@ to proceed.
>>
>> And to make it absolutely clear again: the performance regression *is* an issue, and I am very grateful to the people, including Robert Newson, Robert Dionne and Jason Smith, who are looking into it. It's just that we need to treat it as an issue and get all this info onto dev@ or into JIRA.
>>
>> Cheers
>> Jan
>> --
>>
>>> Cheers,
>>>
>>> Bob
>>>
>>> On Feb 26, 2012, at 4:07 PM, Jan Lehnardt wrote:
>>>
>>>> Bob,
>>>>
>>>> thanks for your reply.
>>>>
>>>> I wasn't implying we should try to explain anything away.
>>>> All of these are valid concerns; I just wanted to get a better understanding of where the bit flips from +0 to -1 and, subsequently, how to address that boundary.
>>>>
>>>> Ideally we can just fix all of the things you mention, but I think it is important to understand them in detail; that's why I was going into them. Ultimately, I want to understand what we need to do to ship 1.2.0.
>>>>
>>>> On Feb 26, 2012, at 21:22, Bob Dionne wrote:
>>>>
>>>>> Jan,
>>>>>
>>>>> I'm -1 based on all of my evaluation. I've spent a few hours on this release now, yesterday and today. It doesn't really pass what I would call the "smoke test". Almost everything I've run into has an explanation:
>>>>>
>>>>> 1. Crashes out of the box - that's R15B; you need to recompile SSL and Erlang (we'll note this in the release notes).
>>>>
>>>> Have we spent any time on figuring out what the trouble here is?
>>>>
>>>>> 2. etaps hang running make check. Known issue. Our etap code is out of date; recent versions of etap don't even run their own unit tests.
>>>>
>>>> I have seen the etap hang as well, and I wasn't diligent enough to report it in JIRA; I have done so now (COUCHDB-1424).
>>>>
>>>>> 3. Futon tests fail. Some are known bugs (attachment ranges in Chrome). Both Chrome and Safari also hang.
>>>>
>>>> Do you have more details on where Chrome and Safari hang? Can you try their private browsing features and double/triple check that the caches are empty? Can you get to a situation where every test succeeds in at least one browser, even if individual tests fail in one or two others?
>>>>
>>>>> 4. Standalone JS tests fail. Again, most of these pass when run by themselves.
>>>>
>>>> Which ones?
>>>>
>>>>> 5. Performance. I used real production data *because* Stefan on user@ reported performance degradation on his data set. Any numbers are meaningless for a single test.
>>>>> I also ran scripts that BobN and Jason Smith posted that show a difference between 1.1.x and 1.2.x.
>>>>
>>>> You are conflating an IRC discussion we've had into this thread. The performance regression reported is a good reason to look into other scenarios where we can show slowdowns. But we need to understand what's happening. Just from looking at dev@, all I see is some handwaving about some reports some people have done. (Not to discourage any work that has been done on IRC and user@, but for the sake of a release vote thread, this related information needs to be on this mailing list.)
>>>>
>>>> As I said on IRC, I'm happy to get my hands dirty to understand the regression at hand. But we need to know where we'd draw the line and say this isn't acceptable for a 1.2.0.
>>>>
>>>>> 6. I reviewed the patch pointed to by Jason that may be the cause, but it's hard to say without knowing the code analysis that went into the changes. You can see obvious local optimizations that make good sense, but those are often the ones that get you, without knowing the call counts.
>>>>
>>>> That is a point that wasn't included in your previous mail. It's great that there is progress; thanks for looking into this!
>>>>
>>>>> Many of these issues can be explained away, but I think end users will be less forgiving. I think we already struggle with view performance. I'm interested to see how others evaluate this regression. I'll try this seatoncouch tool you mention later to see if I can construct some more definitive tests.
>>>>
>>>> Again, I'm not trying to explain anything away. I want to get a shared understanding of the issues you raised and of where we stand on solving them, squared against the ongoing 1.2.0 release.
>>>>
>>>> And again: thanks for doing this thorough review and looking into the performance issue.
>>>> I hope with your help we can understand all these things a lot better very soon :)
>>>>
>>>> Cheers
>>>> Jan
>>>> --
>>>>
>>>>> Best,
>>>>>
>>>>> Bob
>>>>>
>>>>> On Feb 26, 2012, at 2:29 PM, Jan Lehnardt wrote:
>>>>>
>>>>>> On Feb 26, 2012, at 13:58, Bob Dionne wrote:
>>>>>>
>>>>>>> -1
>>>>>>>
>>>>>>> R15B on OS X Lion
>>>>>>>
>>>>>>> I rebuilt OTP with an older SSL and that gets past all the crashes (thanks Filipe). I still see hangs when running make check, though any particular etap that hangs will run OK by itself. The Futon tests never run to completion in Chrome without hanging, and the standalone JS tests also have failures.
>>>>>>
>>>>>> What part of this do you consider the -1? Can you try running the JS tests in Firefox and/or Safari? Can you get all tests to pass at least once across all browsers? The CLI JS suite isn't supposed to work, so that isn't a criterion. I've seen the hang in make check on R15B while individual tests run fine as well, but I don't consider this blocking. While I understand and support the notion that tests shouldn't fail, period, we have to work with what we have, and master already has significant improvements. What would you like to see changed for you not to -1 this release?
>>>>>>
>>>>>>> I tested the performance of view indexing using a modest 200K-doc db with a large, complex view, and there's a clear regression between 1.1.x and 1.2.x. Others report similar results.
>>>>>>
>>>>>> What is a large, complex view? The complexity of the map/reduce functions is rarely an indicator of performance; it's usually input doc size and output/emit()/reduce data size that matter. How big are the docs in your test and how big is the returned data? I understand the changes for 1.2.x improve larger-data scenarios more significantly.
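Jan's point above, that indexing cost is driven by input doc size and emitted data size rather than map/reduce complexity, can be sanity-checked for a given workload before posting numbers. A hedged sketch follows: the map function is a stand-in for whatever a real design doc emits, not CouchDB's actual view server protocol, and the sample docs are invented.

```python
import json


def map_fn(doc):
    """Stand-in for a design doc's map function: one emitted row per doc."""
    yield doc["_id"], {"tag": doc.get("tag")}


def characterize(docs):
    """Rough bytes-in vs bytes-emitted totals for a sample of docs."""
    bytes_in = sum(len(json.dumps(d)) for d in docs)
    bytes_out = sum(len(json.dumps([k, v]))
                    for d in docs for k, v in map_fn(d))
    return bytes_in, bytes_out


# Invented sample: small docs carrying 100 bytes of payload each.
sample = [{"_id": "d%04d" % i, "tag": "t", "pad": "x" * 100} for i in range(10)]
print(characterize(sample))
```

If bytes_out comes back as a large multiple of bytes_in, a "simple" view can still be emit-heavy, which is exactly the dimension Jan suggests behaves differently between 1.1.x and 1.2.x.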
>>>>>> Cheers
>>>>>> Jan
>>>>>> --
>>>>>>
>>>>>>> On Feb 23, 2012, at 5:25 PM, Bob Dionne wrote:
>>>>>>>
>>>>>>>> Sorry Noah, I'm in debug mode today, so I don't care to start mucking with my stack, recompiling Erlang, etc.
>>>>>>>>
>>>>>>>> I did try using that build repeatedly and it crashes all the time. I find it very odd, and I had seen those crashes before, as I said, on my older MacBook.
>>>>>>>>
>>>>>>>> I do see the hangs Jan describes in the etaps; they have been there right along, so I'm confident this is just the SSL issue. Why it only happens with the release build is puzzling; any source build of any branch works just peachy.
>>>>>>>>
>>>>>>>> So I'd say I'm +1 based on my use of the 1.2.x branch, but I'd like to hear from Stefan, who reported the severe performance regression. BobN seems to think we can ignore that, as something flaky in that fellow's environment. I tend to agree, but I'm conservative.
>>>>>>>>
>>>>>>>> On Feb 23, 2012, at 1:23 PM, Noah Slater wrote:
>>>>>>>>
>>>>>>>>> Can someone convince me that this bus error and segfault business is not a blocking issue?
>>>>>>>>>
>>>>>>>>> Bob tells me that he's followed the steps above and he's still experiencing the issues.
>>>>>>>>>
>>>>>>>>> Bob, you did follow the steps to install your own SSL, right?
>>>>>>>>>
>>>>>>>>> On Thu, Feb 23, 2012 at 5:09 PM, Jan Lehnardt wrote:
>>>>>>>>>
>>>>>>>>>> On Feb 23, 2012, at 00:28, Noah Slater wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> I would like to call a vote for the Apache CouchDB 1.2.0 release, second round.
>>>>>>>>>>>
>>>>>>>>>>> We encourage the whole community to download and test these release artifacts so that any critical issues can be resolved before the release is made. Everyone is free to vote on this release, so get stuck in!
>>>>>>>>>>>
>>>>>>>>>>> We are voting on the following release artifacts:
>>>>>>>>>>>
>>>>>>>>>>> http://people.apache.org/~nslater/dist/1.2.0/
>>>>>>>>>>>
>>>>>>>>>>> These artifacts have been built from the following tree-ish in Git:
>>>>>>>>>>>
>>>>>>>>>>> 4cd60f3d1683a3445c3248f48ae064fb573db2a1
>>>>>>>>>>>
>>>>>>>>>>> Please follow the test procedure before voting:
>>>>>>>>>>>
>>>>>>>>>>> http://wiki.apache.org/couchdb/Test_procedure
>>>>>>>>>>>
>>>>>>>>>>> Thank you.
>>>>>>>>>>>
>>>>>>>>>>> Happy voting,
>>>>>>>>>>
>>>>>>>>>> Signature and hashes check out.
>>>>>>>>>>
>>>>>>>>>> Mac OS X 10.7.3, 64-bit, SpiderMonkey 1.8.0, Erlang R14B04: make check works fine, browser tests in Safari work fine.
>>>>>>>>>>
>>>>>>>>>> Mac OS X 10.7.3, 64-bit, SpiderMonkey 1.8.5, Erlang R14B04: make check works fine, browser tests in Safari work fine.
>>>>>>>>>>
>>>>>>>>>> FreeBSD 9.0, 64-bit, SpiderMonkey 1.7.0, Erlang R14B04: make check works fine, browser tests in Safari work fine.
>>>>>>>>>>
>>>>>>>>>> CentOS 6.2, 64-bit, SpiderMonkey 1.8.5, Erlang R14B04: make check works fine, browser tests in Firefox work fine.
>>>>>>>>>>
>>>>>>>>>> Ubuntu 11.04, 64-bit, SpiderMonkey 1.8.5, Erlang R14B02: make check works fine, browser tests in Firefox work fine.
>>>>>>>>>>
>>>>>>>>>> Ubuntu 10.04, 32-bit, SpiderMonkey 1.8.0, Erlang R13B03: make check fails in
>>>>>>>>>> - 076-file-compression.t: https://gist.github.com/1893373
>>>>>>>>>> - 220-compaction-daemon.t: https://gist.github.com/1893387
>>>>>>>>>> This one runs in a VM and is 32-bit, so I don't know if there's anything in the tests that relies on 64-bitness or on R13B03. Filipe, I think you worked on both features, do you have an idea?
>>>>>>>>>>
>>>>>>>>>> I tried running it all through Erlang R15B on Mac OS X 10.7.3, but a good way into `make check` the tests would just stop and hang. The last time, repeatedly in 160-vhosts.t, even though, when run alone, that test finished in under five seconds. I'm not sure what the issue is here.
>>>>>>>>>>
>>>>>>>>>> Despite the things above, I'm happy to give this a +1 if we put a warning about R15B on the download page.
>>>>>>>>>>
>>>>>>>>>> Great work all!
>>>>>>>>>>
>>>>>>>>>> Cheers
>>>>>>>>>> Jan
>>>>>>>>>> --