From: Jan Lehnardt
To: dev@couchdb.apache.org
Subject: Re: [VOTE] Apache CouchDB 1.2.0 release, second round
Date: Wed, 29 Feb 2012 17:23:40 +0100

Hi Bob,

thanks for your reply. I think this has become a bit of a convoluted situation and I want to apologise for anything that might have upset you or anyone here. I didn't mean to be insulting.

I think the 1.2.0 release is very important for this community and the project, so I am very interested in getting it out. I'm not interested in getting it out at all costs, but I do want to clarify what's needed to do the release.

In the process, your -1 turned out to trigger a hold on the 1.2.0 release, even though you write you didn't intend that. Also in the process, I mistook your -1 for a request for that hold, and as a result I got annoyed that you didn't provide more information. I also asked for clarification on the other issues you raised (e.g. running the test suite in Firefox/across multiple browsers), which you haven't responded to yet (I understand time constraints, but I'd have appreciated a quick "gonna work on it" sort of response).

I'm sorry I got annoyed and took it out on you.

All I want is to understand what it takes to release 1.2.0, and we need all your (and everybody's) support to get there. So thanks for any contributions to this discussion.

Cheers
Jan
--

On Feb 29, 2012, at 16:27, Bob Dionne wrote:

> Jan,
>
> Sorry it took me a while to respond to this. As I said, the db I was testing with has about 200K docs, with one large ddoc and about 10 views.
> It takes a considerably long time to index on both 1.1.x and 1.2.x, but after 45 minutes the 1.2.x run had not made the progress that the 1.1.x run made in 20 minutes, so I assumed that something has slowed down. Maybe 40%, but that's likely a number I pulled out of my hat from the chat and the reports of others.
>
> I understand this is hand-wavy, which is why I was reluctant to jump to conclusions. Please note that I did not state we should stop this release, nor did I state that any of the issues I enumerated were showstoppers. I stated, with a -1 vote, that the overall experience led me to conclude the release was not sound. It's one vote and I'm always happy to go with the majority.
>
> I thought we were off to a good start on Sunday when I was digging in here. Yes, it's off topic, as you and Noah agreed, but once you've pushed this out you might want to revisit the release process. This slow_couchdb is definitely a good smoke test, I like it, but beyond that it's sort of an instance of "You're doing it all wrong" when it comes to performance. I'll be happy to elaborate on this when we get to that.
>
> Best Regards,
>
> Bob
>
>
> On Feb 27, 2012, at 8:15 AM, Jan Lehnardt wrote:
>
>> On Feb 27, 2012, at 12:58, Bob Dionne wrote:
>>
>>> Thanks for the clarification. I hope I'm not conflating things by continuing the discussion here; I thought that's what you requested?
>>
>> The discussion we had on IRC was about collecting more data points on the performance regression before we start to draw conclusions.
>>
>> My intention here is to understand what needs doing before we can release 1.2.0.
>>
>> I'll reply inline for the other issues.
>>
>>> I just downloaded the release candidate again to start fresh.
"make = distcheck" hangs on this step: >>>=20 >>> = /Users/bitdiddle/Downloads/apache-couchdb-1.2.0/apache-couchdb-1.2.0/_buil= d/../test/etap/150-invalid-view-seq.t ......... 6/?=20 >>>=20 >>> Just stops completely. This is on R15B which has been rebuilt to use = the recommended older SSL version. I haven't looked into this crashing = too closely but I'm suspicious that I only see it with couchdb and never = with bigcouch and never using the 1.2.x branch from source or any branch = for that matter >>=20 >> =46rom the release you should run `make check`, not make distcheck. = But I assume you see a hang there too, as I have and others (yet not = everybody), too. I can't comment on BigCouch and what is different = there. It is interesting that 1.2.x won't hang. For me, `make check` in = 1.2.x on R15B hangs sometimes, in different places. I'm currently trying = to gather more information about this. >>=20 >> The question here is whether `make check` passing in R15B is a = release requirement. In my vote I considered no, but I am happy to go = with a community decision if it emerges. What is your take here? >>=20 >> In addition, this just shouldn't be a question, so we should = investigate why this happens at all and address the issue, hence = COUCHDB-1424. Any insight here would be appreciated as well. >>=20 >>=20 >>> In the command line tests, 2,7, 27, and 32 fail. but it differs from = run to run. >>=20 >> I assume you mean the JS tests. Again, this isn't supposed to work in = 1.2.x. I'm happy to backport my changes from master to 1.2.x to make = that work, but I refrained from that because I didn't want to bring too = much change to a release branch. I'm happy to reconsider, but I don't = think a release vote is a good place to discuss feature backports. >>=20 >>=20 >>> On Chrome attachment_ranges fails and it hangs on replicator_db >>=20 >> This one is an "explaining away", but I think it is warranted. Chrome = is broken for attachment_ranges. 
>> I don't know if we reported this upstream (Robert N?), but this isn't a release blocker. For the replicator_db test, can you try running it in other browsers? I understand it is not the best of situations (hence the move to the CLI test suite for master), but if you get this test to pass in at least one other browser, this isn't a problem that holds up 1.2.x.
>>
>>> With respect to performance, I think comparisons with 1.1.x are important. I think almost any use case, contrived or otherwise, should not be dismissed as pathological or an edge case. Bob's script is as simple as it gets and to me is a great smoke test. We need to figure out the reason 1.2 is clearly slower in this case. If there are specific scenarios that 1.2.x is optimized for, then we should document that and provide reasons for the trade-offs.
>>
>> I want to make absolutely clear that I take any report of a performance regression very seriously. But I'm rather annoyed that no information about this ends up on dev@. I understand that on IRC there's some shared understanding of a few scenarios where performance regressions can be shown. I have asked three times now that these be posted to this mailing list. I'm not asking for a comprehensive report, but anything, really. I found Robert Newson's simple test script on IRC and ran it to test a suspicion of mine, which I posted in an earlier mail (tiny docs -> slower, bigger docs -> faster). Nobody else bothered to post this here. I see no discussion about what is observed, what is expected, and what would or would not be acceptable for a release of 1.2.0 as is.
>>
>> As far as this list is concerned, all we know is that a few people claimed that things are slower, that it's very real, and that we should hold the 1.2.0 release for it. I'm more than happy to hold the release until we have figured out the things I asked for above, and to help figure it all out. But we need something to work with here.
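Robert Newson's script is only referenced in the thread, never reproduced on-list. For anyone who wants to contribute numbers, a harness along the following lines would do. This is a hedged sketch, not his script: the database name, design doc and view names, doc counts, and padding sizes are placeholder assumptions, and it assumes a local CouchDB listening on port 5984.

```python
import json
import time
import urllib.request

COUCH = "http://127.0.0.1:5984"
DB = "perf_test"  # placeholder database name


def make_batch(start, count, doc_size):
    """Build a _bulk_docs payload of `count` docs, each padded to ~doc_size bytes."""
    pad = "x" * doc_size
    return {"docs": [{"_id": "doc-%08d" % (start + i), "pad": pad}
                     for i in range(count)]}


def post(path, payload):
    """POST a JSON payload to the CouchDB under test."""
    req = urllib.request.Request(
        COUCH + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"})
    return urllib.request.urlopen(req).read()


def run(n_docs=200000, batch=1000, doc_size=100):
    """Load n_docs docs in batches, then time a blocking view build."""
    for start in range(0, n_docs, batch):
        post("/%s/_bulk_docs" % DB, make_batch(start, batch, doc_size))
    t0 = time.time()
    # Querying the view blocks until the index is built.
    # "_design/perf" and "by_id" stand in for a real design doc.
    urllib.request.urlopen(
        "%s/%s/_design/perf/_view/by_id?limit=1" % (COUCH, DB)).read()
    print("view build took %.1fs" % (time.time() - t0))
```

Running this once per doc size (say 100 bytes versus 10 KB) against both a 1.1.x and a 1.2.x build would turn the "tiny docs slower, bigger docs faster" suspicion into numbers that can be posted to dev@.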
>>
>> I also understand that this is a voluntary project and people don't have infinite time to spend, but at least a message of "we're collecting things, will report when done" would be *great* to start with. So far we only have a "hold the horses, there might be something going on".
>>
>> Please let me know if this request is unreasonable or whether I am overreacting.
>>
>> Sorry for the rant.
>>
>> To anyone who has been looking into the performance regression: can you please send any info you have to this list? If you have a comprehensive analysis, awesome; if you just ran some script on a machine, just send us that. Let's collect all the data to get this situation solved! We need your help.
>>
>> tl;dr:
>>
>> There are three issues at hand:
>>
>> - Robert D -1'd a release artefact. We want to understand what needs to happen to make a release. This includes assessing the issues he raises and squaring them against the release vote.
>>
>> - There's a vague (as far as dev@ is concerned) report about a performance regression. We need to get behind that.
>>
>> - There's been a non-dev@ discussion about the performance regression, and it is being referenced to influence a dev@ decision. We need that discussion's information on dev@ to proceed.
>>
>> And to make it absolutely clear again: the performance regression *is* an issue, and I am very grateful to the people, including Robert Newson, Robert Dionne and Jason Smith, who are looking into it. It's just that we need to treat it as an issue and get all this info onto dev@ or into JIRA.
>>
>> Cheers
>> Jan
>> --
>>
>>> Cheers,
>>>
>>> Bob
>>>
>>> On Feb 26, 2012, at 4:07 PM, Jan Lehnardt wrote:
>>>
>>>> Bob,
>>>>
>>>> thanks for your reply.
>>>>
>>>> I wasn't implying we should try to explain anything away.
>>>> All of these are valid concerns; I just wanted to get a better understanding of where the bit flips from +0 to -1 and, subsequently, how to address that boundary.
>>>>
>>>> Ideally we can just fix all of the things you mention, but I think it is important to understand them in detail; that's why I was going into them. Ultimately, I want to understand what we need to do to ship 1.2.0.
>>>>
>>>> On Feb 26, 2012, at 21:22, Bob Dionne wrote:
>>>>
>>>>> Jan,
>>>>>
>>>>> I'm -1 based on all of my evaluation. I've spent a few hours on this release now, yesterday and today. It doesn't really pass what I would call the "smoke test". Almost everything I've run into has an explanation:
>>>>>
>>>>> 1. Crashes out of the box - that's R15B; you need to recompile SSL and Erlang (we'll note this in the release notes).
>>>>
>>>> Have we spent any time on figuring out what the trouble here is?
>>>>
>>>>> 2. etaps hang running make check. Known issue. Our etap code is out of date; recent versions of etap don't even run their own unit tests.
>>>>
>>>> I have seen the etap hang as well, and I wasn't diligent enough to report it in JIRA; I have done so now (COUCHDB-1424).
>>>>
>>>>> 3. Futon tests fail. Some are known bugs (attachment ranges in Chrome). Both Chrome and Safari also hang.
>>>>
>>>> Do you have more details on where Chrome and Safari hang? Can you try their private browsing features and double/triple check that the caches are empty? Can you get to a situation where every test succeeds in at least one browser, even if individual tests fail in one or two others?
>>>>
>>>>> 4. Standalone JS tests fail. Again, most of these pass when run by themselves.
>>>>
>>>> Which ones?
>>>>
>>>>> 5. Performance. I used real production data *because* Stefan on user@ reported performance degradation on his data set. Any numbers are meaningless for a single test.
>>>>> I also ran scripts that BobN and Jason Smith posted that show a difference between 1.1.x and 1.2.x.
>>>>
>>>> You are conflating an IRC discussion we've had into this thread. The performance regression reported is a good reason to look into other scenarios where we can show slowdowns. But we need to understand what's happening. Just from looking at dev@, all I see is some handwaving about some reports some people have done. (Not to discourage any work that has been done on IRC and user@, but for the sake of a release vote thread, this related information needs to be on this mailing list.)
>>>>
>>>> As I said on IRC, I'm happy to get my hands dirty to understand the regression at hand. But we need to know where we'd draw the line and say this isn't acceptable for a 1.2.0.
>>>>
>>>>> 6. I reviewed the patch pointed to by Jason that may be the cause, but it's hard to say without knowing the code analysis that went into the changes. You can see obvious local optimizations that make good sense, but those are often the ones that get you, without knowing the call counts.
>>>>
>>>> That is a point that wasn't included in your previous mail. It's great that there is progress; thanks for looking into this!
>>>>
>>>>> Many of these issues can be explained away, but I think end users will be less forgiving. I think we already struggle with view performance. I'm interested to see how others evaluate this regression. I'll try this seatoncouch tool you mention later to see if I can construct some more definitive tests.
>>>>
>>>> Again, I'm not trying to explain anything away. I want to get a shared understanding of the issues you raised and of where we stand on solving them, squared against the ongoing 1.2.0 release.
>>>>
>>>> And again: thanks for doing this thorough review and looking into the performance issue.
>>>> I hope with your help we can understand all these things a lot better very soon :)
>>>>
>>>> Cheers
>>>> Jan
>>>> --
>>>>
>>>>> Best,
>>>>>
>>>>> Bob
>>>>>
>>>>> On Feb 26, 2012, at 2:29 PM, Jan Lehnardt wrote:
>>>>>
>>>>>> On Feb 26, 2012, at 13:58, Bob Dionne wrote:
>>>>>>
>>>>>>> -1
>>>>>>>
>>>>>>> R15B on OS X Lion
>>>>>>>
>>>>>>> I rebuilt OTP with an older SSL and that gets past all the crashes (thanks Filipe). I still see hangs when running make check, though any particular etap that hangs will run OK by itself. The Futon tests never run to completion in Chrome without hanging, and the standalone JS tests also have failures.
>>>>>>
>>>>>> What part of this do you consider the -1? Can you try running the JS tests in Firefox and/or Safari? Can you get all tests to pass at least once across all browsers? The CLI JS suite isn't supposed to work, so that isn't a criterion. I've seen the hang in make check on R15B while individual tests run fine as well, but I don't consider this blocking. While I understand and support the notion that tests shouldn't fail, period, we have to work with what we have, and master already has significant improvements. What would you like to see changed for you not to -1 this release?
>>>>>>
>>>>>>> I tested the performance of view indexing using a modest 200K-doc db with a large, complex view, and there's a clear regression between 1.1.x and 1.2.x. Others report similar results.
>>>>>>
>>>>>> What is a large, complex view? The complexity of the map/reduce functions is rarely an indicator of performance; it's usually input doc size and output/emit()/reduce data size that matter. How big are the docs in your test and how big is the returned data? I understand the changes for 1.2.x improve larger-data scenarios more significantly.
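Jan's point above, that indexing cost is driven by input doc size and emitted data size rather than map/reduce complexity, can be sanity-checked for a given workload before posting numbers. A hedged sketch follows: the map function is a stand-in for whatever a real design doc emits, not CouchDB's actual view server protocol, and the sample docs are invented.

```python
import json


def map_fn(doc):
    """Stand-in for a design doc's map function: one emitted row per doc."""
    yield doc["_id"], {"tag": doc.get("tag")}


def characterize(docs):
    """Rough bytes-in vs bytes-emitted totals for a sample of docs."""
    bytes_in = sum(len(json.dumps(d)) for d in docs)
    bytes_out = sum(len(json.dumps([k, v]))
                    for d in docs for k, v in map_fn(d))
    return bytes_in, bytes_out


# Invented sample: small docs carrying 100 bytes of payload each.
sample = [{"_id": "d%04d" % i, "tag": "t", "pad": "x" * 100} for i in range(10)]
print(characterize(sample))
```

If bytes_out comes back as a large multiple of bytes_in, a "simple" view can still be emit-heavy, which is exactly the dimension Jan suggests behaves differently between 1.1.x and 1.2.x.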
>>>>>> Cheers
>>>>>> Jan
>>>>>> --
>>>>>>
>>>>>>> On Feb 23, 2012, at 5:25 PM, Bob Dionne wrote:
>>>>>>>
>>>>>>>> Sorry Noah, I'm in debug mode today, so I don't care to start mucking with my stack, recompiling Erlang, etc.
>>>>>>>>
>>>>>>>> I did try using that build repeatedly and it crashes all the time. I find it very odd, and I had seen those crashes before, as I said, on my older MacBook.
>>>>>>>>
>>>>>>>> I do see the hangs Jan describes in the etaps; they have been there right along, so I'm confident this is just the SSL issue. Why it only happens with the release build is puzzling; any source build of any branch works just peachy.
>>>>>>>>
>>>>>>>> So I'd say I'm +1 based on my use of the 1.2.x branch, but I'd like to hear from Stefan, who reported the severe performance regression. BobN seems to think we can ignore that, as something flaky in that fellow's environment. I tend to agree, but I'm conservative.
>>>>>>>>
>>>>>>>> On Feb 23, 2012, at 1:23 PM, Noah Slater wrote:
>>>>>>>>
>>>>>>>>> Can someone convince me that this bus error and segfault business is not a blocking issue?
>>>>>>>>>
>>>>>>>>> Bob tells me that he's followed the steps above and he's still experiencing the issues.
>>>>>>>>>
>>>>>>>>> Bob, you did follow the steps to install your own SSL, right?
>>>>>>>>>
>>>>>>>>> On Thu, Feb 23, 2012 at 5:09 PM, Jan Lehnardt wrote:
>>>>>>>>>
>>>>>>>>>> On Feb 23, 2012, at 00:28, Noah Slater wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> I would like to call a vote for the Apache CouchDB 1.2.0 release, second round.
>>>>>>>>>>>
>>>>>>>>>>> We encourage the whole community to download and test these release artifacts so that any critical issues can be resolved before the release is made. Everyone is free to vote on this release, so get stuck in!
>>>>>>>>>>>
>>>>>>>>>>> We are voting on the following release artifacts:
>>>>>>>>>>>
>>>>>>>>>>> http://people.apache.org/~nslater/dist/1.2.0/
>>>>>>>>>>>
>>>>>>>>>>> These artifacts have been built from the following tree-ish in Git:
>>>>>>>>>>>
>>>>>>>>>>> 4cd60f3d1683a3445c3248f48ae064fb573db2a1
>>>>>>>>>>>
>>>>>>>>>>> Please follow the test procedure before voting:
>>>>>>>>>>>
>>>>>>>>>>> http://wiki.apache.org/couchdb/Test_procedure
>>>>>>>>>>>
>>>>>>>>>>> Thank you.
>>>>>>>>>>>
>>>>>>>>>>> Happy voting,
>>>>>>>>>>
>>>>>>>>>> Signature and hashes check out.
>>>>>>>>>>
>>>>>>>>>> Mac OS X 10.7.3, 64-bit, SpiderMonkey 1.8.0, Erlang R14B04: make check works fine, browser tests in Safari work fine.
>>>>>>>>>>
>>>>>>>>>> Mac OS X 10.7.3, 64-bit, SpiderMonkey 1.8.5, Erlang R14B04: make check works fine, browser tests in Safari work fine.
>>>>>>>>>>
>>>>>>>>>> FreeBSD 9.0, 64-bit, SpiderMonkey 1.7.0, Erlang R14B04: make check works fine, browser tests in Safari work fine.
>>>>>>>>>>
>>>>>>>>>> CentOS 6.2, 64-bit, SpiderMonkey 1.8.5, Erlang R14B04: make check works fine, browser tests in Firefox work fine.
>>>>>>>>>>
>>>>>>>>>> Ubuntu 11.04, 64-bit, SpiderMonkey 1.8.5, Erlang R14B02: make check works fine, browser tests in Firefox work fine.
>>>>>>>>>>
>>>>>>>>>> Ubuntu 10.04, 32-bit, SpiderMonkey 1.8.0, Erlang R13B03: make check fails in
>>>>>>>>>> - 076-file-compression.t: https://gist.github.com/1893373
>>>>>>>>>> - 220-compaction-daemon.t: https://gist.github.com/1893387
>>>>>>>>>> This one runs in a VM and is 32-bit, so I don't know if there's anything in the tests that relies on 64-bitness or on R13B03. Filipe, I think you worked on both features, do you have an idea?
>>>>>>>>>>
>>>>>>>>>> I tried running it all through Erlang R15B on Mac OS X 10.7.3, but a good way into `make check` the tests would just stop and hang. The last time, repeatedly in 160-vhosts.t, even though, when run alone, that test finished in under five seconds. I'm not sure what the issue is here.
>>>>>>>>>>
>>>>>>>>>> Despite the things above, I'm happy to give this a +1 if we put a warning about R15B on the download page.
>>>>>>>>>>
>>>>>>>>>> Great work all!
>>>>>>>>>>
>>>>>>>>>> Cheers
>>>>>>>>>> Jan
>>>>>>>>>> --