reef-dev mailing list archives

From Markus Weimer <mar...@weimo.de>
Subject Re: [VOTE] Release Apache REEF 0.16.0 (rc1)
Date Fri, 11 Aug 2017 23:05:30 GMT
Awesome! Thanks @Taegeon for driving! -- Markus

On Thu, Aug 10, 2017 at 2:57 AM, Julia Wang (QIUHE)
<Qiuhe.Wang@microsoft.com.invalid> wrote:
> Updated.
>
> Thanks,
> Julia
>
> -----Original Message-----
> From: Taegeon Um [mailto:taegeonum@gmail.com]
> Sent: Wednesday, August 9, 2017 10:48 PM
> To: dev@reef.apache.org
> Subject: Re: [VOTE] Release Apache REEF 0.16.0 (rc1)
>
> Also, it would be great if you update REEF-1850 to track this issue.
>
> Thanks,
> Taegeon
>
>> On Aug 10, 2017, at 2:16 PM, Julia Wang (QIUHE) <Qiuhe.Wang@microsoft.com.INVALID> wrote:
>>
>> In the past few days, I picked several commits from January to May and ran the 1000-node test on each. It is really hard to find when the issue was introduced because the issue is transient. Previously, when I ran the 0.16 build with 1000 nodes and 800 nodes, it always failed. Now when I run it again with the 0.16 release build, it has passed a few times.
>>
>> Given that the root cause of the transient issue is hard to identify, there is no short-term fix, and we haven't released for a very long time, I would change my vote to 0 and let it go. We will come back to debug whenever we have some capacity.
>>
>> Thanks,
>> Julia
>>
>> -----Original Message-----
>> From: Byung-Gon Chun [mailto:bgchun@gmail.com]
>> Sent: Wednesday, August 9, 2017 9:36 PM
>> To: dev@reef.apache.org
>> Subject: Re: [VOTE] Release Apache REEF 0.16.0 (rc1)
>>
>> Thanks for testing, Julia!
>>
>> I have to submit a board report soon. It'd be very nice if we have a
>> conclusion on this release vote. :)
>>
>> Thanks.
>> -Gon
>>
>> On Wed, Aug 9, 2017 at 3:28 AM, Julia Wang (QIUHE) <Qiuhe.Wang@microsoft.com.invalid> wrote:
>>
>>> I am doing some more testing to try to narrow down the cause of the
>>> failure. I will update the thread soon after I have more findings.
>>>
>>> Julia
>>>
>>> Sent from my iPhone
>>>
>>>> On Aug 7, 2017, at 3:44 PM, Taegeon Um <taegeonum@gmail.com> wrote:
>>>>
>>>> Hi Julia,
>>>>
>>>> Are you still -1 on releasing 0.16?
>>>>
>>>> Thanks,
>>>> Taegeon
>>>>
>>>>
>>>> On Aug 5, 2017, at 3:23 PM, "Julia Wang (QIUHE)" <Qiuhe.Wang@microsoft.com.invalid> wrote:
>>>>
>>>> Hi Gon,
>>>>
>>>> Agreed, it is hard to resolve. But I am pretty sure it was introduced
>>>> in the first few months of this year. It was working last year ☹
>>>>
>>>> Julia
>>>>
>>>> -----Original Message-----
>>>> From: Byung-Gon Chun [mailto:bgchun@gmail.com]
>>>> Sent: Friday, August 4, 2017 10:53 PM
>>>> To: dev@reef.apache.org
>>>> Subject: Re: [VOTE] Release Apache REEF 0.16.0 (rc1)
>>>>
>>>> Julia, thanks for running the tests!
>>>> Scalability bugs are hard to debug. It's not likely that we will
>>>> resolve them quickly.
>>>>
>>>> I definitely vote for option 2. Once we fix the bugs, perhaps we can
>>>> do a minor version update release.
>>>>
>>>>
>>>>> On Fri, Aug 4, 2017 at 11:34 PM, Gyewon Lee <strayyyyyy@gmail.com> wrote:
>>>>>
>>>>> I agree with Taegeon. +1 for option 2.
>>>>>
>>>>> 2017-08-05 11:41 GMT+09:00 Taegeon Um <taegeonum@gmail.com>:
>>>>>
>>>>>> Thanks Julia for sharing the issue!
>>>>>>
>>>>>> On Aug 5, 2017, at 10:56 AM, "Julia Wang (QIUHE)" <Qiuhe.Wang@microsoft.com.invalid> wrote:
>>>>>>
>>>>>> TestUnhandledTaskExceptionDoesntCrashEvaluator passed in my env.
>>>>>>
>>>>>> I would think both TestUnhandledTaskExceptionDoesntCrashEvaluator
>>>>>> and TestRuntimeNameSpecifyingValidName are transient failures. We
>>>>>> can log a Jira to follow up, similar to what we agreed on for other
>>>>>> transient test failures.
>>>>>>
>>>>>>
>>>>>> +1 for following up on them as transient failures.
>>>>>>
>>>>>>
>>>>>> Today I tested the IMRU example on YARN.
>>>>>> With 500 nodes, the test passes. I ran it multiple times and every run passed.
>>>>>> With 1000 nodes, the test fails. Received 1000 completed tasks but only
>>>>>> 998 completed evaluators. The Driver doesn't shut down until I kill it.
>>>>>>
>>>>>> With 800 nodes, the test fails. Received 800 completed tasks but only
>>>>>> 799 completed evaluators. The Driver doesn't shut down until I kill it.
>>>>>>
>>>>>> Option 1: Find the root cause and fix the issue before the 0.16 release.
>>>>>> There are no errors in the logs, so finding the root cause looks like
>>>>>> a non-trivial job. We had a similar issue last year, and it took a big
>>>>>> effort for Mariia and me to identify it.
>>>>>> Option 2: Log a JIRA and follow up later.
>>>>>>
>>>>>>
>>>>>> +1 for option 2.
>>>>>>
>>>>>> I think it would take a long time to investigate the root cause, so I'm
>>>>>> worried that the 0.16 release will be delayed for a long time again.
>>>>>>
>>>>>> How about releasing 0.16 with the known issues and fixing them in 0.17?
>>>>>> Or, if we resolve the issues quickly, we could do a minor release
>>>>>> (e.g., 0.16.1)?
>>>>>>
>>>>>> Taegeon
>>>>>>
>>>>>>
>>>>>> Julia
>>>>>>
>>>>>>
>>>>>> From: Kim Doyoung [mailto:disoxc21@gmail.com]
>>>>>> Sent: Friday, August 4, 2017 2:50 AM
>>>>>> To: dev@reef.apache.org
>>>>>> Subject: RE: [VOTE] Release Apache REEF 0.16.0 (rc1)
>>>>>>
>>>>>> Hi
>>>>>>
>>>>>> I ran the tests on Java and .Net.
>>>>>> My environment is Windows 10 x64 ver 1703 with Java 1.8.0_141.
>>>>>>
>>>>>> Java built and passed all tests with the command 'mvn clean install'.
>>>>>>
>>>>>> But the .Net side failed one test with Visual Studio 2015.
>>>>>> It is not the same test failure that Julia saw:
>>>>>>
>>>>>> TestUnhandledTaskExceptionDoesntCrashEvaluator
>>>>>> [cid:image001.png@01D30D51.E782D6E0]
>>>>>>
>>>>>> The `TestRuntimeNameSpecifyingValidName` test passed.
>>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>> Doyoung
>>>>>>
>>>>>> From: Taegeon Um <taegeonum@gmail.com>
>>>>>> Sent: Friday, August 4, 2017 3:31 PM
>>>>>> To: dev@reef.apache.org
>>>>>> Subject: Re: [VOTE] Release Apache REEF 0.16.0 (rc1)
>>>>>>
>>>>>> Thanks Julia for sharing the result!
>>>>>>
>>>>>> Is there someone who experienced the same test failure in .Net?
>>>>>>
>>>>>> Taegeon
>>>>>>
>>>>>>> On Aug 4, 2017, at 3:19 PM, Julia Wang (QIUHE) <Qiuhe.Wang@microsoft.com.INVALID> wrote:
>>>>>>>
>>>>>>> - Test HelloREEF from .Net to Java on YARN is successful.
>>>>>>> - mvn clean install passes on Windows Server 2002 R2.
>>>>>>> - .Net tests in VS 2005 all passed with the YARN tests filtered out, except for one test failure:
>>>>>>>     TestRuntimeNameSpecifyingValidName
>>>>>>> The test error is "cannot read log file", the same as the transient
>>>>>>> errors in other tests. However, I ran this test alone 3 times, and it
>>>>>>> failed every time on my local box.
>>>>>>>
>>>>>>> Can someone run .Net tests in your box to see if it can repro?
>>>>>>>
>>>>>>> Julia
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Julia Wang (QIUHE)
>>>>>>> [mailto:Qiuhe.Wang@microsoft.com.INVALID]
>>>>>>> Sent: Thursday, August 3, 2017 7:23 PM
>>>>>>> To: dev@reef.apache.org<mailto:dev@reef.apache.org>
>>>>>>> Subject: RE: [VOTE] Release Apache REEF 0.16.0 (rc1)
>>>>>>>
>>>>>>> Right, we need not only vote counts but test coverage. I will
>>>>>>> test on YARN.
>>>>>>>
>>>>>>> Julia
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Taegeon Um [mailto:taegeonum@gmail.com]
>>>>>>> Sent: Thursday, August 3, 2017 6:50 PM
>>>>>>> To: dev@reef.apache.org<mailto:dev@reef.apache.org>
>>>>>>> Subject: Re: [VOTE] Release Apache REEF 0.16.0 (rc1)
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Thanks for running tests on various environments!
>>>>>>>
>>>>>>> We already have 6 +1 :), but it would be even better if someone
>>>>>>> runs the tests on HDI and YARN.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Taegeon
>>>>>>>
>>>>>>>> On Aug 4, 2017, at 3:07 AM, Sergiy Matusevych <sergiy.matusevych@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Here's what I have:
>>>>>>>>
>>>>>>>> Environment 1:
>>>>>>>> * Windows 10 Pro 1703 build 15063.483
>>>>>>>> * Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed
>>>>>>>> mode)
>>>>>>>> * Visual Studio 2017 Enterprise version 15.2 (26430.16)
>>>>>>>> * Microsoft .NET Framework version 4.7.02046
>>>>>>>>
>>>>>>>> mvn clean install
>>>>>>>> all tests pass
>>>>>>>>
>>>>>>>> In Visual Studio, the following tests fail:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> (all apparently related to YARN environment)
>>>>>>>>
>>>>>>>> Environment 2:
>>>>>>>> * Ubuntu Linux 17.04
>>>>>>>> * OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
>>>>>>>>
>>>>>>>> mvn clean install
>>>>>>>> all tests pass
>>>>>>>>
>>>>>>>>
>>>>>>>> My vote is +1
>>>>>>>>
>>>>>>>> Great job everyone!
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Sergiy.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Aug 1, 2017 at 10:44 PM, Taegeon Um <taegeonum@gmail.com> wrote:
>>>>>>>> This is to call for a new vote for the source release of Apache
>>>>>>>> REEF 0.16.0 (rc1).
>>>>>>>>
>>>>>>>> The source tarball, including signatures, digests, etc., can be
>>>>>>>> found at:
>>>>>>>> https://dist.apache.org/repos/dist/dev/reef/0.16.0-rc1/
>>>>>>>>
>>>>>>>> The Git tag is release-0.16.0-rc1
>>>>>>>> The Git commit ID is 85cc0a090ab48cf27acce2128c64b07b197d92e5
>>>>>>>>
>>>>>>>> Checksums of apache-reef-0.16.0-rc1.tar.gz:
>>>>>>>>
>>>>>>>> MD5: 155673fe44f95be9362b9075865c8cad
>>>>>>>> SHA:
>>>>>>>> d62c58df1f4ba962a51d81579d27321f75dad98c3c3def9bc8fb24ebf1e27978029d7ddedf26bf4ee9434cb8e6d0e4f6e1a9a4d240d03daccd9ef66bdc403f1b
>>>>>>>>
>>>>>>>> Release artifacts are signed with a key found in the KEYS file
>>>>>>>> available here:
>>>>>>>>
>>>>>>>> https://dist.apache.org/repos/dist/release/reef/KEYS
>>>>>>>>
>>>>>>>>
>>>>>>>> Issues resolved in this release:
>>>>>>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315820&version=12335833
>>>>>>>>
>>>>>>>>
>>>>>>>> The vote will be open for 72 hours. Please download the release
>>>>>>>> candidate, check the hashes/signature, build it and test it, and
>>>>>>>> then please vote:
>>>>>>>>
>>>>>>>> [ ] +1 Release this package as Apache REEF 0.16.0
>>>>>>>> [ ] +0 no opinion
>>>>>>>> [ ] -1 Do not release this package because ...
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Byung-Gon Chun
>>>
>>
>>
>>
>> --
>> Byung-Gon Chun
>
