reef-dev mailing list archives

From Taegeon Um <taegeo...@gmail.com>
Subject Re: [VOTE] Release Apache REEF 0.16.0 (rc1)
Date Thu, 10 Aug 2017 05:30:38 GMT
Thanks a lot for testing, Julia! 

I will conclude this vote.
Thanks to everyone who voted to release 0.16!

Taegeon


> On Aug 10, 2017, at 2:16 PM, Julia Wang (QIUHE) <Qiuhe.Wang@microsoft.com.INVALID> wrote:
> 
> In the past few days, I picked multiple commits from January to May and ran 1000-node
> tests. It is really hard to find when the issue was introduced because it is transient.
> Previously, when I ran the 0.16 build with 1000 nodes and 800 nodes, it always failed. Now,
> running the 0.16 release build again, it has passed a few times.
> 
> Given that the root cause of this transient issue is hard to identify, there is no
> short-term fix, and we haven't released in a very long time, I will change my vote to 0
> and let it go. We will come back to debug it when we have some capacity.
> 
> Thanks,
> Julia
> 
> -----Original Message-----
> From: Byung-Gon Chun [mailto:bgchun@gmail.com] 
> Sent: Wednesday, August 9, 2017 9:36 PM
> To: dev@reef.apache.org
> Subject: Re: [VOTE] Release Apache REEF 0.16.0 (rc1)
> 
> Thanks for testing, Julia!
> 
> I have to submit a board report soon. It'd be very nice if we could reach a conclusion on
> this release vote. :)
> 
> Thanks.
> -Gon
> 
> On Wed, Aug 9, 2017 at 3:28 AM, Julia Wang (QIUHE) <Qiuhe.Wang@microsoft.com.invalid> wrote:
> 
>> I am doing some more testing to try to narrow down the cause of the
>> failure. I will update the thread as soon as I have more findings.
>> 
>> Julia
>> 
>>> On Aug 7, 2017, at 3:44 PM, Taegeon Um <taegeonum@gmail.com> wrote:
>>> 
>>> Hi Julia,
>>> 
>>> Are you still -1 on releasing 0.16?
>>> 
>>> Thanks,
>>> Taegeon
>>> 
>>> 
>>> On Aug 5, 2017, at 3:23 PM, "Julia Wang (QIUHE)" <Qiuhe.Wang@microsoft.com.invalid> wrote:
>>> 
>>> Hi Gon,
>>> 
>>> Agreed, it is hard to resolve. But I am pretty sure it was introduced in
>>> the first few months of this year. It was working last year ☹
>>> 
>>> Julia
>>> 
>>> -----Original Message-----
>>> From: Byung-Gon Chun [mailto:bgchun@gmail.com]
>>> Sent: Friday, August 4, 2017 10:53 PM
>>> To: dev@reef.apache.org
>>> Subject: Re: [VOTE] Release Apache REEF 0.16.0 (rc1)
>>> 
>>> Julia, thanks for running the tests!
>>> Scalability bugs are hard to debug. It's not likely that we will 
>>> resolve them quickly.
>>> 
>>> I definitely vote for option 2. Once we fix the bugs, perhaps we can
>>> do a minor release.
>>> 
>>> 
>>>> On Fri, Aug 4, 2017 at 11:34 PM, Gyewon Lee <strayyyyyy@gmail.com> wrote:
>>>> 
>>>> I agree with Taegeon. +1 for option 2.
>>>> 
>>>> 2017-08-05 11:41 GMT+09:00 Taegeon Um <taegeonum@gmail.com>:
>>>> 
>>>>> Thanks, Julia, for sharing the issue!
>>>>> 
>>>>> On Aug 5, 2017, at 10:56 AM, "Julia Wang (QIUHE)" <Qiuhe.Wang@microsoft.com.invalid> wrote:
>>>>> 
>>>>> TestUnhandledTaskExceptionDoesntCrashEvaluator passed in my env.
>>>>> 
>>>>> I think both TestUnhandledTaskExceptionDoesntCrashEvaluator
>>>>> and TestRuntimeNameSpecifyingValidName are transient failures. We
>>>>> can log JIRA issues to follow up on them, as we agreed for other
>>>>> transient test failures.
>>>>> 
>>>>> 
>>>>> +1 for following up on them as transient failures.
>>>>> 
>>>>> 
>>>>> Today I tested the IMRU example on YARN.
>>>>> With 500 nodes, the test passes. I ran it multiple times, and it passed every time.
>>>>> With 1000 nodes, the test fails. The driver received 1000 completed tasks but only
>>>>> 998 completed evaluators, and it doesn't shut down until I kill it.
>>>>> 
>>>>> With 800 nodes, the test fails. The driver received 800 completed tasks but only
>>>>> 799 completed evaluators, and it doesn't shut down until I kill it.
>>>>> 
>>>>> Option 1: Find the root cause and fix the issue before the 0.16 release.
>>>>> The logs show no errors, so finding the root cause looks like a non-trivial
>>>>> job. We had a similar issue last year, and it took a big effort for Mariia and
>>>>> me to identify it.
>>>>> Option 2: Log a JIRA and follow up later.
>>>>> 
>>>>> 
>>>>> +1 for option 2.
>>>>> 
>>>>> I think it would take a long time to investigate the root cause, so I'm worried
>>>>> that the 0.16 release will be delayed for a long time again.
>>>>> 
>>>>> How about releasing 0.16 with the known issues and fixing them in 0.17?
>>>>> Or, if we resolve the issues quickly, we could do a minor release (e.g.,
>>>>> 0.16.1)?
>>>>> 
>>>>> Taegeon
>>>>> 
>>>>> 
>>>>> Julia
>>>>> 
>>>>> 
>>>>> From: Kim Doyoung [mailto:disoxc21@gmail.com]
>>>>> Sent: Friday, August 4, 2017 2:50 AM
>>>>> To: dev@reef.apache.org
>>>>> Subject: RE: [VOTE] Release Apache REEF 0.16.0 (rc1)
>>>>> 
>>>>> Hi
>>>>> 
>>>>> I ran the tests on Java and .NET.
>>>>> My environment is Windows 10 x64 version 1703 with Java 1.8.0_141.
>>>>> 
>>>>> The Java build and tests passed with ‘mvn clean install’.
>>>>> 
>>>>> But on the .NET side, one test failed with Visual Studio 2015.
>>>>> It's not the same test failure that Julia saw:
>>>>> 
>>>>> TestUnhandledTaskExceptionDoesntCrashEvaluator
>>>>> 
>>>>> `TestRuntimeNameSpecifyingValidName` passed.
>>>>> 
>>>>> Thank you.
>>>>> 
>>>>> Doyoung
>>>>> 
>>>>> From: Taegeon Um<mailto:taegeonum@gmail.com>
>>>>> Sent: Friday, August 4, 2017 3:31 PM
>>>>> To: dev@reef.apache.org<mailto:dev@reef.apache.org>
>>>>> Subject: Re: [VOTE] Release Apache REEF 0.16.0 (rc1)
>>>>> 
>>>>> Thanks, Julia, for sharing the results!
>>>>> 
>>>>> Has anyone else seen the same test failure in .NET?
>>>>> 
>>>>> Taegeon
>>>>> 
>>>>>> On Aug 4, 2017, at 3:19 PM, Julia Wang (QIUHE) <Qiuhe.Wang@microsoft.com.INVALID<mailto:Qiuhe.Wang@microsoft.com.INVALID>> wrote:
>>>>>> 
>>>>>> - Testing HelloREEF from .NET to Java on YARN was successful.
>>>>>> - mvn clean install passes on Windows Server 2002 R2.
>>>>>> - .NET tests in VS 2005 all passed, with YARN tests filtered out, except for one
>>>>>>   test failure:
>>>>>>     TestRuntimeNameSpecifyingValidName
>>>>>> The test error is "cannot read log file", the same as the transient errors in
>>>>>> other tests. However, I ran this test alone 3 times, and it failed every time
>>>>>> on my local box.
>>>>>> 
>>>>>> Can someone run the .NET tests on their box to see if it reproduces?
>>>>>> 
>>>>>> Julia
>>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Julia Wang (QIUHE) [mailto:Qiuhe.Wang@microsoft.com.INVALID]
>>>>>> Sent: Thursday, August 3, 2017 7:23 PM
>>>>>> To: dev@reef.apache.org<mailto:dev@reef.apache.org>
>>>>>> Subject: RE: [VOTE] Release Apache REEF 0.16.0 (rc1)
>>>>>> 
>>>>>> Right, we need not only vote counts but also test coverage. I will test on YARN.
>>>>>> 
>>>>>> Julia
>>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Taegeon Um [mailto:taegeonum@gmail.com]
>>>>>> Sent: Thursday, August 3, 2017 6:50 PM
>>>>>> To: dev@reef.apache.org<mailto:dev@reef.apache.org>
>>>>>> Subject: Re: [VOTE] Release Apache REEF 0.16.0 (rc1)
>>>>>> 
>>>>>> Hi all,
>>>>>> 
>>>>>> Thanks for running tests on various environments!
>>>>>> 
>>>>>> We already have six +1 votes :), but it would be great if someone could run the
>>>>>> tests on HDI and YARN.
>>>>>> 
>>>>>> Thanks,
>>>>>> Taegeon
>>>>>> 
>>>>>>> On Aug 4, 2017, at 3:07 AM, Sergiy Matusevych <sergiy.matusevych@gmail.com<mailto:sergiy.matusevych@gmail.com>> wrote:
>>>>>>> 
>>>>>>> Here's what I have:
>>>>>>> 
>>>>>>> Environment 1:
>>>>>>> * Windows 10 Pro 1703 build 15063.483
>>>>>>> * Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
>>>>>>> * Visual Studio 2017 Enterprise version 15.2 (26430.16)
>>>>>>> * Microsoft .NET Framework version 4.7.02046
>>>>>>> 
>>>>>>> mvn clean install
>>>>>>> all tests pass
>>>>>>> 
>>>>>>> In Visual Studio, the following tests fail:
>>>>>>> 
>>>>>>> (all apparently related to the YARN environment)
>>>>>>> 
>>>>>>> Environment 2:
>>>>>>> * Ubuntu Linux 17.04
>>>>>>> * OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
>>>>>>> 
>>>>>>> mvn clean install
>>>>>>> all tests pass
>>>>>>> 
>>>>>>> 
>>>>>>> My vote is +1
>>>>>>> 
>>>>>>> Great job everyone!
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> Sergiy.
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Aug 1, 2017 at 10:44 PM, Taegeon Um <taegeonum@gmail.com> wrote:
>>>>>>> This is a call for a new vote on the source release of Apache REEF 0.16.0 (rc1).
>>>>>>> 
>>>>>>> The source tarball, including signatures, digests, etc., can be found at:
>>>>>>> https://dist.apache.org/repos/dist/dev/reef/0.16.0-rc1/
>>>>>>> 
>>>>>>> The Git tag is release-0.16.0-rc1.
>>>>>>> The Git commit ID is 85cc0a090ab48cf27acce2128c64b07b197d92e5.
>>>>>>> 
>>>>>>> Checksums of apache-reef-0.16.0-rc1.tar.gz:
>>>>>>> 
>>>>>>> MD5: 155673fe44f95be9362b9075865c8cad
>>>>>>> SHA: d62c58df1f4ba962a51d81579d27321f75dad98c3c3def9bc8fb24ebf1e27978029d7ddedf26bf4ee9434cb8e6d0e4f6e1a9a4d240d03daccd9ef66bdc403f1b
>>>>>>> 
>>>>>>> Release artifacts are signed with a key found in the KEYS file available here:
>>>>>>> 
>>>>>>> https://dist.apache.org/repos/dist/release/reef/KEYS
>>>>>>> 
>>>>>>> 
>>>>>>> Issues resolved in this release:
>>>>>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315820&version=12335833
>>>>>>> 
>>>>>>> 
>>>>>>> The vote will be open for 72 hours. Please download the release
>>>>>>> candidate, check the hashes/signature, build it and test it, and
>>>>>>> then please vote:
>>>>>>> 
>>>>>>> [ ] +1 Release this package as Apache REEF 0.16.0
>>>>>>> [ ] +0 no opinion
>>>>>>> [ ] -1 Do not release this package because ...
>>>>>>> 
>>>>>>> Thanks!
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Byung-Gon Chun
>> 
> 
> 
> 
> --
> Byung-Gon Chun

