reef-dev mailing list archives

From "Julia Wang (QIUHE)" <Qiuhe.W...@microsoft.com.INVALID>
Subject Re: [VOTE] Release Apache REEF 0.16.0 (rc1)
Date Tue, 08 Aug 2017 18:28:21 GMT
I am doing some more testing to try to narrow down the cause of the failure. I will update
the thread soon after I have more findings. 

Julia

> On Aug 7, 2017, at 3:44 PM, Taegeon Um <taegeonum@gmail.com> wrote:
> 
> Hi Julia,
> 
> Are you still -1 on releasing 0.16?
> 
> Thanks,
> Taegeon
> 
> 
> On Aug 5, 2017, at 3:23 PM, "Julia Wang (QIUHE)" <Qiuhe.Wang@microsoft.com.invalid> wrote:
> 
> Hi Gon,
> 
> Agreed, it is hard to resolve. But I am pretty sure it was introduced in the
> first few months of this year. It was working last year ☹
> 
> Julia
> 
> -----Original Message-----
> From: Byung-Gon Chun [mailto:bgchun@gmail.com]
> Sent: Friday, August 4, 2017 10:53 PM
> To: dev@reef.apache.org
> Subject: Re: [VOTE] Release Apache REEF 0.16.0 (rc1)
> 
> Julia, thanks for running the tests!
> Scalability bugs are hard to debug. It's not likely that we will resolve
> them quickly.
> 
> I definitely vote for option 2. Once we fix the bugs, perhaps we can do a
> minor version update release.
> 
> 
>> On Fri, Aug 4, 2017 at 11:34 PM, Gyewon Lee <strayyyyyy@gmail.com> wrote:
>> 
>> I agree with Taegeon. +1 for option 2.
>> 
>> 2017-08-05 11:41 GMT+09:00 Taegeon Um <taegeonum@gmail.com>:
>> 
>>> Thanks Julia for sharing the issue!
>>> 
>>> On Aug 5, 2017, at 10:56 AM, "Julia Wang (QIUHE)" <Qiuhe.Wang@microsoft.com.invalid> wrote:
>>> 
>>> TestUnhandledTaskExceptionDoesntCrashEvaluator passed in my env.
>>> 
>>> I would think both TestUnhandledTaskExceptionDoesntCrashEvaluator
>>> and TestRuntimeNameSpecifyingValidName are transient failures. We
>>> can log a Jira to follow up, similar to what we agreed on for other
>>> transient test failures.
>>> 
>>> 
>>> +1 for following up on them as transient failures.
>>> 
>>> 
>>> Today I tested the IMRU example on YARN.
>>> With 500 nodes, the test passes. I ran it multiple times, and it passed every time.
>>> With 1000 nodes, the test fails. Received 1000 completed tasks but only
>>> 998 completed evaluators. The Driver doesn’t shut down until I kill it.
>>> 
>>> With 800 nodes, the test fails. Received 800 completed tasks but only
>>> 799 completed evaluators. The Driver doesn’t shut down until I kill it.
>>> 
>>> Option 1: Find the root cause and fix the issue before the 0.16 release.
>>> From the logs, there is no error. It looks like finding the root cause
>>> is not a trivial job. We had a similar issue last year, and it took a big
>>> effort for Mariia and me to identify it.
>>> Option 2: Log a JIRA and follow up later.
>>> 
>>> 
>>> +1 for option 2.
>>> 
>>> I think it would take a long time to investigate the root cause, so I'm
>>> worried that the 0.16 release will be delayed for a long time again.
>>> 
>>> How about releasing 0.16 with the known issues, and fixing them in 0.17?
>>> Or, if we resolve the issues quickly, we could do a minor release (e.g.,
>>> 0.16.1)?
>>> 
>>> Taegeon
>>> 
>>> 
>>> Julia
>>> 
>>> 
>>> From: Kim Doyoung [mailto:disoxc21@gmail.com]
>>> Sent: Friday, August 4, 2017 2:50 AM
>>> To: dev@reef.apache.org
>>> Subject: RE: [VOTE] Release Apache REEF 0.16.0 (rc1)
>>> 
>>> Hi
>>> 
>>> I ran tests on Java and .Net.
>>> My environment is Windows 10 x64 ver 1703 with Java 1.8.0_141.
>>> 
>>> Java built and passed the tests with the command ‘mvn clean install’.
>>> 
>>> But the .Net side failed one test with Visual Studio 2015.
>>> It is not the same test failure as Julia's.
>>> 
>>> TestUnhandledTaskExceptionDoesntCrashEvaluator
>>> 
>>> The `TestRuntimeNameSpecifyingValidName` test passed.
>>> 
>>> Thank you.
>>> 
>>> Doyoung
>>> 
>>> From: Taegeon Um [mailto:taegeonum@gmail.com]
>>> Sent: Friday, August 4, 2017 3:31 PM
>>> To: dev@reef.apache.org
>>> Subject: Re: [VOTE] Release Apache REEF 0.16.0 (rc1)
>>> 
>>> Thanks Julia for sharing the result!
>>> 
>>> Is there someone who experienced the same test failure in .Net?
>>> 
>>> Taegeon
>>> 
>>>> On Aug 4, 2017, at 3:19 PM, Julia Wang (QIUHE) <Qiuhe.Wang@microsoft.com.INVALID> wrote:
>>>> 
>>>> - Test HelloREEF from .Net to Java on YARN is successful.
>>>> - mvn clean install passes on Windows Server 2002 R2.
>>>> - .Net tests in VS 2005 all passed with the YARN tests filtered out,
>>>>   except one test failure:
>>>>      TestRuntimeNameSpecifyingValidName
>>>> The test error is "cannot read log file", the same as the transient
>>>> errors in other tests. However, I ran this test alone 3 times, and it
>>>> failed every time on my local box.
>>>> 
>>>> Can someone run .Net tests in your box to see if it can repro?
>>>> 
>>>> Julia
>>>> 
>>>> -----Original Message-----
>>>> From: Julia Wang (QIUHE) [mailto:Qiuhe.Wang@microsoft.com.INVALID]
>>>> Sent: Thursday, August 3, 2017 7:23 PM
>>>> To: dev@reef.apache.org<mailto:dev@reef.apache.org>
>>>> Subject: RE: [VOTE] Release Apache REEF 0.16.0 (rc1)
>>>> 
>>>> Right, we need not only vote counts but also test coverage. I will test
>>>> on YARN.
>>>> 
>>>> Julia
>>>> 
>>>> -----Original Message-----
>>>> From: Taegeon Um [mailto:taegeonum@gmail.com]
>>>> Sent: Thursday, August 3, 2017 6:50 PM
>>>> To: dev@reef.apache.org<mailto:dev@reef.apache.org>
>>>> Subject: Re: [VOTE] Release Apache REEF 0.16.0 (rc1)
>>>> 
>>>> Hi all,
>>>> 
>>>> Thanks for running tests on various environments!
>>>> 
>>>> We already have 6 +1 votes :), but it would be even better if someone
>>>> ran the tests on HDI and YARN.
>>>> 
>>>> Thanks,
>>>> Taegeon
>>>> 
>>>>> On Aug 4, 2017, at 3:07 AM, Sergiy Matusevych <sergiy.matusevych@gmail.com> wrote:
>>>>> 
>>>>> Here's what I have:
>>>>> 
>>>>> Environment 1:
>>>>> * Windows 10 Pro 1703 build 15063.483
>>>>> * Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed
>>>>> mode)
>>>>> * Visual Studio 2017 Enterprise version 15.2 (26430.16)
>>>>> * Microsoft .NET Framework version 4.7.02046
>>>>> 
>>>>> mvn clean install
>>>>> all tests pass
>>>>> 
>>>>> In Visual Studio, the following tests fail:
>>>>> 
>>>>> 
>>>>> 
>>>>> (all apparently related to YARN environment)
>>>>> 
>>>>> Environment 2:
>>>>> * Ubuntu Linux 17.04
>>>>> * OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
>>>>> 
>>>>> mvn clean install
>>>>> all tests pass
>>>>> 
>>>>> 
>>>>> My vote is +1
>>>>> 
>>>>> Great job everyone!
>>>>> 
>>>>> Cheers,
>>>>> Sergiy.
>>>>> 
>>>>> 
>>>>> On Tue, Aug 1, 2017 at 10:44 PM, Taegeon Um <taegeonum@gmail.com> wrote:
>>>>> This is to call for a new vote for the source release of Apache
>>>>> REEF 0.16.0 (rc1).
>>>>> 
>>>>> The source tarball, including signatures, digests, etc. can be
>>>>> found at:
>>>>> https://dist.apache.org/repos/dist/dev/reef/0.16.0-rc1/
>>>>> 
>>>>> The Git tag is release-0.16.0-rc1
>>>>> The Git commit ID is 85cc0a090ab48cf27acce2128c64b07b197d92e5
>>>>> 
>>>>> Checksums of apache-reef-0.16.0-rc1.tar.gz:
>>>>> 
>>>>> MD5: 155673fe44f95be9362b9075865c8cad
>>>>> SHA:
>>>>> d62c58df1f4ba962a51d81579d27321f75dad98c3c3def9bc8fb24ebf1e27978029d7ddedf26bf4ee9434cb8e6d0e4f6e1a9a4d240d03daccd9ef66bdc403f1b
>>>>> 
>>>>> Release artifacts are signed with a key found in the KEYS file available
>>>>> here:
>>>>> 
>>>>> https://dist.apache.org/repos/dist/release/reef/KEYS
>>>>> 
>>>>> 
>>>>> Issues resolved in this release:
>>>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315820&version=12335833
>>>>> 
>>>>> 
>>>>> The vote will be open for 72 hours. Please download the release
>>>>> candidate, check the hashes/signature, build it and test it, and
>>>>> then please vote:
>>>>> 
>>>>> [ ] +1 Release this package as Apache REEF 0.16.0
>>>>> [ ] +0 no opinion
>>>>> [ ] -1 Do not release this package because ...
>>>>> 
>>>>> Thanks!
>>>>> 
>>>> 
>>> 
>> 
> 
> 
> 
> --
> Byung-Gon Chun
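
The vote call quoted above asks voters to check the hashes of apache-reef-0.16.0-rc1.tar.gz. A minimal sketch of that check follows, under a few assumptions: the tarball has already been downloaded to the current directory under that file name, and the value labelled "SHA" is a SHA-512 digest (the thread does not name the algorithm, but the value is 128 hex digits long). The helper names `file_digest` and `verify` are illustrative only and are not part of any REEF tooling.

```python
import hashlib
from pathlib import Path

# Checksums quoted in the vote call for apache-reef-0.16.0-rc1.tar.gz.
EXPECTED_MD5 = "155673fe44f95be9362b9075865c8cad"
# Assumption: the "SHA" value is SHA-512, inferred from its 128 hex digits.
EXPECTED_SHA512 = (
    "d62c58df1f4ba962a51d81579d27321f75dad98c3c3def9bc8fb24ebf1e2"
    "7978029d7d"
    "dedf26bf4ee9434cb8e6d0e4f6e1a9a4d240d03daccd9ef66bdc403f1b"
)


def file_digest(path, algorithm, chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """Compare a local file against the checksums quoted in the thread."""
    return {
        "md5": file_digest(path, "md5") == EXPECTED_MD5,
        "sha512": file_digest(path, "sha512") == EXPECTED_SHA512,
    }


if __name__ == "__main__":
    # Hypothetical local path; adjust to wherever the tarball was saved.
    tarball = Path("apache-reef-0.16.0-rc1.tar.gz")
    if tarball.exists():
        for name, ok in verify(tarball).items():
            print(f"{name}: {'OK' if ok else 'MISMATCH'}")
    else:
        print(f"{tarball} not found; download it first")
```

This covers only the checksum comparison. Verifying the signature against the KEYS file is a separate step and is not shown here.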