reef-dev mailing list archives

From Kim Doyoung <disox...@gmail.com>
Subject RE: [VOTE] Release Apache REEF 0.16.0 (rc1)
Date Sun, 06 Aug 2017 13:52:02 GMT
I have made an issue about TestUnhandledTaskExceptionDoesntCrashEvaluator.

If more description is needed, let me know.

Thank you.
Doyoung

From: Taegeon Um
Sent: Sunday, August 6, 2017 10:17 PM
To: dev@reef.apache.org
Subject: Re: [VOTE] Release Apache REEF 0.16.0 (rc1)

I’ve created REEF-1850 to follow up on the scalability issue and REEF-1851 for the transient
failure of TestRuntimeNameSpecifyingValidName.

Doyoung, I cannot see the image you attached, so could you please create an issue for the
transient failure of TestUnhandledTaskExceptionDoesntCrashEvaluator? 

Thanks,
Taegeon


> On Aug 6, 2017, at 3:05 PM, Byung-Gon Chun <bgchun@gmail.com> wrote:
> 
> Taegeon, could you register a jira issue about scalability bugs?
> 
> Julia, are you changing your vote? :)
> 
> Thanks!
> -Gon
> 
> 
> On Sat, Aug 5, 2017 at 3:31 PM, Byung-Gon Chun <bgchun@gmail.com> wrote:
> 
>> Thanks for the info!
>> 
>> Unless we have a testing infrastructure for large-scale experiments, it is
>> hard to discover such bugs early.
>> 
>> This can be another hackathon topic. :)
>> 
>> -Gon
>> 
>> Sent from my iPhone
>> 
>> On Aug 4, 2017, at 11:23 PM, Julia Wang (QIUHE) <Qiuhe.Wang@microsoft.com.INVALID>
>> wrote:
>> 
>>> Hi Gon,
>>> 
>>> Agree, it is hard to resolve. But I am pretty sure it was introduced in
>>> the first few months of this year. It was working last year ☹
>>> 
>>> Julia
>>> 
>>> -----Original Message-----
>>> From: Byung-Gon Chun [mailto:bgchun@gmail.com]
>>> Sent: Friday, August 4, 2017 10:53 PM
>>> To: dev@reef.apache.org
>>> Subject: Re: [VOTE] Release Apache REEF 0.16.0 (rc1)
>>> 
>>> Julia, thanks for running the tests!
>>> Scalability bugs are hard to debug. It's not likely that we will resolve
>>> them quickly.
>>> 
>>> I definitely vote for option 2. Once we fix the bugs, perhaps we can do
>>> a minor version update release.
>>> 
>>> 
>>>> On Fri, Aug 4, 2017 at 11:34 PM, Gyewon Lee <strayyyyyy@gmail.com> wrote:
>>>> 
>>>> I agree with Taegeon. +1 for option 2.
>>>> 
>>>> 2017-08-05 11:41 GMT+09:00 Taegeon Um <taegeonum@gmail.com>:
>>>> 
>>>>> Thanks Julia for sharing the issue!
>>>>> 
>>>>> On Aug 5, 2017, at 10:56 AM, "Julia Wang (QIUHE)" <Qiuhe.Wang@microsoft.com.invalid>
>>>>> wrote:
>>>>> 
>>>>> TestUnhandledTaskExceptionDoesntCrashEvaluator passed in my env.
>>>>> 
>>>>> I would think both TestUnhandledTaskExceptionDoesntCrashEvaluator
>>>>> and TestRuntimeNameSpecifyingValidName are transient failures. We
>>>>> can log a Jira issue to follow up, similar to what we have agreed on
>>>>> for other transient test failures.
>>>>> 
>>>>> 
>>>>> +1 for following up on them as transient failures.
>>>>> 
>>>>> 
>>>>> Today I tested the IMRU example on YARN.
>>>>> With 500 nodes, the test passes. I ran it multiple times, and every run passed.
>>>>> With 1000 nodes, the test fails. Received 1000 completed tasks but only
>>>>> 998 completed evaluators. The Driver doesn’t shut down until I kill it.
>>>>> 
>>>>> With 800 nodes, the test fails. Received 800 completed tasks but only
>>>>> 799 completed evaluators. The Driver doesn’t shut down until I kill it.
>>>>> 
>>>>> Option 1: Find the root cause and fix the issue before the 0.16 release.
>>>>> From the logs, there is no error. Finding the root cause looks like
>>>>> a non-trivial job. We had a similar issue last year, and it took a big
>>>>> effort for Mariia and me to identify the issue.
>>>>> Option 2: Log a JIRA and follow up later.
>>>>> 
>>>>> 
>>>>> +1 for option 2.
>>>>> 
>>>>> I think it would take a long time to investigate the root cause, so I'm
>>>>> worried that the 0.16 release will be delayed for a long time again.
>>>>> 
>>>>> How about releasing 0.16 with the known issues, and fixing them in 0.17?
>>>>> Or, if we resolve the issues quickly, we could do a minor release (e.g.,
>>>>> 0.16.1)?
>>>>> 
>>>>> Taegeon
>>>>> 
>>>>> 
>>>>> Julia
>>>>> 
>>>>> 
>>>>> From: Kim Doyoung [mailto:disoxc21@gmail.com]
>>>>> Sent: Friday, August 4, 2017 2:50 AM
>>>>> To: dev@reef.apache.org
>>>>> Subject: RE: [VOTE] Release Apache REEF 0.16.0 (rc1)
>>>>> 
>>>>> Hi
>>>>> 
>>>>> I ran the tests on Java and .Net.
>>>>> My environment is Windows 10 x64 ver 1703 with java 1.8.0_141
>>>>> 
>>>>> Java built and passed the tests with the command ‘mvn clean install’
>>>>> 
>>>>> But the .Net side failed one test with Visual Studio 2015.
>>>>> It’s not the same test failure as Julia’s.
>>>>> 
>>>>> TestUnhandledTaskExceptionDoesntCrashEvaluator
>>>>> [cid:image001.png@01D30D51.E782D6E0]
>>>>> 
>>>>> The TestRuntimeNameSpecifyingValidName test passed.
>>>>> 
>>>>> Thank you.
>>>>> 
>>>>> Doyoung
>>>>> 
>>>>> From: Taegeon Um <taegeonum@gmail.com>
>>>>> Sent: Friday, August 4, 2017 3:31 PM
>>>>> To: dev@reef.apache.org
>>>>> Subject: Re: [VOTE] Release Apache REEF 0.16.0 (rc1)
>>>>> 
>>>>> Thanks Julia for sharing the result!
>>>>> 
>>>>> Is there someone who experienced the same test failure in .Net?
>>>>> 
>>>>> Taegeon
>>>>> 
>>>>>> On Aug 4, 2017, at 3:19 PM, Julia Wang (QIUHE) <Qiuhe.Wang@microsoft.com.INVALID> wrote:
>>>>>> 
>>>>>> - Test HelloREEF from .Net to Java on YARN is successful.
>>>>>> - mvn clean install passes on Windows Server 2002 R2.
>>>>>> - .Net tests in VS 2005 all passed with the YARN tests filtered out,
>>>>>>   except for one test failure:
>>>>>>     TestRuntimeNameSpecifyingValidName
>>>>>> The test error is "cannot read log file", the same as the transient
>>>>>> errors in other tests. However, I ran this test alone 3 times, and
>>>>>> it failed every time on my local box.
>>>>>> 
>>>>>> Can someone run .Net tests in your box to see if it can repro?
>>>>>> 
>>>>>> Julia
>>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Julia Wang (QIUHE) [mailto:Qiuhe.Wang@microsoft.com.INVALID]
>>>>>> Sent: Thursday, August 3, 2017 7:23 PM
>>>>>> To: dev@reef.apache.org
>>>>>> Subject: RE: [VOTE] Release Apache REEF 0.16.0 (rc1)
>>>>>> 
>>>>>> Right, we need not only vote counts but test coverage. I will test
>>>>>> on YARN.
>>>>>> 
>>>>>> Julia
>>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Taegeon Um [mailto:taegeonum@gmail.com]
>>>>>> Sent: Thursday, August 3, 2017 6:50 PM
>>>>>> To: dev@reef.apache.org
>>>>>> Subject: Re: [VOTE] Release Apache REEF 0.16.0 (rc1)
>>>>>> 
>>>>>> Hi all,
>>>>>> 
>>>>>> Thanks for running tests on various environments!
>>>>>> 
>>>>>> We already have 6 +1 votes :), but it would be even better if someone
>>>>>> ran the tests on HDI and YARN.
>>>>>> 
>>>>>> Thanks,
>>>>>> Taegeon
>>>>>> 
>>>>>>> On Aug 4, 2017, at 3:07 AM, Sergiy Matusevych <sergiy.matusevych@gmail.com> wrote:
>>>>>>> 
>>>>>>> Here's what I have:
>>>>>>> 
>>>>>>> Environment 1:
>>>>>>> * Windows 10 Pro 1703 build 15063.483
>>>>>>> * Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
>>>>>>> * Visual Studio 2017 Enterprise version 15.2 (26430.16)
>>>>>>> * Microsoft .NET Framework version 4.7.02046
>>>>>>> 
>>>>>>> mvn clean install
>>>>>>> all tests pass
>>>>>>> 
>>>>>>> Visual Studio the following tests fail:
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> (all apparently related to YARN environment)
>>>>>>> 
>>>>>>> Environment 2:
>>>>>>> * Ubuntu Linux 17.04
>>>>>>> * OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
>>>>>>> 
>>>>>>> mvn clean install
>>>>>>> all tests pass
>>>>>>> 
>>>>>>> 
>>>>>>> My vote is +1
>>>>>>> 
>>>>>>> Great job everyone!
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> Sergiy.
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Aug 1, 2017 at 10:44 PM, Taegeon Um <taegeonum@gmail.com> wrote:
>>>>>>> This is to call for a new vote for the source release of Apache
>>>>>>> REEF 0.16.0 (rc1).
>>>>>>> 
>>>>>>> The source tarball, including signatures, digests, etc. can be
>>>>>>> found at:
>>>>>>> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdist.apache.org%2Frepos%2Fdist%2Fdev%2Freef%2F0.16.0-rc1%2F&data=02%7C01%7CQiuhe.Wang%40microsoft.com%7C962355591ab34abfc1c908d4dadb20a8%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636374082064906216&sdata=xjCvVqLOiymatXvxN0glp9Ty9xip1fcpMint0HYMrkA%3D&reserved=0
>>>>>>> 
>>>>>>> The Git tag is release-0.16.0-rc1
>>>>>>> The Git commit ID is 85cc0a090ab48cf27acce2128c64b07b197d92e5
>>>>>>> 
>>>>>>> Checksums of apache-reef-0.16.0-rc1.tar.gz:
>>>>>>> 
>>>>>>> MD5: 155673fe44f95be9362b9075865c8cad
>>>>>>> SHA: d62c58df1f4ba962a51d81579d27321f75dad98c3c3def9bc8fb24ebf1e27978029d7ddedf26bf4ee9434cb8e6d0e4f6e1a9a4d240d03daccd9ef66bdc403f1b
>>>>>>> 
>>>>>>> Release artifacts are signed with a key found in the KEYS file
>>>>>>> available here:
>>>>>>> 
>>>>>>> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdist.apache.org%2Frepos%2Fdist%2Frelease%2Freef%2FKEYS&data=02%7C01%7CQiuhe.Wang%40microsoft.com%7C962355591ab34abfc1c908d4dadb20a8%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636374082064906216&sdata=ozT2s3kOfzDgTSviedxkyoOz18bkUElStigHmh1Fzmo%3D&reserved=0
>>>>>>> 
>>>>>>> 
>>>>>>> Issues resolved in this release:
>>>>>>> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fsecure%2FReleaseNote.jspa%3FprojectId%3D12315820%26version%3D12335833&data=02%7C01%7CQiuhe.Wang%40microsoft.com%7C962355591ab34abfc1c908d4dadb20a8%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636374082064906216&sdata=ojQ%2FJYKmfIsCHdTPCkO%2BmWjYs3BLlxzcUKpV4CBUaSg%3D&reserved=0
>>>>>>> 
>>>>>>> 
>>>>>>> The vote will be open for 72 hours. Please download the release
>>>>>>> candidate, check the hashes/signature, build it and test it, and
>>>>>>> then please vote:
>>>>>>> 
>>>>>>> [ ] +1 Release this package as Apache REEF 0.16.0
>>>>>>> [ ] +0 no opinion
>>>>>>> [ ] -1 Do not release this package because ...
>>>>>>> 
>>>>>>> Thanks!
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Byung-Gon Chun
>> 
> 
> 
> 
> -- 
> Byung-Gon Chun


