reef-dev mailing list archives

From Taegeon Um <taegeo...@gmail.com>
Subject Re: [VOTE] Release Apache REEF 0.16.0 (rc1)
Date Sat, 05 Aug 2017 12:19:31 GMT
 Hi all, 

72 hours have passed, but I will keep this vote open until we reach a consensus.

Does anyone think we need to block the 0.16 release until we resolve the scalability
issue?

If not, do you think it is okay to release 0.16 and resolve the issue in 0.17 or 0.16.1?
 

Thanks,
Taegeon

> On Aug 5, 2017, at 3:31 PM, Byung-Gon Chun <bgchun@gmail.com> wrote:
> 
> Thanks for the info!
> 
> Unless we have a testing infrastructure for large-scale experiments, it is hard to discover
> such bugs early.
> 
> This can be another hackathon topic. :)
> 
> -Gon
> 
> Sent from my iPhone
> 
> On Aug 4, 2017, at 11:23 PM, Julia Wang (QIUHE) <Qiuhe.Wang@microsoft.com.INVALID> wrote:
> 
>> Hi Gon,
>> 
>> Agreed, it is hard to resolve. But I am pretty sure it was introduced in the first
>> few months of this year. It was working last year ☹
>> 
>> Julia
>> 
>> -----Original Message-----
>> From: Byung-Gon Chun [mailto:bgchun@gmail.com] 
>> Sent: Friday, August 4, 2017 10:53 PM
>> To: dev@reef.apache.org
>> Subject: Re: [VOTE] Release Apache REEF 0.16.0 (rc1)
>> 
>> Julia, thanks for running the tests!
>> Scalability bugs are hard to debug. It's not likely that we will resolve them quickly.
>> 
>> I definitely vote for option 2. Once we fix the bugs, perhaps we can do a minor version
>> update release.
>> 
>> 
>>> On Fri, Aug 4, 2017 at 11:34 PM, Gyewon Lee <strayyyyyy@gmail.com> wrote:
>>> 
>>> I agree with Taegeon. +1 for option 2.
>>> 
>>> 2017-08-05 11:41 GMT+09:00 Taegeon Um <taegeonum@gmail.com>:
>>> 
>>>> Thanks Julia for sharing the issue!
>>>> 
>>>> On Aug 5, 2017, at 10:56 AM, "Julia Wang (QIUHE)" <Qiuhe.Wang@microsoft.com.invalid> wrote:
>>>> 
>>>> TestUnhandledTaskExceptionDoesntCrashEvaluator passed in my env.
>>>> 
>>>> I would think both TestUnhandledTaskExceptionDoesntCrashEvaluator 
>>>> and TestRuntimeNameSpecifyingValidName are transient failures. We 
>>>> can log a Jira to follow up, similar to what we have agreed on for 
>>>> other transient test failures.
>>>> 
>>>> 
>>>> +1 for following up on them as transient failures.
>>>> 
>>>> 
>>>> Today I tested the IMRU example on YARN.
>>>> With 500 nodes, the test passes. I ran it multiple times, and it passed every time.
>>>> With 1000 nodes, the test fails. Received 1000 completed tasks but only 
>>>> 998 completed evaluators. The driver doesn't shut down until I kill it.
>>>> 
>>>> With 800 nodes, the test fails. Received 800 completed tasks but only 
>>>> 799 completed evaluators. The driver doesn't shut down until I kill it.
>>>> 
>>>> Option 1: Find the root cause and fix the issue before the 0.16 release. 
>>>> There are no errors in the logs, so finding the root cause looks like 
>>>> a non-trivial job. We had a similar issue last year, and it took a big 
>>>> effort for Mariia and me to identify it.
>>>> Option 2: Log a JIRA and follow up later.
>>>> 
>>>> 
>>>> +1 for option 2.
>>>> 
>>>> I think it would take a long time to investigate the root cause, so I'm worried
>>>> that the 0.16 release will be delayed for a long time again.
>>>> 
>>>> How about releasing 0.16 with the known issues and fixing them in 0.17? Or,
>>>> if we resolve the issues quickly, we could do a minor release (e.g., 0.16.1).
>>>> 
>>>> Taegeon
>>>> 
>>>> 
>>>> Julia
>>>> 
>>>> 
>>>> From: Kim Doyoung [mailto:disoxc21@gmail.com]
>>>> Sent: Friday, August 4, 2017 2:50 AM
>>>> To: dev@reef.apache.org
>>>> Subject: RE: [VOTE] Release Apache REEF 0.16.0 (rc1)
>>>> 
>>>> Hi
>>>> 
>>>> I ran the tests on Java and .Net.
>>>> My environment is Windows 10 x64 ver 1703 with Java 1.8.0_141.
>>>> 
>>>> Java built and passed the tests with the command 'mvn clean install'.
>>>> 
>>>> But the .Net side failed one test with Visual Studio 2015.
>>>> It's not the same test failure as Julia's:
>>>> 
>>>> TestUnhandledTaskExceptionDoesntCrashEvaluator
>>>> 
>>>> The `TestRuntimeNameSpecifyingValidName` test passed.
>>>> 
>>>> Thank you.
>>>> 
>>>> Doyoung
>>>> 
>>>> From: Taegeon Um<mailto:taegeonum@gmail.com>
>>>> Sent: Friday, August 4, 2017 3:31 PM
>>>> To: dev@reef.apache.org<mailto:dev@reef.apache.org>
>>>> Subject: Re: [VOTE] Release Apache REEF 0.16.0 (rc1)
>>>> 
>>>> Thanks Julia for sharing the result!
>>>> 
>>>> Has anyone else experienced the same test failure in .Net?
>>>> 
>>>> Taegeon
>>>> 
>>>>> On Aug 4, 2017, at 3:19 PM, Julia Wang (QIUHE) <Qiuhe.Wang@microsoft.com.INVALID<mailto:Qiuhe.Wang@microsoft.com.INVALID>> wrote:
>>>>> 
>>>>> - Test HelloREEF from .Net to Java on YARN is successful.
>>>>> - mvn clean install passes on Windows Server 2002 R2.
>>>>> - .Net tests in VS 2005 all passed with the YARN tests filtered out, 
>>>>>   except one test failure:
>>>>>     TestRuntimeNameSpecifyingValidName
>>>>> The test error is "cannot read log file", the same as the transient 
>>>>> errors in other tests. However, I ran this test alone 3 times, and it 
>>>>> failed every time on my local box.
>>>>> 
>>>>> Can someone run .Net tests in your box to see if it can repro?
>>>>> 
>>>>> Julia
>>>>> 
>>>>> -----Original Message-----
>>>>> From: Julia Wang (QIUHE) [mailto:Qiuhe.Wang@microsoft.com.INVALID]
>>>>> Sent: Thursday, August 3, 2017 7:23 PM
>>>>> To: dev@reef.apache.org<mailto:dev@reef.apache.org>
>>>>> Subject: RE: [VOTE] Release Apache REEF 0.16.0 (rc1)
>>>>> 
>>>>> Right, we need not only vote counts but also test coverage. I will test 
>>>>> on YARN.
>>>>> 
>>>>> Julia
>>>>> 
>>>>> -----Original Message-----
>>>>> From: Taegeon Um [mailto:taegeonum@gmail.com]
>>>>> Sent: Thursday, August 3, 2017 6:50 PM
>>>>> To: dev@reef.apache.org<mailto:dev@reef.apache.org>
>>>>> Subject: Re: [VOTE] Release Apache REEF 0.16.0 (rc1)
>>>>> 
>>>>> Hi all,
>>>>> 
>>>>> Thanks for running tests on various environments!
>>>>> 
>>>>> We already have 6 +1 :), but it would be even better if someone 
>>>>> ran the tests on HDI and YARN.
>>>>> 
>>>>> Thanks,
>>>>> Taegeon
>>>>> 
>>>>>> On Aug 4, 2017, at 3:07 AM, Sergiy Matusevych <sergiy.matusevych@gmail.com<mailto:sergiy.matusevych@gmail.com>> wrote:
>>>>>> 
>>>>>> Here's what I have:
>>>>>> 
>>>>>> Environment 1:
>>>>>> * Windows 10 Pro 1703 build 15063.483
>>>>>> * Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
>>>>>> * Visual Studio 2017 Enterprise version 15.2 (26430.16)
>>>>>> * Microsoft .NET Framework version 4.7.02046
>>>>>> 
>>>>>> mvn clean install
>>>>>> all tests pass
>>>>>> 
>>>>>> Visual Studio the following tests fail:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> (all apparently related to YARN environment)
>>>>>> 
>>>>>> Environment 2:
>>>>>> * Ubuntu Linux 17.04
>>>>>> * OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
>>>>>> 
>>>>>> mvn clean install
>>>>>> all tests pass
>>>>>> 
>>>>>> 
>>>>>> My vote is +1
>>>>>> 
>>>>>> Great job everyone!
>>>>>> 
>>>>>> Cheers,
>>>>>> Sergiy.
>>>>>> 
>>>>>> 
>>>>>> On Tue, Aug 1, 2017 at 10:44 PM, Taegeon Um <taegeonum@gmail.com<mailto:taegeonum@gmail.com>> wrote:
>>>>>> This is to call for a new vote for the source release of Apache 
>>>>>> REEF 0.16.0 (rc1).
>>>>>> 
>>>>>> The source tarball, including signatures, digests, etc., can be 
>>>>>> found at:
>>>>>> https://dist.apache.org/repos/dist/dev/reef/0.16.0-rc1/
>>>>>> 
>>>>>> The Git tag is release-0.16.0-rc1
>>>>>> The Git commit ID is 85cc0a090ab48cf27acce2128c64b07b197d92e5
>>>>>> 
>>>>>> Checksums of apache-reef-0.16.0-rc1.tar.gz:
>>>>>> 
>>>>>> MD5: 155673fe44f95be9362b9075865c8cad
>>>>>> SHA: d62c58df1f4ba962a51d81579d27321f75dad98c3c3def9bc8fb24ebf1e27978029d7ddedf26bf4ee9434cb8e6d0e4f6e1a9a4d240d03daccd9ef66bdc403f1b
>>>>>> 
>>>>>> Release artifacts are signed with a key found in the KEYS file 
>>>>>> available here:
>>>>>> 
>>>>>> https://dist.apache.org/repos/dist/release/reef/KEYS
>>>>>> 
>>>>>> 
>>>>>> Issues resolved in this release:
>>>>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315820&version=12335833
>>>>>> 
>>>>>> 
>>>>>> The vote will be open for 72 hours. Please download the release 
>>>>>> candidate, check the hashes/signature, build it and test it, and 
>>>>>> then please vote:
>>>>>> 
>>>>>> [ ] +1 Release this package as Apache REEF 0.16.0
>>>>>> [ ] +0 no opinion
>>>>>> [ ] -1 Do not release this package because ...
>>>>>> 
>>>>>> Thanks!
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>> 
>> 
>> --
>> Byung-Gon Chun

