reef-dev mailing list archives

From "Julia Wang (QIUHE)" <Qiuhe.W...@microsoft.com.INVALID>
Subject RE: [VOTE] Release Apache REEF 0.16.0 (rc1)
Date Thu, 10 Aug 2017 05:16:49 GMT
In the past few days, I picked several commits from January to May and ran 1000-node tests
against them. It is really hard to find when the issue was introduced because the issue is
transient. Previously, when I ran the 0.16 build with 1000 nodes and 800 nodes, it always
failed. Now, when I run it again with the 0.16 release build, it has passed a few times.

Given that the root cause of the transient issue is hard to identify, there is no short-term
fix, and we haven't released for a very long time, I will change my vote to 0 and let it go.
We will come back to debug it whenever we have some capacity.

Thanks,
Julia

-----Original Message-----
From: Byung-Gon Chun [mailto:bgchun@gmail.com] 
Sent: Wednesday, August 9, 2017 9:36 PM
To: dev@reef.apache.org
Subject: Re: [VOTE] Release Apache REEF 0.16.0 (rc1)

Thanks for testing, Julia!

I have to submit a board report soon. It'd be very nice if we had a conclusion on this release
vote. :)

Thanks.
-Gon

On Wed, Aug 9, 2017 at 3:28 AM, Julia Wang (QIUHE) <Qiuhe.Wang@microsoft.com.invalid> wrote:

> I am doing some more testing to try to narrow down the cause of the 
> failure. I will update the thread soon after I have more findings.
>
> Julia
>
> > On Aug 7, 2017, at 3:44 PM, Taegeon Um <taegeonum@gmail.com> wrote:
> >
> > Hi Julia,
> >
> > Are you still -1 on releasing 0.16?
> >
> > Thanks,
> > Taegeon
> >
> >
> > On Aug 5, 2017, at 3:23 PM, "Julia Wang (QIUHE)" <Qiuhe.Wang@microsoft.com.invalid> wrote:
> >
> > Hi Gon,
> >
> > Agreed, it is hard to resolve. But I am pretty sure it was introduced in
> > the first few months of this year. It was working last year ☹
> >
> > Julia
> >
> > -----Original Message-----
> > From: Byung-Gon Chun [mailto:bgchun@gmail.com]
> > Sent: Friday, August 4, 2017 10:53 PM
> > To: dev@reef.apache.org
> > Subject: Re: [VOTE] Release Apache REEF 0.16.0 (rc1)
> >
> > Julia, thanks for running the tests!
> > Scalability bugs are hard to debug. It's not likely that we will 
> > resolve them quickly.
> >
> > I definitely vote for option 2. Once we fix the bugs, perhaps we can 
> > do a minor version update release.
> >
> >
> >> On Fri, Aug 4, 2017 at 11:34 PM, Gyewon Lee <strayyyyyy@gmail.com> wrote:
> >>
> >> I agree with Taegeon. +1 for option 2.
> >>
> >> 2017-08-05 11:41 GMT+09:00 Taegeon Um <taegeonum@gmail.com>:
> >>
> >>> Thanks Julia for sharing the issue!
> >>>
> >>> On Aug 5, 2017, at 10:56 AM, "Julia Wang (QIUHE)" <Qiuhe.Wang@microsoft.com.invalid> wrote:
> >>>
> >>> TestUnhandledTaskExceptionDoesntCrashEvaluator passed in my env.
> >>>
> >>> I think both TestUnhandledTaskExceptionDoesntCrashEvaluator and
> >>> TestRuntimeNameSpecifyingValidName are transient failures. We can log a
> >>> Jira to follow up, as we agreed to do for other transient test failures.
> >>>
> >>>
> >>> +1 for following up on them as transient failures.
> >>>
> >>>
> >>> Today I tested the IMRU example on YARN.
> >>> With 500 nodes, the test passes. I ran it multiple times, and every run passed.
> >>> With 1000 nodes, the test fails. Received 1000 completed tasks but only
> >>> 998 completed evaluators. The Driver doesn't shut down until I kill it.
> >>>
> >>> With 800 nodes, the test fails. Received 800 completed tasks but only
> >>> 799 completed evaluators. The Driver doesn't shut down until I kill it.
> >>>
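One way to narrow down a mismatch like the one reported above (1000 completed tasks but only 998 completed evaluators) is to compare the two sets of evaluator IDs and list the ones that never reported completion. The following is a minimal Python sketch under stated assumptions: the function name `unfinished_evaluators` and the `evaluator-N` ID format are hypothetical, and the thread does not say how the IDs would be collected (for example from Driver logs).

```python
def unfinished_evaluators(task_evaluator_ids, completed_evaluator_ids):
    """Return the evaluator IDs that reported a completed task
    but for which no completed-evaluator event was seen."""
    return sorted(set(task_evaluator_ids) - set(completed_evaluator_ids))


# Hypothetical data mirroring the 1000-node run described in the thread:
# every evaluator finished its task, but two never reported completion.
tasks_done = [f"evaluator-{i}" for i in range(1000)]
never_completed = {"evaluator-17", "evaluator-512"}  # made-up IDs
evaluators_done = [e for e in tasks_done if e not in never_completed]

missing = unfinished_evaluators(tasks_done, evaluators_done)
print(len(tasks_done), len(evaluators_done), missing)
```

Knowing which evaluators are missing would only be a starting point; it does not by itself explain why the Driver keeps waiting.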
> >>> Option 1: Find the root cause and fix the issue before the 0.16 release.
> >>> There are no errors in the logs, so finding the root cause looks like a
> >>> non-trivial job. We had a similar issue last year, and it took a big
> >>> effort for Mariia and me to identify it.
> >>> Option 2: Log a JIRA and follow up later.
> >>>
> >>>
> >>> +1 for option 2.
> >>>
> >>> I think it would take a long time to investigate the root cause, so I'm
> >>> worried that the 0.16 release will be delayed for a long time again.
> >>>
> >>> How about releasing 0.16 with the known issues and fixing them in 0.17?
> >>> Or, if we resolve the issues quickly, we could do a minor release (e.g.,
> >>> 0.16.1)?
> >>>
> >>> Taegeon
> >>>
> >>>
> >>> Julia
> >>>
> >>>
> >>> From: Kim Doyoung [mailto:disoxc21@gmail.com]
> >>> Sent: Friday, August 4, 2017 2:50 AM
> >>> To: dev@reef.apache.org
> >>> Subject: RE: [VOTE] Release Apache REEF 0.16.0 (rc1)
> >>>
> >>> Hi
> >>>
> >>> I ran the tests on Java and .Net.
> >>> My environment is Windows 10 x64 ver 1703 with Java 1.8.0_141.
> >>>
> >>> Java built and passed the tests with the command ‘mvn clean install’.
> >>>
> >>> But the .Net side failed one test with Visual Studio 2015.
> >>> It is not the same test failure that Julia saw:
> >>>
> >>> TestUnhandledTaskExceptionDoesntCrashEvaluator
> >>>
> >>> The `TestRuntimeNameSpecifyingValidName` test passed.
> >>>
> >>> Thank you.
> >>>
> >>> Doyoung
> >>>
> >>> From: Taegeon Um <taegeonum@gmail.com>
> >>> Sent: Friday, August 4, 2017 3:31 PM
> >>> To: dev@reef.apache.org
> >>> Subject: Re: [VOTE] Release Apache REEF 0.16.0 (rc1)
> >>>
> >>> Thanks Julia for sharing the result!
> >>>
> >>> Is there someone who experienced the same test failure in .Net?
> >>>
> >>> Taegeon
> >>>
> >>>> On Aug 4, 2017, at 3:19 PM, Julia Wang (QIUHE) <Qiuhe.Wang@microsoft.com.INVALID> wrote:
> >>>>
> >>>> - Test HelloREEF from .Net to Java on YARN is successful.
> >>>> - mvn clean install passes on Windows Server 2002 R2.
> >>>> - .Net tests in VS 2005 all passed with the YARN tests filtered out,
> >>>>   except for one test failure:
> >>>>      TestRuntimeNameSpecifyingValidName
> >>>> The test error is "cannot read log file", the same as the transient
> >>>> errors in other tests. However, I ran this test alone 3 times, and it
> >>>> failed every time on my local box.
> >>>>
> >>>> Can someone run the .Net tests on their box to see if it reproduces?
> >>>>
> >>>> Julia
> >>>>
> >>>> -----Original Message-----
> >>>> From: Julia Wang (QIUHE) [mailto:Qiuhe.Wang@microsoft.com.INVALID]
> >>>> Sent: Thursday, August 3, 2017 7:23 PM
> >>>> To: dev@reef.apache.org
> >>>> Subject: RE: [VOTE] Release Apache REEF 0.16.0 (rc1)
> >>>>
> >>>> Right, we need not only vote counts but also test coverage. I will test
> >>>> on YARN.
> >>>>
> >>>> Julia
> >>>>
> >>>> -----Original Message-----
> >>>> From: Taegeon Um [mailto:taegeonum@gmail.com]
> >>>> Sent: Thursday, August 3, 2017 6:50 PM
> >>>> To: dev@reef.apache.org
> >>>> Subject: Re: [VOTE] Release Apache REEF 0.16.0 (rc1)
> >>>>
> >>>> Hi all,
> >>>>
> >>>> Thanks for running tests on various environments!
> >>>>
> >>>> We already have six +1 votes :), but it would be even better if someone
> >>>> ran the tests on HDI and YARN.
> >>>>
> >>>> Thanks,
> >>>> Taegeon
> >>>>
> >>>>> On Aug 4, 2017, at 3:07 AM, Sergiy Matusevych <sergiy.matusevych@gmail.com> wrote:
> >>>>>
> >>>>> Here's what I have:
> >>>>>
> >>>>> Environment 1:
> >>>>> * Windows 10 Pro 1703 build 15063.483
> >>>>> * Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
> >>>>> * Visual Studio 2017 Enterprise version 15.2 (26430.16)
> >>>>> * Microsoft .NET Framework version 4.7.02046
> >>>>>
> >>>>> mvn clean install
> >>>>> all tests pass
> >>>>>
> >>>>> In Visual Studio, the following tests fail:
> >>>>>
> >>>>>
> >>>>>
> >>>>> (all apparently related to YARN environment)
> >>>>>
> >>>>> Environment 2:
> >>>>> * Ubuntu Linux 17.04
> >>>>> * OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
> >>>>>
> >>>>> mvn clean install
> >>>>> all tests pass
> >>>>>
> >>>>>
> >>>>> My vote is +1
> >>>>>
> >>>>> Great job everyone!
> >>>>>
> >>>>> Cheers,
> >>>>> Sergiy.
> >>>>>
> >>>>>
> >>>>> On Tue, Aug 1, 2017 at 10:44 PM, Taegeon Um <taegeonum@gmail.com> wrote:
> >>>>> This is to call for a new vote for the source release of Apache REEF
> >>>>> 0.16.0 (rc1).
> >>>>>
> >>>>> The source tar ball, including signatures, digests, etc. can be found at:
> >>>>> https://dist.apache.org/repos/dist/dev/reef/0.16.0-rc1/
> >>>>>
> >>>>> The Git tag is release-0.16.0-rc1
> >>>>> The Git commit ID is 85cc0a090ab48cf27acce2128c64b07b197d92e5
> >>>>>
> >>>>> Checksums of apache-reef-0.16.0-rc1.tar.gz:
> >>>>>
> >>>>> MD5: 155673fe44f95be9362b9075865c8cad
> >>>>> SHA:
> >>>>> d62c58df1f4ba962a51d81579d27321f75dad98c3c3def9bc8fb24ebf1e27978029d7ddedf26bf4ee9434cb8e6d0e4f6e1a9a4d240d03daccd9ef66bdc403f1b
> >>>>>
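The digests listed above can be checked against a downloaded tarball with a short script. The following is a minimal Python sketch, not part of the original announcement. It assumes the value labelled "SHA" is a SHA-512 digest, which is an inference from its length of 128 hex digits, and the helper names `file_digest` and `matches` are illustrative.

```python
import hashlib
import sys

# Digests quoted in the vote email for apache-reef-0.16.0-rc1.tar.gz.
EXPECTED_MD5 = "155673fe44f95be9362b9075865c8cad"
# Assumption: the 128-hex-digit "SHA" value is SHA-512.
EXPECTED_SHA512 = (
    "d62c58df1f4ba962a51d81579d27321f75dad98c3c3def9bc8fb24ebf1e2"
    "7978029d7d"
    "dedf26bf4ee9434cb8e6d0e4f6e1a9a4d240d03daccd9ef66bdc403f1b"
)


def file_digest(path, algorithm, chunk_size=1 << 20):
    """Hash a file in chunks so a large tarball is never fully in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex, algorithm):
    """Compare a file's digest with an expected hex string, ignoring case."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()


if __name__ == "__main__" and len(sys.argv) == 2:
    tarball = sys.argv[1]  # path to the downloaded release candidate
    print("MD5 ok:    ", matches(tarball, EXPECTED_MD5, "md5"))
    print("SHA-512 ok:", matches(tarball, EXPECTED_SHA512, "sha512"))
```

A matching digest only shows that the download is intact; checking the signature against the project's KEYS file is a separate step.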
> >>>>> Release artifacts are signed with a key found in the KEYS file available here:
> >>>>>
> >>>>> https://dist.apache.org/repos/dist/release/reef/KEYS
> >>>>>
> >>>>>
> >>>>> Issues resolved in this release:
> >>>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315820&version=12335833
> >>>>>
> >>>>>
> >>>>> The vote will be open for 72 hours. Please download the release
> >>>>> candidate, check the hashes/signature, build it and test it, and
> >>>>> then please vote:
> >>>>>
> >>>>> [ ] +1 Release this package as Apache REEF 0.16.0
> >>>>> [ ] +0 no opinion
> >>>>> [ ] -1 Do not release this package because ...
> >>>>>
> >>>>> Thanks!
> >>>>>
> >>>>
> >>>
> >>
> >
> >
> >
> > --
> > Byung-Gon Chun
>



--
Byung-Gon Chun