reef-dev mailing list archives

From Tae-Geon Um <taegeo...@gmail.com>
Subject Re: 0.16 release plan
Date Thu, 11 May 2017 02:23:54 GMT
Hi, 

Thanks Sergiy for taking a look at them! 
As far as I know, ApacheCon opens next week (May 16-18), so I think we need to resolve the
issues by the end of this week. 

I think I can help you investigate REEF-1770. However, I’m not sure I can fix it by
the end of this week.
REEF-1770 is a transient failure, but there have been no transient failures during the past 30 days
in our Java-side Travis CI (0 transient failures out of 69 builds). 
So it may be hard to reproduce. 

Sergiy, do you think you can resolve REEF-1796 by the end of this week? 
If not, we have two options.

1) release 0.16 during the week of ApacheCon without fixing them (the .NET side
CI is still unstable, but we release 0.16 anyway) 
2) do not release 0.16 until the bugs (in addition to the .NET side CI failures) are resolved


What do you guys think? 

Thanks,
Taegeon

> On May 11, 2017, at 6:55 AM, Sergiy Matusevych <sergiy.matusevych@gmail.com> wrote:
> 
> Hi guys,
> 
> It surely would be great to announce 0.16 at the conference, and we have
> some awesome features to brag about - most notably, REEF-on-Spark. Still,
> there are a few bugs that we need to fix before the release. I am mostly
> concerned with https://issues.apache.org/jira/browse/REEF-1770 and
> especially https://issues.apache.org/jira/browse/REEF-1796. I am looking
> at them now, but any help would be greatly appreciated!
> 
> Cheers,
> Sergiy.
> 
> On Tue, May 9, 2017 at 11:55 PM, Byung-Gon Chun <bgchun@gmail.com> wrote:
> 
>> Any update?
>> It'd be great if we can release 0.16 during the week of ApacheCon.
>> 
>> -Gon
>> 
>> On Tue, Apr 11, 2017 at 10:27 AM, Tae-Geon Um <taegeonum@gmail.com> wrote:
>> 
>>> Unfortunately, we’ve also got a recent build failure on the Java side [1],
>>> which was not reported previously.
>>> I’ve created an issue [2] to track this failure, and am going to
>>> investigate it.
>>> 
>>> Thanks,
>>> Taegeon
>>> 
>>> [1]: https://travis-ci.org/apache/reef/builds/220731026
>>> [2]: https://issues.apache.org/jira/browse/REEF-1770
>>>> On Apr 6, 2017, at 3:13 AM, Mariia Mykhailova <mamykhai@microsoft.com.INVALID> wrote:
>>>> 
>>>> At least 3 of the issues previously reported under REEF-1462 have
>>>> re-occurred in the past two days (I've reopened the corresponding JIRAs and
>>>> attached links to the failures). Unfortunately, with transient failures
>>>> like these, one good build is insufficient.
>>>> 
>>>> It is a known issue: since we're using free access to AppVeyor, our
>>>> builds are sequential and low-priority, so sometimes when a lot of pull
>>>> requests have to be built the build queue takes a while to drain.
>>>> 
>>>> -Mariia
>>>> 
>>>> -----Original Message-----
>>>> From: Byung-Gon Chun [mailto:bgchun@gmail.com]
>>>> Sent: Wednesday, April 5, 2017 12:11 AM
>>>> To: dev@reef.apache.org
>>>> Subject: Re: 0.16 release plan
>>>> 
>>>> Awesome!
>>>> 
>>>> There is no build failure in the .Net side with the latest build [1].
>>>> 
>>>> It looks like Appveyor's quite slow. Regarding PR1284 [2], Travis CI's
>>>> already done. We're still waiting for Appveyor. :(
>>>> 
>>>> [1] https://ci.appveyor.com/project/ApacheSoftwareFoundation/reef/build/1455-master
>>>> [2] https://github.com/apache/reef/pull/1284
>>>> 
>>>> 
>>>> On Tue, Apr 4, 2017 at 10:29 AM, Tae-Geon Um <taegeonum@gmail.com> wrote:
>>>> 
>>>>> Thanks Julia for the work!
>>>>> 
>>>>> It looks like the Java and .NET builds are almost stable, except for the
>>>>> recent build failure on the .NET side [1].
>>>>> As Julia said in REEF-1406 [2], we need to wait some time to see whether
>>>>> this failure is reproduced.
>>>>> 
>>>>> I will wait for a week and call a release vote if there are no build
>>>>> failures during that time.
>>>>> Thanks!
>>>>> 
>>>>> Taegeon
>>>>> 
>>>>> [1]: https://ci.appveyor.com/project/ApacheSoftwareFoundation/reef/build/1453-master
>>>>> [2]: https://issues.apache.org/jira/browse/REEF-1406
>>>>> 
>>>>>> On Mar 30, 2017, at 10:28 AM, Julia Wang (QIUHE) <Qiuhe.Wang@microsoft.com.INVALID> wrote:
>>>>>> 
>>>>>> I have resolved all the .Net test issues for now. The fixes cover
>>>>>> what I have identified so far based on the failures.
>>>>>> 
>>>>>> I agree with Mariia: as they are transient failures, and they sometimes
>>>>>> fail for multiple reasons, we need to continue to observe whether the
>>>>>> issues come back again.
>>>>>> 
>>>>>> Thanks,
>>>>>> Julia
>>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Tae-Geon Um [mailto:taegeonum@gmail.com]
>>>>>> Sent: Thursday, March 23, 2017 5:20 PM
>>>>>> To: dev@reef.apache.org
>>>>>> Subject: Re: 0.16 release plan
>>>>>> 
>>>>>> Thanks Mariia for pointing it out to me.
>>>>>> Yes. I agree that we need more time to fix all of the transient failures.
>>>>>> After they are resolved, I will wait for some time to ensure that
>>>>>> they do not reoccur.
>>>>>> 
>>>>>> Thanks!
>>>>>> Taegeon
>>>>>> 
>>>>>>> On Mar 24, 2017, at 2:54 AM, Mariia Mykhailova <mamykhai@microsoft.com.INVALID> wrote:
>>>>>>> 
>>>>>>> Please note that due to the transient nature of .NET failures, it
>>>>>>> makes sense to wait for some time and to observe whether they are
>>>>>>> actually fixed or just lying low until the next reoccurrence. We had to
>>>>>>> reopen some bugs which looked resolved in the past but then reoccurred.
>>>>>>> 
>>>>>>> 
>>>>>>> -Mariia
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> ________________________________
>>>>>>> From: Tae-Geon Um <taegeonum@gmail.com>
>>>>>>> Sent: Thursday, March 23, 2017 6:51:24 AM
>>>>>>> To: dev@reef.apache.org
>>>>>>> Subject: Re: 0.16 release plan
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> Julia has been doing great work to resolve the .NET side issues.
>>>>>>> It looks like she has resolved 3 issues recently (and now 3 issues
>>>>>>> remain on the .NET side with 1 pending PR).
>>>>>>> 
>>>>>>> Sergiy and I have also worked on the Java side issues, and we've
>>>>>>> resolved 1 issue (1 issue still remains, with 1 pending PR).
>>>>>>> 
>>>>>>> Because of the unresolved issues (3 on the .NET side and 1 on the Java
>>>>>>> side), I think it would be good to delay the release vote.
>>>>>>> However, judging from the progress we have made, I think all of the
>>>>>>> issues could be resolved by the end of this week or the beginning of
>>>>>>> next week.
>>>>>>> 
>>>>>>> I will call a vote as soon as possible after they are resolved.
>>>>>>> Thanks!
>>>>>>> 
>>>>>>> Taegeon
>>>>>>> 
>>>>>>>> On Mar 23, 2017, at 5:47 PM, Byung-Gon Chun <bgchun@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Thank you for all the efforts to make release 0.16 happen!
>>>>>>>> 
>>>>>>>> Taegeon, could you give us status update? Thanks.
>>>>>>>> 
>>>>>>>> On Sat, Mar 18, 2017 at 10:01 AM, Byung-Gon Chun <bgchun@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> Thanks for looking at .Net CI failures, Julia!
>>>>>>>>> 
>>>>>>>>> Thanks for handling Java CI failures, Sergiy and Taegeon!
>>>>>>>>> 
>>>>>>>>> On Fri, Mar 17, 2017 at 11:20 AM, Julia Wang (QIUHE) <Qiuhe.Wang@microsoft.com.invalid> wrote:
>>>>>>>>> 
>>>>>>>>>> I am working on some of the .Net AppVeyor test failures now.
>>>>>>>>>> 
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Sergiy Matusevych [mailto:sergiy.matusevych@gmail.com]
>>>>>>>>>> Sent: Wednesday, March 15, 2017 5:12 PM
>>>>>>>>>> To: dev@reef.apache.org
>>>>>>>>>> Subject: Re: 0.16 release plan
>>>>>>>>>> 
>>>>>>>>>> On Wed, Mar 15, 2017 at 4:21 PM, Tae-Geon Um <taegeonum@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Oh, never mind.
>>>>>>>>>>> I thought that we still need some time to make sure that
>>>>>>>>>>> Unmanaged AM works properly on Hadoop 2.7.3 :)
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Great, then we have only two items left! :-) I can confirm that
>>>>>>>>>> Unmanaged AM works on the proper version of YARN; still, we need to
>>>>>>>>>> address the Java issues (that is, item #1) as they are related
>>>>>>>>>> to the Unmanaged AM mode. For example, we must make sure we close
>>>>>>>>>> all threads before exiting the REEF Driver - otherwise, the
>>>>>>>>>> HelloREEFYarnUnmanagedAM example can hang, as it does not
>>>>>>>>>> currently have a System.exit() call at the end.
>>>>>>>>>> 
>>>>>>>>>> Thanks for help!
>>>>>>>>>> Sergiy.
>>>>>>>>>> 
>>>>>>>>>>> On Mar 16, 2017, at 6:07 AM, Sergiy Matusevych <sergiy.matusevych@gmail.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Hi Taegeon,
>>>>>>>>>>>> 
>>>>>>>>>>>> What exactly do you mean by #3? We have a HelloREEF example
>>>>>>>>>>>> running in Unmanaged AM mode (see HelloREEFYarnUnmanagedAM
>>>>>>>>>>>> class), and it works fine on YARN 2.7.3. We also have several
>>>>>>>>>>>> examples and unit tests that check the Unmanaged AM and
>>>>>>>>>>>> REEF-as-a-library functionality, e.g. HelloREEFEnvironment,
>>>>>>>>>>>> ReefOnReefDriver, REEFEnvironmentDriverTest, and such. What else
>>>>>>>>>>>> do you think we should unit test? (I am not saying that our unit
>>>>>>>>>>>> tests are comprehensive (they are not!), but I would love to know
>>>>>>>>>>>> what area you think we should focus on for the 0.16 release.)
>>>>>>>>>>>> 
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Sergiy.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, Mar 15, 2017 at 7:10 AM, Tae-Geon Um <taegeonum@gmail.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> It's been about 10 months since we released the latest
>>>>>>>>>>>>> version (0.15).
>>>>>>>>>>>>> In order not to delay the release any longer, I want to call
>>>>>>>>>>>>> a release vote as soon as possible.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Do you think it is OK for me to call a 0.16 release vote
>>>>>>>>>>>>> next Thursday (the 23rd)?
>>>>>>>>>>>>> I know several blocking issues still remain:
>>>>>>>>>>>>> 1) Java side CI failures
>>>>>>>>>>>>> 2) .NET side CI failures
>>>>>>>>>>>>> 3) Unmanaged AM test
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I want to know whether they can be resolved by next Thursday.
>>>>>>>>>>>>> I'm currently taking a look at 1) (with Sergiy's help), and
>>>>>>>>>>>>> the due date is OK for me.
>>>>>>>>>>>>> How about 2) and 3)? As far as I know, 2) is on Julia and
>>>>>>>>>>>>> Sergiy is working on 3).
>>>>>>>>>>>>> If the plan does not seem OK, could you please share the ETA
>>>>>>>>>>>>> for them?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Taegeon
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Byung-Gon Chun
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Byung-Gon Chun
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> Byung-Gon Chun
>>> 
>>> 
>> 
>> 
>> --
>> Byung-Gon Chun
>> 

