predictionio-user mailing list archives

From Shinsuke Sugaya <shinsuke.sug...@gmail.com>
Subject Re: Update default build targets
Date Wed, 07 Jun 2017 02:34:54 GMT
I think it is difficult for PIO to accommodate every company's version policy.
For platforms other than the major ones, users can still build PIO
from the source distribution, just as they do now.
I checked some major platforms:

AWS(EMR 5.6.0)
- Hadoop 2.7.3
- Spark 2.1.1

Cloudera(CDH 5.11)
- Hadoop 2.6.0
- Spark 1.6.0/2.1 Release 1

Horton(HDP 2.6)
- Hadoop 2.7.3
- Spark 1.6.3/2.1

For Hadoop 2.6 and Spark 2.1, our updated dependencies will work.
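
As a rough illustration of the check above, here is a minimal sketch in Python. The version data is copied from the list above; the helper names (`major_minor`, `common_hadoop_line`, `platforms_with_spark_line`) are hypothetical, and the idea that the lowest Hadoop major.minor among the platforms is the safe default is an assumption, not a rule PIO enforces.

```python
# Versions per platform, copied from the list above.
PLATFORMS = {
    "AWS EMR 5.6.0": {"hadoop": "2.7.3", "spark": ["2.1.1"]},
    "Cloudera CDH 5.11": {"hadoop": "2.6.0", "spark": ["1.6.0", "2.1"]},
    "Hortonworks HDP 2.6": {"hadoop": "2.7.3", "spark": ["1.6.3", "2.1"]},
}


def major_minor(version):
    """Reduce a version string such as '2.7.3' to a (major, minor) tuple."""
    parts = version.split(".")
    return (int(parts[0]), int(parts[1]))


def common_hadoop_line(platforms):
    """Return the lowest Hadoop major.minor shipped by any platform.

    Assumption: a build against the lowest line also runs on the newer ones.
    """
    return min(major_minor(p["hadoop"]) for p in platforms.values())


def platforms_with_spark_line(platforms, line):
    """Return the sorted names of platforms offering the given Spark major.minor."""
    return sorted(
        name
        for name, p in platforms.items()
        if any(major_minor(v) == line for v in p["spark"])
    )


if __name__ == "__main__":
    print("Hadoop line:", common_hadoop_line(PLATFORMS))
    print("Spark 2.1 on:", platforms_with_spark_line(PLATFORMS, (2, 1)))
```

With the data above, this picks the Hadoop 2.6 line and finds Spark 2.1 on all three platforms, which is why the updated defaults use Hadoop 2.6.x and Spark 2.1.x.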

I agree with Scala 2.11.
As for Python, major machine-learning packages such as scikit-learn
still support Python 2.7, so I think there are still Python 2 users.
Therefore, I would prefer PIO to follow the major packages unless there is a
Python version problem. (Of course, I'd like to remove Python 2 support...)
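
For reference, a script can usually be written to run on both Python 2.7 and Python 3. The following is only a minimal sketch of that style, not code taken from any PIO template; `format_event` and `python_major` are hypothetical names used for illustration.

```python
# The __future__ imports make Python 2.7 behave like Python 3 for
# print() and string literals; on Python 3 they have no effect.
from __future__ import print_function, unicode_literals

import sys


def format_event(user_id, item_id, rating):
    """Build one comma-separated line; str.format works the same on 2.7 and 3."""
    return "{0},{1},{2}".format(user_id, item_id, rating)


def python_major():
    """Return the major version of the running interpreter."""
    return sys.version_info[0]


if __name__ == "__main__":
    print("Running on Python", python_major())
    print(format_event("u1", "i1", 4))
```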

The current ES5 support uses the REST client, not the Transport client, so
PIO is not bound to the version of the deployed Elasticsearch, and
the Elasticsearch libraries are backward compatible.
If a user runs Elasticsearch 5 or above, PIO will work
even if PIO uses the latest ES dependency. So it doesn't matter
if an older ES 5.x is deployed.

Regards,
 shinsuke


2017-06-06 23:37 GMT+09:00 Pat Ferrel <pat@occamsmachete.com>:
> Hmm, I’d rather see our release on versions that are most commonly deployed.
> A system with as many deps as PIO can run afoul of corp version policies.
> Then a build for all reasonable deps will take care of edge cases.
>
> py3 and scala 2.11 seem to be the most commonly deployed, not so much for
> hadoop 2.7 afaict
>
> ES5 I agree is problematic. One big reason to move to it is so users can use
> SaaS ES; maybe someone in user land can better say what version is out in
> SaaS? I’m not concerned with the ES release schedule; at issue is the ES
> adoption level.
>
>
> On Jun 5, 2017, at 9:44 PM, Shinsuke Sugaya <shinsuke.sugaya@gmail.com> wrote:
>
>> What is the policy driving dependency upgrades?
>
> Although it might be difficult to define it,
> how about the following policy:
>
> - Select newer dependencies
> - If a newer version is not supported on the major platforms (e.g. AWS,
>  Cloudera, Horton), fall back to a lower version
> - Review the dependency versions at every release if needed
>
> As for Elasticsearch, I would like to keep a newer version
> since new versions are released monthly.
>
>> I don’t run hadoop 2.7 locally and many users that have Cloudera or Horton
>> contracts may not either.
>
> Thank you for the info.
> CDH seems not to support hadoop 2.7...
> Updated as below:
>
> <UPDATE>
> 0.12.0:
> - PIO_SCALA_VERSION=2.11.8
> - PIO_SPARK_VERSION=2.1.1
> - PIO_ELASTICSEARCH_VERSION=5.4.1
> - PIO_HADOOP_VERSION=2.6.5
>
>
> For Python 2/3, I fixed some templates, such as recommender.
> Since I think template scripts can support both Python 2/3,
> I'll fix them.
>
> Regards,
> shinsuke
>
> 2017-06-06 4:57 GMT+09:00 Pat Ferrel <pat@occamsmachete.com>:
>> What is the policy driving dependency upgrades?
>>
>> I don’t run hadoop 2.7 locally and many users that have Cloudera or Horton
>> contracts may not either. Not sure why this should be the default until it’s
>> the most popular or we need some feature of it.
>>
>> I’d agree with most of what @Shinsuke suggests as long as there is an easy
>> way to build for any reasonable combination of deps.
>>
>> The hard one will be Python 3. All existing python scripts in templates will
>> need upgrading since it’s very difficult to support mixed py2 and py3, whereas
>> scala 2.10 and 2.11 are much easier. I still think it’s time to do this but
>> mention it because with each upgrade we need to consider how many templates
>> are left even further behind. Many now do not work with Apache PIO; this may
>> put them further behind.
>>
>> Though we work on PIO we must remember that PIO does nothing interesting
>> without templates and ask ourselves what pain we may cause for template
>> users.
>>
>>
>>
>> On Jun 5, 2017, at 11:06 AM, Donald Szeto <donald@apache.org> wrote:
>>
>> Hey all, this has a huge impact on the default build, so if you see any
>> issue with this, please let us know as soon as possible.
>>
>> On Sun, Jun 4, 2017 at 10:25 PM, Shinsuke Sugaya <shinsuke.sugaya@gmail.com>
>> wrote:
>>>
>>> Hi all,
>>>
>>> We have a plan to change default build targets in PIO-83 and PIO-84.
>>> The current versions look too old, so it would be better to support
>>> newer versions by default.
>>>
>>> Current:
>>> - PIO_SCALA_VERSION=2.10.6
>>> - PIO_SPARK_VERSION=1.6.3
>>> - PIO_ELASTICSEARCH_VERSION=1.7.6
>>> - PIO_HADOOP_VERSION=2.6.5
>>>
>>> They will be changed to:
>>>
>>> 0.12.0:
>>> - PIO_SCALA_VERSION=2.11.8
>>> - PIO_SPARK_VERSION=2.1.1
>>> - PIO_ELASTICSEARCH_VERSION=5.4.1
>>> - PIO_HADOOP_VERSION=2.7.3
>>>
>>> Note that this change does not drop support for old versions.
>>> If you want to use old versions, you can build PIO with them.
>>>
>>> Please let us know if you have any concerns.
>>>
>>> https://issues.apache.org/jira/browse/PIO-83
>>> https://issues.apache.org/jira/browse/PIO-84
>>>
>>> Regards,
>>> shinsuke
>>
>>
>>
>
