spark-dev mailing list archives

From Hyukjin Kwon <gurwls...@gmail.com>
Subject Re: [VOTE] Spark 2.3.1 (RC4)
Date Mon, 04 Jun 2018 01:12:11 GMT
+1

On Sun, Jun 3, 2018 at 9:25 PM, Ricardo Almeida <ricardo.almeida@actnowib.com> wrote:

> +1 (non-binding)
>
> On 3 June 2018 at 09:23, Dongjoon Hyun <dongjoon.hyun@gmail.com> wrote:
>
>> +1
>>
>> Bests,
>> Dongjoon.
>>
>> On Sat, Jun 2, 2018 at 8:09 PM, Denny Lee <denny.g.lee@gmail.com> wrote:
>>
>>> +1
>>>
>>> On Sat, Jun 2, 2018 at 4:53 PM Nicholas Chammas <
>>> nicholas.chammas@gmail.com> wrote:
>>>
>>>> I'll give that a try, but I'll still have to figure out what to do if
>>>> none of the release builds work with hadoop-aws, since Flintrock deploys
>>>> Spark release builds to set up a cluster. Building Spark is slow, so we
>>>> only do it if the user specifically requests a Spark version by git hash.
>>>> (This is basically how spark-ec2 did things, too.)
>>>>
>>>>
>>>> On Sat, Jun 2, 2018 at 6:54 PM Marcelo Vanzin <vanzin@cloudera.com>
>>>> wrote:
>>>>
>>>>> If you're building your own Spark, definitely try the hadoop-cloud
>>>>> profile. Then you don't even need to pull anything at runtime,
>>>>> everything is already packaged with Spark.
>>>>>
>>>>> On Fri, Jun 1, 2018 at 6:51 PM, Nicholas Chammas
>>>>> <nicholas.chammas@gmail.com> wrote:
>>>>> > pyspark --packages org.apache.hadoop:hadoop-aws:2.7.3 didn’t work for me
>>>>> > either (even building with -Phadoop-2.7). I guess I’ve been relying on an
>>>>> > unsupported pattern and will need to figure something else out going forward
>>>>> > in order to use s3a://.
>>>>> >
>>>>> >
>>>>> > On Fri, Jun 1, 2018 at 9:09 PM Marcelo Vanzin <vanzin@cloudera.com> wrote:
>>>>> >>
>>>>> >> I have personally never tried to include hadoop-aws that way. But at
>>>>> >> the very least, I'd try to use the same version of Hadoop as the Spark
>>>>> >> build (2.7.3 IIRC). I don't really expect a different version to work,
>>>>> >> and if it did in the past it definitely was not by design.
>>>>> >>
>>>>> >> On Fri, Jun 1, 2018 at 5:50 PM, Nicholas Chammas
>>>>> >> <nicholas.chammas@gmail.com> wrote:
>>>>> >> > Building with -Phadoop-2.7 didn’t help, and if I remember correctly,
>>>>> >> > building with -Phadoop-2.8 worked with hadoop-aws in the 2.3.0 release,
>>>>> >> > so it appears something has changed since then.
>>>>> >> >
>>>>> >> > I wasn’t familiar with -Phadoop-cloud, but I can try that.
>>>>> >> >
>>>>> >> > My goal here is simply to confirm that this release of Spark works with
>>>>> >> > hadoop-aws like past releases did, particularly for Flintrock users who
>>>>> >> > use Spark with S3A.
>>>>> >> >
>>>>> >> > We currently provide -hadoop2.6, -hadoop2.7, and -without-hadoop builds
>>>>> >> > with every Spark release. If the -hadoop2.7 release build won’t work with
>>>>> >> > hadoop-aws anymore, are there plans to provide a new build type that
>>>>> >> > will?
>>>>> >> >
>>>>> >> > Apologies if the question is poorly formed. I’m batting a bit outside my
>>>>> >> > league here. Again, my goal is simply to confirm that I/my users still
>>>>> >> > have a way to use s3a://. In the past, that way was simply to call
>>>>> >> > pyspark --packages org.apache.hadoop:hadoop-aws:2.8.4 or something very
>>>>> >> > similar. If that will no longer work, I’m trying to confirm that the
>>>>> >> > change of behavior is intentional or acceptable (as a review for the
>>>>> >> > Spark project) and figure out what I need to change (as due diligence
>>>>> >> > for Flintrock’s users).
>>>>> >> >
>>>>> >> > Nick
>>>>> >> >
>>>>> >> >
>>>>> >> > On Fri, Jun 1, 2018 at 8:21 PM Marcelo Vanzin <vanzin@cloudera.com> wrote:
>>>>> >> >>
>>>>> >> >> Using the hadoop-aws package is probably going to be a little more
>>>>> >> >> complicated than that. The best bet is to use a custom build of Spark
>>>>> >> >> that includes it (use -Phadoop-cloud). Otherwise you're probably
>>>>> >> >> looking at some nasty dependency issues, especially if you end up
>>>>> >> >> mixing different versions of Hadoop.
>>>>> >> >>
>>>>> >> >> On Fri, Jun 1, 2018 at 4:01 PM, Nicholas Chammas
>>>>> >> >> <nicholas.chammas@gmail.com> wrote:
>>>>> >> >> > I was able to successfully launch a Spark cluster on EC2 at 2.3.1 RC4
>>>>> >> >> > using Flintrock. However, trying to load the hadoop-aws package gave me
>>>>> >> >> > some errors.
>>>>> >> >> >
>>>>> >> >> > $ pyspark --packages org.apache.hadoop:hadoop-aws:2.8.4
>>>>> >> >> >
>>>>> >> >> > <snipped>
>>>>> >> >> >
>>>>> >> >> > :: problems summary ::
>>>>> >> >> > :::: WARNINGS
>>>>> >> >> >                 [NOT FOUND  ] com.sun.jersey#jersey-json;1.9!jersey-json.jar(bundle) (2ms)
>>>>> >> >> >         ==== local-m2-cache: tried
>>>>> >> >> >           file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-json/1.9/jersey-json-1.9.jar
>>>>> >> >> >                 [NOT FOUND  ] com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle) (0ms)
>>>>> >> >> >         ==== local-m2-cache: tried
>>>>> >> >> >           file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar
>>>>> >> >> >                 [NOT FOUND  ] org.codehaus.jettison#jettison;1.1!jettison.jar(bundle) (1ms)
>>>>> >> >> >         ==== local-m2-cache: tried
>>>>> >> >> >           file:/home/ec2-user/.m2/repository/org/codehaus/jettison/jettison/1.1/jettison-1.1.jar
>>>>> >> >> >                 [NOT FOUND  ] com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar (0ms)
>>>>> >> >> >         ==== local-m2-cache: tried
>>>>> >> >> >           file:/home/ec2-user/.m2/repository/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1.jar
>>>>> >> >> >
>>>>> >> >> > I’d guess I’m probably using the wrong version of hadoop-aws, but I
>>>>> >> >> > called make-distribution.sh with -Phadoop-2.8 so I’m not sure what else
>>>>> >> >> > to try.
>>>>> >> >> >
>>>>> >> >> > Any quick pointers?
>>>>> >> >> >
>>>>> >> >> > Nick
>>>>> >> >> >
>>>>> >> >> >
>>>>> >> >> > On Fri, Jun 1, 2018 at 6:29 PM Marcelo Vanzin <vanzin@cloudera.com> wrote:
>>>>> >> >> >>
>>>>> >> >> >> Starting with my own +1 (binding).
>>>>> >> >> >>
>>>>> >> >> >> On Fri, Jun 1, 2018 at 3:28 PM, Marcelo Vanzin <vanzin@cloudera.com> wrote:
>>>>> >> >> >> > Please vote on releasing the following candidate as Apache Spark
>>>>> >> >> >> > version 2.3.1.
>>>>> >> >> >> >
>>>>> >> >> >> > Given that I expect at least a few people to be busy with Spark Summit
>>>>> >> >> >> > next week, I'm taking the liberty of setting an extended voting period.
>>>>> >> >> >> > The vote will be open until Friday, June 8th, at 19:00 UTC (that's 12:00
>>>>> >> >> >> > PDT).
>>>>> >> >> >> >
>>>>> >> >> >> > It passes with a majority of +1 votes, which must include at least 3 +1
>>>>> >> >> >> > votes from the PMC.
>>>>> >> >> >> >
>>>>> >> >> >> > [ ] +1 Release this package as Apache Spark 2.3.1
>>>>> >> >> >> > [ ] -1 Do not release this package because ...
>>>>> >> >> >> >
>>>>> >> >> >> > To learn more about Apache Spark, please see
>>>>> >> >> >> > http://spark.apache.org/
>>>>> >> >> >> >
>>>>> >> >> >> > The tag to be voted on is v2.3.1-rc4 (commit 30aaa5a3):
>>>>> >> >> >> > https://github.com/apache/spark/tree/v2.3.1-rc4
>>>>> >> >> >> >
>>>>> >> >> >> > The release files, including signatures, digests, etc. can be found at:
>>>>> >> >> >> > https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc4-bin/
>>>>> >> >> >> >
>>>>> >> >> >> > Signatures used for Spark RCs can be found in this file:
>>>>> >> >> >> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>> >> >> >> >
>>>>> >> >> >> > The staging repository for this release can be found at:
>>>>> >> >> >> > https://repository.apache.org/content/repositories/orgapachespark-1272/
>>>>> >> >> >> >
>>>>> >> >> >> > The documentation corresponding to this release can be found at:
>>>>> >> >> >> > https://dist.apache.org/repos/dist/dev/spark/v2.3.1-rc4-docs/
>>>>> >> >> >> >
>>>>> >> >> >> > The list of bug fixes going into 2.3.1 can be found at the following URL:
>>>>> >> >> >> > https://issues.apache.org/jira/projects/SPARK/versions/12342432
>>>>> >> >> >> >
>>>>> >> >> >> > FAQ
>>>>> >> >> >> >
>>>>> >> >> >> > =========================
>>>>> >> >> >> > How can I help test this release?
>>>>> >> >> >> > =========================
>>>>> >> >> >> >
>>>>> >> >> >> > If you are a Spark user, you can help us test this release by taking
>>>>> >> >> >> > an existing Spark workload and running it on this release candidate,
>>>>> >> >> >> > then reporting any regressions.
>>>>> >> >> >> >
>>>>> >> >> >> > If you're working in PySpark, you can set up a virtual env, install
>>>>> >> >> >> > the current RC, and see if anything important breaks. In Java/Scala,
>>>>> >> >> >> > you can add the staging repository to your project's resolvers and
>>>>> >> >> >> > test with the RC (make sure to clean up the artifact cache before/after
>>>>> >> >> >> > so you don't end up building with an out-of-date RC going forward).
>>>>> >> >> >> >
>>>>> >> >> >> > ===========================================
>>>>> >> >> >> > What should happen to JIRA tickets still targeting 2.3.1?
>>>>> >> >> >> > ===========================================
>>>>> >> >> >> >
>>>>> >> >> >> > The current list of open tickets targeted at 2.3.1 can be found at:
>>>>> >> >> >> > https://s.apache.org/Q3Uo
>>>>> >> >> >> >
>>>>> >> >> >> > Committers should look at those and triage. Extremely important bug
>>>>> >> >> >> > fixes, documentation, and API tweaks that impact compatibility should
>>>>> >> >> >> > be worked on immediately. Everything else please retarget to an
>>>>> >> >> >> > appropriate release.
>>>>> >> >> >> >
>>>>> >> >> >> > ==================
>>>>> >> >> >> > But my bug isn't fixed?
>>>>> >> >> >> > ==================
>>>>> >> >> >> >
>>>>> >> >> >> > In order to make timely releases, we will typically not hold the
>>>>> >> >> >> > release unless the bug in question is a regression from the previous
>>>>> >> >> >> > release. That being said, if there is something which is a regression
>>>>> >> >> >> > that has not been correctly targeted please ping me or a committer to
>>>>> >> >> >> > help target the issue.
>>>>> >> >> >> >
>>>>> >> >> >> >
>>>>> >> >> >> > --
>>>>> >> >> >> > Marcelo
>>>>> >> >> >>
>>>>> >> >> >>
>>>>> >> >> >>
>>>>> >> >> >> --
>>>>> >> >> >> Marcelo
>>>>> >> >> >>
>>>>> >> >> >>
>>>>> >> >> >>
>>>>> >> >> >>
>>>>> >> >> >
>>>>> >> >>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> --
>>>>> >> >> Marcelo
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> --
>>>>> >> Marcelo
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Marcelo
>>>>>
>>>>
>>
>
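
For anyone triaging the --packages failure quoted above, a minimal sketch for pulling the unresolved coordinates out of an Ivy "problems summary" might look like the following. It assumes the org#name;rev!file layout shown in Nick's log; the helper name missing_artifacts and the regular expression are illustrative only and are not part of Spark or Ivy.

```python
import re

# Matches Ivy module descriptors of the form org#name;rev!file, as they
# appear in the "problems summary" quoted in this thread.
_IVY_MODULE = re.compile(
    r"(?P<org>[\w.\-]+)#(?P<name>[\w.\-]+);(?P<rev>[\w.\-]+)!"
)


def missing_artifacts(log_text):
    """Return Maven-style org:name:rev coordinates found in an Ivy log.

    Hypothetical helper for illustration. It extracts every descriptor it
    sees, so pass it only the problems-summary part of the output.
    """
    return [
        "{}:{}:{}".format(m.group("org"), m.group("name"), m.group("rev"))
        for m in _IVY_MODULE.finditer(log_text)
    ]


if __name__ == "__main__":
    # Two entries copied from the log quoted in this thread.
    sample = """
    [NOT FOUND  ] com.sun.jersey#jersey-json;1.9!jersey-json.jar(bundle) (2ms)
    [NOT FOUND  ] com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar (0ms)
    """
    for coordinate in missing_artifacts(sample):
        print(coordinate)
```

The resulting list can be compared against the jars already shipped in the Spark build to see which transitive dependencies of hadoop-aws could not be resolved. It only summarizes the log and does not by itself fix the version mismatch discussed above.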
