From: Everett Anderson
Reply-To: everett@nuna.com
To: user@spark.apache.org
Date: Thu, 29 Jun 2017 13:56:52 -0700
Subject: Spark, S3A, and 503 SlowDown / rate limit issues
Hi,

We're using Spark 2.0.2 + Hadoop 2.7.3 on AWS EMR with S3A for direct I/O to and from S3 in our Spark jobs. We set mapreduce.fileoutputcommitter.algorithm.version=2 and are using encrypted S3 buckets.
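
(For concreteness, here's a rough PySpark sketch of the kind of setup described above. The bucket and table paths are hypothetical, and I've left out our encryption settings since those depend on how the buckets are configured.)

    from pyspark.sql import SparkSession

    # Rough sketch of the setup described above; the bucket and table paths
    # are hypothetical, and encryption-related settings are omitted.
    spark = (
        SparkSession.builder
        .appName("s3a-direct-io-example")
        # The Hadoop property mentioned above, passed via Spark's hadoop config prefix.
        .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
        .getOrCreate()
    )

    # Read and write Parquet directly against S3 through the s3a:// filesystem.
    df = spark.read.parquet("s3a://our-bucket/input/table")
    df.write.parquet("s3a://our-bucket/output/table")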

This has been working fine for us, but, perhaps because we've been running more jobs in parallel, we've started getting errors like:

Status Code: 503, AWS Service: Amazon S3, AWS Request ID: ..., AWS Error Code: SlowDown, AWS Error Message: Please reduce your request rate., S3 Extended Request ID: ...

We enabled CloudWatch S3 request metrics for one of our buckets and I was a little alarmed to see spikes of over 800k S3 requests over a minute or so, with the bulk of them HEAD requests.
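
(In case it helps anyone reproduce or compare, those per-bucket request metrics can also be pulled programmatically. Here's a rough boto3 sketch; the region, bucket name, and metrics filter ID are hypothetical placeholders, and these metrics only exist once a request metrics filter has been enabled on the bucket.)

    import datetime

    import boto3

    # Rough sketch of pulling per-bucket S3 request metrics from CloudWatch.
    # "our-bucket" and the "EntireBucket" filter ID are hypothetical.
    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

    now = datetime.datetime.utcnow()
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/S3",
        MetricName="HeadRequests",
        Dimensions=[
            {"Name": "BucketName", "Value": "our-bucket"},
            {"Name": "FilterId", "Value": "EntireBucket"},
        ],
        StartTime=now - datetime.timedelta(hours=1),
        EndTime=now,
        Period=60,                # one-minute buckets
        Statistics=["Sum"],
    )

    for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], int(point["Sum"]))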

We read and write Parquet files; most tables have around 50 shards/parts, though some have up to 200. I imagine there's also additional parallelism when reading an individual Parquet shard.

Has anyone else encountered this? How did you solve it?

I'd sure prefer to avoid copying all our data in and out of HDFS for each job, if possible.
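
(By copying in and out of HDFS, I mean a staging pattern roughly like the sketch below, with hypothetical paths, where the job itself only touches HDFS and S3 is only read at the start and written at the end.)

    from pyspark.sql import SparkSession

    # Sketch of the HDFS staging pattern we'd like to avoid; all paths are hypothetical.
    spark = SparkSession.builder.appName("hdfs-staging-example").getOrCreate()

    # Copy the inputs from S3 down to HDFS once, up front.
    spark.read.parquet("s3a://our-bucket/input/table") \
        .write.parquet("hdfs:///staging/input/table")

    # Run the actual job entirely against HDFS.
    df = spark.read.parquet("hdfs:///staging/input/table")
    # ... real job logic would go here ...
    df.write.parquet("hdfs:///staging/output/table")

    # Copy the outputs back up to S3 at the end.
    spark.read.parquet("hdfs:///staging/output/table") \
        .write.parquet("s3a://our-bucket/output/table")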

Thanks!
