From: Everett Anderson
Reply-To: everett@nuna.com
To: user@spark.apache.org
Date: Thu, 29 Jun 2017 13:56:52 -0700
Subject: Spark, S3A, and 503 SlowDown / rate limit issues
Hi,

We're using Spark 2.0.2 + Hadoop 2.7.3 on AWS EMR with S3A for direct I/O to and from S3 in our Spark jobs. We set mapreduce.fileoutputcommitter.algorithm.version=2 and are using encrypted S3 buckets.
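
(For concreteness, here's a rough PySpark sketch of the kind of setup described above. The bucket and table paths are hypothetical, and I've left out our encryption settings since those depend on how the buckets are configured.)

    from pyspark.sql import SparkSession

    # Rough sketch of the setup described above; the bucket and table paths
    # are hypothetical, and encryption-related settings are omitted.
    spark = (
        SparkSession.builder
        .appName("s3a-direct-io-example")
        # The Hadoop property mentioned above, passed via Spark's hadoop config prefix.
        .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
        .getOrCreate()
    )

    # Read and write Parquet directly against S3 through the s3a:// filesystem.
    df = spark.read.parquet("s3a://our-bucket/input/table")
    df.write.parquet("s3a://our-bucket/output/table")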

This has been working fine for us, but, perhaps because we've been running more jobs in parallel, we've started getting errors like:

Status Code: 503, AWS Service: Amazon S3, AWS Request ID: ..., AWS Error Code: SlowDown, AWS Error Message: Please reduce your request rate., S3 Extended Request ID: ...

We enabled CloudWatch S3 request metrics for one of our buckets and I was a little alarmed to see spikes of over 800k S3 requests over a minute or so, with the bulk of them HEAD requests.
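
(In case it helps anyone reproduce or compare, those per-bucket request metrics can also be pulled programmatically. Here's a rough boto3 sketch; the region, bucket name, and metrics filter ID are hypothetical placeholders, and these metrics only exist once a request metrics filter has been enabled on the bucket.)

    import datetime

    import boto3

    # Rough sketch of pulling per-bucket S3 request metrics from CloudWatch.
    # "our-bucket" and the "EntireBucket" filter ID are hypothetical.
    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

    now = datetime.datetime.utcnow()
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/S3",
        MetricName="HeadRequests",
        Dimensions=[
            {"Name": "BucketName", "Value": "our-bucket"},
            {"Name": "FilterId", "Value": "EntireBucket"},
        ],
        StartTime=now - datetime.timedelta(hours=1),
        EndTime=now,
        Period=60,                # one-minute buckets
        Statistics=["Sum"],
    )

    for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], int(point["Sum"]))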

We read and write Parquet files; most tables have around 50 shards/parts, though some have up to 200. I imagine there's also additional parallelism when reading an individual Parquet shard.

Has anyone else encountered this? How did you solve it?

I'd sure prefer to avoid copying all our data in and out of HDFS for each job, if possible.
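
(By copying in and out of HDFS, I mean a staging pattern roughly like the sketch below, with hypothetical paths, where the job itself only touches HDFS and S3 is only read at the start and written at the end.)

    from pyspark.sql import SparkSession

    # Sketch of the HDFS staging pattern we'd like to avoid; all paths are hypothetical.
    spark = SparkSession.builder.appName("hdfs-staging-example").getOrCreate()

    # Copy the inputs from S3 down to HDFS once, up front.
    spark.read.parquet("s3a://our-bucket/input/table") \
        .write.parquet("hdfs:///staging/input/table")

    # Run the actual job entirely against HDFS.
    df = spark.read.parquet("hdfs:///staging/input/table")
    # ... real job logic would go here ...
    df.write.parquet("hdfs:///staging/output/table")

    # Copy the outputs back up to S3 at the end.
    spark.read.parquet("hdfs:///staging/output/table") \
        .write.parquet("s3a://our-bucket/output/table")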

Thanks!
