From: Jinfeng Li
Date: Mon, 26 Oct 2015 08:57:20 +0000
Subject: Loading Files from HDFS Incurs Network Communication
To: "dev@spark.apache.org"

Hi,

I find that loading files from HDFS can incur a huge amount of network traffic. The input size is 90G and the network traffic is about 80G. By my understanding, the files should be read locally, so no network communication should be needed.

I use Spark 1.5.1, and the following is my code:

val textRDD = sc.textFile("hdfs://master:9000/inputDir")
textRDD.count

Jeffrey