From: Eitan Rosenfeld
Date: Tue, 4 Nov 2014 16:19:35 +0200
Subject: Why do reads take as long as replicated writes?
To: hdfs-dev@hadoop.apache.org

I am benchmarking my cluster of 16 nodes (all in one rack) with TestDFSIO on
Hadoop 1.0.4. For simplicity, I turned off speculative task execution and set
the max map and reduce tasks to 1.

With a replication factor of 2, writing 1 file of 5GB takes twice as long as
reading 1 file. This result seems to make sense, since replication results in
twice the I/O in the cluster compared with the read.

However, as I scale up the number of 5GB files from 1 to 64, reading
ultimately takes as long as writing. In particular, I see this result when
writing and reading 64 such files.

What could cause read performance to degrade faster than write performance as
the number of files increases?

The full results (number of 5GB files, ratio of write time to read time) are
below:

  Files   Write time / read time
      1   2.02
      2   1.87
      4   1.73
      8   1.54
     16   1.37
     32   1.29
     64   1.01

Thank you,
Eitan
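
The ratios reported above can be compared against the naive expectation stated in the message (a write with replication factor 2 does twice the I/O of a read, so the ratio should stay near 2). The following is only a small sketch for tabulating that comparison. The function names `summarize` and `drops` are hypothetical helpers introduced here, and the script describes the trend without explaining its cause.

```python
# Measurements copied from the message:
# (number of 5GB files, write time / read time).
results = [(1, 2.02), (2, 1.87), (4, 1.73), (8, 1.54),
           (16, 1.37), (32, 1.29), (64, 1.01)]

REPLICATION_FACTOR = 2  # stated in the message


def summarize(rows, replication):
    """Return (files, ratio, ratio as a fraction of the naive replication ratio)."""
    return [(n, r, r / replication) for n, r in rows]


def drops(rows):
    """Change in the write/read ratio between consecutive rows.

    Each row doubles the file count, so this is the change per doubling.
    """
    return [round(b[1] - a[1], 2) for a, b in zip(rows, rows[1:])]


for n, ratio, frac in summarize(results, REPLICATION_FACTOR):
    print(f"{n:>3} files  write/read = {ratio:.2f}  "
          f"({frac:.0%} of the naive {REPLICATION_FACTOR}x)")

print("change in ratio per doubling:", drops(results))
```

The per-doubling differences make it easier to see whether the ratio falls steadily or mostly at one particular file count, which may help narrow down where read throughput stops scaling.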