Date: Tue, 22 Dec 2015 13:39:34 -0500
Subject: Re: Revive HADOOP-2705?
From: "dam6923 ."
To: hdfs-dev@hadoop.apache.org

Colin,

I will continue my investigation into the matter. Thanks.

I will just point out that org.apache.hadoop.hdfs.server.datanode.BlockSender overrides this value with 64KB where necessary (line 116).

---------------------

On a side note, can you explain the purpose of:

org.apache.hadoop.hdfs.DFSUtilClient.getSmallBufferSize(Configuration)

This method seems to be an undocumented "feature" that overrides the user's configuration without explaining why. In most cases, it appears to be used when creating a buffer for sending small messages between DataNodes. If that is the case, I would think the message size should be the main consideration in setting the buffer size, not the value the user has configured.
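To make the getSmallBufferSize concern concrete, here is a standalone sketch of the sizing rule. It assumes, based on my reading of the Hadoop source (please verify for your version), that the method returns min(io.file.buffer.size / 2, 512). The class and method names below are hypothetical, and no Hadoop classes are used.

```java
/**
 * Hypothetical sketch of a "small buffer" sizing rule in the spirit of
 * DFSUtilClient.getSmallBufferSize(Configuration).
 * ASSUMPTION: the real method computes min(io.file.buffer.size / 2, 512);
 * verify against the Hadoop source for the version you use.
 */
public class SmallBufferSketch {
    // Assumed upper bound on the "small" buffer, in bytes.
    static final int SMALL_BUFFER_CAP = 512;

    /** Half the configured io.file.buffer.size, capped at SMALL_BUFFER_CAP. */
    static int smallBufferSize(int configuredIoFileBufferSize) {
        return Math.min(configuredIoFileBufferSize / 2, SMALL_BUFFER_CAP);
    }

    public static void main(String[] args) {
        // A tiny value, the usual 4096-byte default, and a large user setting.
        int[] configured = {128, 4096, 65536};
        for (int c : configured) {
            System.out.printf("io.file.buffer.size=%d -> small buffer=%d%n",
                    c, smallBufferSize(c));
        }
    }
}
```

Under that rule, any configured value of 1024 bytes or more yields the same 512-byte buffer, which is why the user's setting appears to be silently overridden for these streams.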
For maintainability and predictability, I would think a hard-coded 512 would be most appropriate, or else simply the default buffer size of BufferedOutputStream/BufferedInputStream.

The one notable exception I see is in:

org.apache.hadoop.hdfs.server.datanode.DataNode.DataTransfer.run() - line 2261

It appears that the OutputStream used for sending blocks uses this smaller buffer size to send entire data blocks, but no comment explains why the smaller buffer is used instead of the size configured by the user.

Thanks!

On Fri, Dec 18, 2015 at 9:59 PM, Colin McCabe wrote:
> Reading files from HDFS has different performance characteristics than
> reading local files. For one thing, HDFS does a few megabytes of
> readahead internally by default. If you are going to make a
> performance improvement suggestion, I would strongly encourage you to
> test it first.
>
> cheers,
> Colin
>
>
> On Tue, Dec 15, 2015 at 2:22 PM, dam6923 . wrote:
>> Here was the justification from 2004:
>>
>> https://bugs.openjdk.java.net/browse/JDK-4953311
>>
>>
>> Also, some research into the matter (not my own):
>>
>> http://nadeausoftware.com/articles/2008/02/java_tip_how_read_files_quickly
>>
>> One of the conclusions:
>>
>> "Minimize I/O operations by reading an array at a time, not a byte at
>> a time. An 8Kbyte array is a good size."
>>
>>
>> On Tue, Dec 15, 2015 at 3:41 PM, Colin McCabe wrote:
>>> Hi David,
>>>
>>> Do you have benchmarks to justify changing this configuration?
>>>
>>> best,
>>> Colin
>>>
>>> On Wed, Dec 9, 2015 at 8:05 AM, dam6923 . wrote:
>>>> Hello!
>>>>
>>>> A while back, in Java 1.6, the size of the internal file-reading
>>>> buffers was bumped up to 8192 bytes.
>>>>
>>>> http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/io/BufferedInputStream.java
>>>>
>>>> Perhaps it's time to update Hadoop to at least this default level too. :)
>>>>
>>>> https://issues.apache.org/jira/browse/HADOOP-2705
>>>>
>>>> Thanks,
>>>> David
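To illustrate the "array at a time" point from the quoted thread without claiming anything about HDFS itself, the sketch below counts how many read calls reach an underlying in-memory stream when 1 MiB is consumed byte by byte through a java.io.BufferedInputStream of different sizes. It measures call counts, not wall-clock time, so it is no substitute for the benchmarks Colin asked for, and it ignores the readahead HDFS already does. The class names are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ReadCallCounter {
    /** Wraps a stream and counts every read call that reaches it. */
    static final class CountingInputStream extends FilterInputStream {
        int underlyingReads = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            underlyingReads++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            underlyingReads++;
            return super.read(b, off, len);
        }
    }

    /**
     * Reads the whole payload one byte at a time through a BufferedInputStream
     * of the given size and returns how many reads hit the underlying stream.
     */
    static int underlyingReads(byte[] payload, int bufferSize) {
        CountingInputStream counting =
                new CountingInputStream(new ByteArrayInputStream(payload));
        try (InputStream in = new BufferedInputStream(counting, bufferSize)) {
            while (in.read() != -1) {
                // Consume byte by byte; BufferedInputStream refills its buffer in bulk.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counting.underlyingReads;
    }

    public static void main(String[] args) {
        byte[] payload = new byte[1 << 20]; // 1 MiB of zeros
        for (int size : new int[] {512, 8192, 65536}) {
            System.out.printf("buffer=%6d bytes -> %d underlying reads%n",
                    size, underlyingReads(payload, size));
        }
    }
}
```

Each refill requests a full buffer, so for 1 MiB the 512-byte buffer should need roughly 2,000 underlying reads versus roughly 130 with the 8192-byte default. Whether fewer calls translate into measurable speedups on HDFS is exactly what a real benchmark would have to show.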