From: Andy Isaacson
To: user@hadoop.apache.org
Date: Tue, 13 Nov 2012 13:53:33 -0800
Subject: Re: Optimizing Disk I/O - does HDFS do anything ?

On Tue, Nov 13, 2012 at 1:40 PM, Jay Vyas wrote:
> 1) but I thought that this sort of thing (yes even on linux) becomes
> important when you have large amounts of data - because the way files are
> written can cause issues on highly packed drives.

If you're running any filesystem at 99% full with a workload that creates or grows files, the filesystem will experience fragmentation. Don't do that if you want good performance. As long as there are a few dozen GB of free space to work with, ext4 on a modern Linux kernel (2.6.38 or newer) will do a fine job of keeping files sequential and shouldn't need to be defragmented.

To answer the original question: HDFS doesn't take any special measures to defragment files, but it does follow best practices to avoid causing fragmentation in the first place.

-andy
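The practical takeaway above is to keep each data volume well short of full. A minimal sketch of a headroom check follows, assuming you want to flag volumes before they get tight. The helper names (`has_headroom`, `check_path`) and both thresholds (30 GiB free, at most 95% used) are hypothetical choices standing in for "a few dozen GB", not anything HDFS or ext4 defines; it only uses Python's standard `shutil.disk_usage`.

```python
import shutil

GIB = 1024 ** 3

# Assumed thresholds: "a few dozen GB" free is interpreted here as 30 GiB,
# and 95% used is an arbitrary ceiling well below the 99%-full danger zone.
MIN_FREE_BYTES = 30 * GIB
MAX_USED_FRACTION = 0.95


def has_headroom(total, used, free,
                 min_free=MIN_FREE_BYTES, max_used=MAX_USED_FRACTION):
    """Heuristic: True if the volume has enough free space that the
    filesystem allocator can still keep new/growing files contiguous."""
    if total <= 0:
        raise ValueError("total must be positive")
    return free >= min_free and used / total <= max_used


def check_path(path):
    """Apply has_headroom() to the filesystem containing `path`."""
    usage = shutil.disk_usage(path)  # named tuple: (total, used, free) in bytes
    return has_headroom(usage.total, usage.used, usage.free)
```

For example, `check_path("/data/1")` (a hypothetical DataNode mount point) returns `False` once that volume drops below the assumed thresholds. Checking both an absolute byte floor and a used fraction is a design choice: on very large disks a percentage alone can still leave plenty of room, while on small disks a byte floor alone can be too lax.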