From: Jean-Marc Spaggiari
Date: Thu, 28 Mar 2013 12:02:26 -0400
Subject: Re: Auto clean DistCache?
To: user@hadoop.apache.org

Thanks Harsh.

My issue was not related to the number of files/folders but to the
total size of the DistributedCache. The directory where it's stored
only has 7GB available... So I will set the limit to 5GB with
local.cache.size, or move it to the drives where I have the dfs files
stored.

Thanks,

JM
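For reference, the 5GB cap would look something like the snippet below in
mapred-site.xml on each TaskTracker node. This is only a sketch based on the
properties named in this thread (Hadoop 1.x): local.cache.size is measured in
bytes, and the mapred.local.dir entry just illustrates the "move it to the
bigger drives" alternative, with placeholder paths.

<configuration>
  <!-- Cap the local distributed cache at 5 GB (value in bytes; default is 10 GB). -->
  <property>
    <name>local.cache.size</name>
    <value>5368709120</value>
  </property>

  <!-- Alternative: put the MapReduce local dirs (which hold the distcache
       subdirectories) on the larger drives. Paths are hypothetical examples. -->
  <property>
    <name>mapred.local.dir</name>
    <value>/data/1/mapred/local,/data/2/mapred/local</value>
  </property>
</configuration>

The TaskTracker would need a restart to pick this up, and as Koji points out
below, an external cleanup script is risky because the TaskTracker keeps its
dist cache entries in memory, so the size cap is the safer route.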
2013/3/28 Harsh J :
> The DistributedCache is cleaned automatically and no user intervention
> (aside from size limitation changes, which may be an administrative
> requirement) is generally required to delete the older distributed
> cache files.
>
> This is observable in code and is also noted in TDG, 2ed.:
>
> Tom White:
> """
> The tasktracker also maintains a reference count for the number of
> tasks using each file in the cache. Before the task has run, the
> file's reference count is incremented by one; then after the task has
> run, the count is decreased by one. Only when the count reaches zero
> is it eligible for deletion, since no tasks are using it. Files are
> deleted to make room for a new file when the cache exceeds a certain
> size (10 GB by default). The cache size may be changed by setting the
> configuration property local.cache.size, which is measured in bytes.
> """
>
> Also, the maximum number of allowed dirs is checked automatically
> today, so as not to violate the OS's limits.
>
> On Wed, Mar 27, 2013 at 7:07 PM, Jean-Marc Spaggiari wrote:
>> Oh! Good to know! It keeps track even of month-old entries??? There is no TTL?
>>
>> I was not able to find the documentation for local.cache.size or
>> mapreduce.tasktracker.cache.local.size in the 1.0.x branch. Do you know
>> where I can find that?
>>
>> Thanks,
>>
>> JM
>>
>> 2013/3/27 Koji Noguchi :
>>>> Else, I will go for a custom script to delete all directories (and
>>>> content) older than 2 or 3 days...
>>>>
>>> TaskTracker (or NodeManager in 2.*) keeps the list of dist cache entries
>>> in memory. So if an external process (like your script) starts deleting
>>> dist cache files, there will be an inconsistency and you'll start seeing
>>> task initialization failures due to "no file found" errors.
>>>
>>> Koji
>>>
>>>
>>> On Mar 26, 2013, at 9:00 PM, Jean-Marc Spaggiari wrote:
>>>
>>>> For the situation I faced, it was really a disk space issue, not related
>>>> to the number of files. It was writing on a small partition.
>>>>
>>>> I will try with local.cache.size or
>>>> mapreduce.tasktracker.cache.local.size to see if I can keep the final
>>>> total size under 5GB... Else, I will go for a custom script to
>>>> delete all directories (and content) older than 2 or 3 days...
>>>>
>>>> Thanks,
>>>>
>>>> JM
>>>>
>>>> 2013/3/26 Abdelrahman Shettia :
>>>>> Let me clarify: if there are lots of files or directories (up to 32K,
>>>>> depending on the OS's per-user file limits) in those distributed cache
>>>>> dirs, the OS will not be able to create any more files/dirs, so M-R jobs
>>>>> won't get initiated on those tasktracker machines. Hope this helps.
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>> On Tue, Mar 26, 2013 at 1:44 PM, Vinod Kumar Vavilapalli wrote:
>>>>>>
>>>>>> All the files are not opened at the same time ever, so you shouldn't
>>>>>> see any "# of open files exceeds" error.
>>>>>>
>>>>>> Thanks,
>>>>>> +Vinod Kumar Vavilapalli
>>>>>> Hortonworks Inc.
>>>>>> http://hortonworks.com/
>>>>>>
>>>>>> On Mar 26, 2013, at 12:53 PM, Abdelrhman Shettia wrote:
>>>>>>
>>>>>> Hi JM,
>>>>>>
>>>>>> Actually these dirs need to be purged by a script that keeps the last
>>>>>> 2 days' worth of files, otherwise you may run into a "# of open files
>>>>>> exceeds" error.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>> On Mar 25, 2013, at 5:16 PM, Jean-Marc Spaggiari wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Each time my MR job is run, a directory is created on the TaskTracker
>>>>>> under mapred/local/taskTracker/hadoop/distcache (based on my
>>>>>> configuration).
>>>>>>
>>>>>> I looked at the directory today, and it's hosting thousands of
>>>>>> directories and more than 8GB of data there.
>>>>>>
>>>>>> Is there a way to automatically delete this directory when the job is
>>>>>> done?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> JM
>>>>>>
>>>>>
>>>
>
>
> --
> Harsh J