From: Harsh J
Date: Thu, 28 Mar 2013 12:03:40 +0530
Subject: Re: Auto clean DistCache?
To: user@hadoop.apache.org

The DistributedCache is cleaned automatically, and no user intervention (aside from size-limit changes, which may be an administrative requirement) is generally required to delete older distributed cache files. This is observable in code and is also noted in TDG, 2ed., by Tom White:

"""
The tasktracker also maintains a reference count for the number of tasks using each file in the cache. Before the task has run, the file's reference count is incremented by one; then after the task has run, the count is decreased by one. Only when the count reaches zero is it eligible for deletion, since no tasks are using it. Files are deleted to make room for a new file when the cache exceeds a certain size (10 GB by default). The cache size may be changed by setting the configuration property local.cache.size, which is measured in bytes.
"""

The maximum number of allowed cache directories is also checked automatically today, so as not to violate the OS's limits.

On Wed, Mar 27, 2013 at 7:07 PM, Jean-Marc Spaggiari wrote:
> Oh! Good to know! It keeps track even of month-old entries??? There is no TTL?
>
> I was not able to find the documentation for local.cache.size or
> mapreduce.tasktracker.cache.local.size in the 1.0.x branch.
> Do you know where I can find that?
>
> Thanks,
>
> JM
>
> 2013/3/27 Koji Noguchi:
>>> Else, I will go for a custom script to delete all directories (and content) older than 2 or 3 days…
>>>
>> TaskTracker (or NodeManager in 2.*) keeps the list of dist cache entries in memory.
>> So if an external process (like your script) starts deleting dist cache files, there would be an inconsistency and you'll start seeing task initialization failures due to file-not-found errors.
>>
>> Koji
>>
>>
>> On Mar 26, 2013, at 9:00 PM, Jean-Marc Spaggiari wrote:
>>
>>> For the situation I faced it was really a disk space issue, not related
>>> to the number of files. It was writing on a small partition.
>>>
>>> I will try with local.cache.size or
>>> mapreduce.tasktracker.cache.local.size to see if I can keep the final
>>> total size under 5GB... Else, I will go for a custom script to
>>> delete all directories (and content) older than 2 or 3 days...
>>>
>>> Thanks,
>>>
>>> JM
>>>
>>> 2013/3/26 Abdelrahman Shettia:
>>>> Let me clarify: if there are lots of files or directories (up to 32K,
>>>> depending on the OS and per-user limits configured) in those distributed cache
>>>> dirs, the OS will not be able to create any more files/dirs, so M-R jobs
>>>> won't get initiated on those tasktracker machines. Hope this helps.
>>>>
>>>>
>>>> Thanks
>>>>
>>>>
>>>> On Tue, Mar 26, 2013 at 1:44 PM, Vinod Kumar Vavilapalli
>>>> wrote:
>>>>>
>>>>>
>>>>> All the files are not opened at the same time ever, so you shouldn't see
>>>>> any "# of open files exceeds" error.
>>>>>
>>>>> Thanks,
>>>>> +Vinod Kumar Vavilapalli
>>>>> Hortonworks Inc.
>>>>> http://hortonworks.com/
>>>>>
>>>>> On Mar 26, 2013, at 12:53 PM, Abdelrhman Shettia wrote:
>>>>>
>>>>> Hi JM,
>>>>>
>>>>> Actually these dirs need to be purged by a script that keeps the last 2
>>>>> days' worth of files; otherwise you may run into a "# of open files exceeds"
>>>>> error.
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Mar 25, 2013, at 5:16 PM, Jean-Marc Spaggiari wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Each time my MR job is run, a directory is created on the TaskTracker
>>>>> under mapred/local/taskTracker/hadoop/distcache (based on my
>>>>> configuration).
>>>>>
>>>>> I looked at the directory today, and it's hosting thousands of
>>>>> directories and more than 8GB of data there.
>>>>>
>>>>> Is there a way to automatically delete this directory when the job is
>>>>> done?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> JM

--
Harsh J
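[Editor's note: the thread above mentions capping the cache via local.cache.size without showing the setting itself. A minimal sketch of what JM's 5 GB cap might look like in mapred-site.xml on each TaskTracker, assuming Hadoop 1.x where this property applies; the value is an illustration, not a recommendation.]

```xml
<!-- mapred-site.xml on each TaskTracker (Hadoop 1.x) -->
<property>
  <!-- Upper bound on the local distributed cache, in bytes (default is ~10 GB).
       5368709120 bytes = 5 GB, the cap JM mentions wanting above. -->
  <name>local.cache.size</name>
  <value>5368709120</value>
</property>
```

Per Koji's warning earlier in the thread, prefer this built-in limit over an external deletion script, since the TaskTracker tracks cache entries in memory and deleting files behind its back causes task-initialization failures.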