Date: Fri, 23 Aug 2013 17:14:51 +0000 (UTC)
From: "Josh Wills (JIRA)"
To: crunch-dev@incubator.apache.org
Reply-To: dev@crunch.apache.org
Subject: [jira] [Commented] (CRUNCH-256) SequentialFileNamingScheme should cache the # of files in the target directory after the first read

    [ https://issues.apache.org/jira/browse/CRUNCH-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748725#comment-13748725 ]

Josh Wills commented on CRUNCH-256:
-----------------------------------

Alright, I'll whip something up.

> SequentialFileNamingScheme should cache the # of files in the target directory after the first read
> ---------------------------------------------------------------------------------------------------
>
>                 Key: CRUNCH-256
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-256
>             Project: Crunch
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Josh Wills
>            Assignee: Josh Wills
>             Fix For: 0.8.0
>
>         Attachments: CRUNCH-256.patch
>
>
> After a job finishes running, the post-job hooks rename the files from a temp output directory to the target output directory. When we have lots of files, this move can take a long time, and I traced the performance issue to the fact that SequentialFileNamingScheme does a listStatus() on the output directory for every file that gets moved. If SequentialFileNamingScheme just does this check once and then increments an internal counter, we can significantly decrease the performance overhead involved with the move.
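
For readers of the archive, the caching idea in the issue description above could look roughly like the sketch below. It is not the attached patch and not Crunch's actual SequentialFileNamingScheme: the class name CachedSequentialNamer, the "part-" prefix, and the listingCount() helper are all made up for illustration, and it uses java.nio.file on a local directory as a stand-in for the Hadoop FileSystem listStatus() call. The point is only that the directory is listed once per target and later names come from an in-memory counter.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Stream;

/** Hypothetical illustration only; not Crunch's SequentialFileNamingScheme. */
final class CachedSequentialNamer {
    // Next sequence number per target directory, filled in on first use.
    private final Map<Path, Integer> nextIndex = new HashMap<>();
    // Number of directory listings performed, so the saving is observable.
    private int listings = 0;

    /** Returns the next sequential name for dir, listing the directory at most once. */
    synchronized String nextName(Path dir) throws IOException {
        Integer cached = nextIndex.get(dir);
        // Only the first call for a directory pays for a listing.
        int index = (cached != null) ? cached : countEntries(dir);
        nextIndex.put(dir, index + 1);
        return "part-" + index; // "part-" is an assumed prefix
    }

    synchronized int listingCount() {
        return listings;
    }

    // Stand-in for listStatus(): counts the entries currently in dir.
    private int countEntries(Path dir) throws IOException {
        listings++;
        try (Stream<Path> entries = Files.list(dir)) {
            return (int) entries.count();
        }
    }
}
```

With two files already in the target directory, successive nextName(dir) calls on one instance would return part-2 and then part-3 while listing the directory only once. The cache assumes nothing else writes to the directory during the move; if that does not hold, the counter can drift from the real file count.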