Return-Path:
X-Original-To: apmail-hadoop-mapreduce-issues-archive@minotaur.apache.org
Delivered-To: apmail-hadoop-mapreduce-issues-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
    by minotaur.apache.org (Postfix) with SMTP id B4DD346D5
    for ; Thu, 23 Jun 2011 06:32:19 +0000 (UTC)
Received: (qmail 56131 invoked by uid 500); 23 Jun 2011 06:32:19 -0000
Delivered-To: apmail-hadoop-mapreduce-issues-archive@hadoop.apache.org
Received: (qmail 55786 invoked by uid 500); 23 Jun 2011 06:32:15 -0000
Mailing-List: contact mapreduce-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: mapreduce-issues@hadoop.apache.org
Delivered-To: mailing list mapreduce-issues@hadoop.apache.org
Received: (qmail 55760 invoked by uid 99); 23 Jun 2011 06:32:11 -0000
Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230)
    by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 23 Jun 2011 06:32:11 +0000
X-ASF-Spam-Status: No, hits=-2000.0 required=5.0 tests=ALL_TRUSTED,T_RP_MATCHES_RCVD
X-Spam-Check-By: apache.org
Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116)
    by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 23 Jun 2011 06:32:09 +0000
Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116])
    by hel.zones.apache.org (Postfix) with ESMTP id B9C1742A3FA
    for ; Thu, 23 Jun 2011 06:31:47 +0000 (UTC)
Date: Thu, 23 Jun 2011 06:31:47 +0000 (UTC)
From: "Amar Kamat (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Message-ID: <424239838.32349.1308810707757.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <982882221.32302.1308807947428.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Updated] (MAPREDUCE-2616) [Gridmix] InputStriper should smartly switch between compressed and uncompressed files based on the simulated job's input data characteristics
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394
X-Virus-Checked: Checked by ClamAV on apache.org

     [ https://issues.apache.org/jira/browse/MAPREDUCE-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated MAPREDUCE-2616:
----------------------------------

    Description:
Currently, all the Gridmix input data files are located at /input ( is expected as a CLI parameter). When compression emulation is enabled, Gridmix checks the input folder for compressed files, identified by their suffixes. Gridmix bails out if there are no compressed input files. If the input folder contains a mix of compressed and uncompressed input files, Gridmix uses only the compressed input files for all jobs. Gridmix should smartly assign
1. uncompressed input files to jobs that don't need input decompression
2. compressed input files to jobs that need input decompression

  was:
Currently, all the Gridmix input data files are located at /input ( is expected as a CLI parameter). When compression emulation is enabled, Gridmix will check for compressed files (based on suffixes) in the input folder. Gridmix will bail out if there are no compressed input files. If the input folder consists of a mix of compressed and uncompressed input files, then Gridmix might end up using uncompressed files, resulting in no emulation.

       Priority: Minor  (was: Major)
     Issue Type: Improvement  (was: Bug)
        Summary: [Gridmix] InputStriper should smartly switch between compressed and uncompressed files based on the simulated job's input data characteristics  (was: [Gridmix] Input data compression emulation might not work as expected with data reuse)

MAPREDUCE-2408 modified {{InputStriper}} to take care of the previously reported issue. What remains here is only an optimization.
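
For illustration only, the proposed assignment could be sketched roughly as below. This is not the actual {{InputStriper}} code: the class name InputFileSelector, the method names, and the hard-coded suffix list are all assumptions made for this example. It simply partitions the input file names by suffix and hands each job the subset matching its need for input decompression.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; names and suffix list are assumptions, not Gridmix code. */
final class InputFileSelector {

    // Assumed suffixes for illustration; a real implementation would
    // derive them from the compression codecs configured for the cluster.
    private static final List<String> COMPRESSED_SUFFIXES =
        List.of(".gz", ".bz2", ".deflate");

    /** Returns true if the file name ends with one of the assumed suffixes. */
    static boolean isCompressed(String fileName) {
        for (String suffix : COMPRESSED_SUFFIXES) {
            if (fileName.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Picks the input files for one simulated job: compressed files if the
     * job needs input decompression, uncompressed files otherwise.
     */
    static List<String> select(List<String> inputFiles, boolean needsDecompression) {
        List<String> selected = new ArrayList<>();
        for (String file : inputFiles) {
            if (isCompressed(file) == needsDecompression) {
                selected.add(file);
            }
        }
        return selected;
    }
}
```

With this sketch, a job that needs input decompression would call InputFileSelector.select(files, true) and see only the suffixed files, while every other job would call select(files, false). What should happen when the matching subset is empty is left open here, since the issue does not specify it.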
> [Gridmix] InputStriper should smartly switch between compressed and uncompressed files based on the simulated job's input data characteristics
> ----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2616
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2616
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/gridmix
>    Affects Versions: 0.23.0
>            Reporter: Amar Kamat
>            Assignee: Amar Kamat
>            Priority: Minor
>              Labels: compression-emulation, gridmix
>             Fix For: 0.23.0
>
>
> Currently, all the Gridmix input data files are located at /input ( is expected as a CLI parameter). When compression emulation is enabled, Gridmix checks the input folder for compressed files, identified by their suffixes. Gridmix bails out if there are no compressed input files. If the input folder contains a mix of compressed and uncompressed input files, Gridmix uses only the compressed input files for all jobs. Gridmix should smartly assign
> 1. uncompressed input files to jobs that don't need input decompression
> 2. compressed input files to jobs that need input decompression

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira