From: Niels Basjes
To: common-dev@hadoop.apache.org
Date: Sat, 31 May 2014 00:02:58 +0200
Subject: Re: Change proposal for FileInputFormat isSplitable

Hi,

The way I see the effects of the original patch on existing subclasses:
- implemented isSplitable
   --> no performance difference.
- did not implement isSplitable
   --> then there is no performance difference if the container is either
       not compressed or uses a splittable compression.
   --> If it uses a common non splittable compression (like gzip) then the
       output will suddenly be different (which is the correct answer) and
       the jobs will finish sooner because the input is not processed
       multiple times.

Where do you see a performance impact?

Niels

On May 30, 2014 8:06 PM, "Doug Cutting" wrote:

> On Thu, May 29, 2014 at 2:47 AM, Niels Basjes wrote:
> > For arguments I still do not fully understand this was rejected by Todd and
> > Doug.
>
> Performance is a part of compatibility.
>
> Doug
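
The cases listed in the message above can be sketched in plain Java. This is only an illustrative model and does not use Hadoop's classes: the class name `SplitabilitySketch`, its methods, and the extension-based check are all assumptions made for the example. It assumes the current default treats every file as splittable and that the proposed default returns false for a non-splittable compression such as gzip.

```java
// Hypothetical model of the discussion; none of these names are Hadoop APIs.
class SplitabilitySketch {

    // Assumption for this sketch: a ".gz" suffix stands in for
    // "compressed with a non-splittable codec such as gzip".
    static boolean usesNonSplittableCompression(String fileName) {
        return fileName.endsWith(".gz");
    }

    // Current default as modeled here: every file is treated as splittable.
    static boolean currentDefault(String fileName) {
        return true;
    }

    // Proposed default as modeled here: splittable unless the file uses
    // a non-splittable compression.
    static boolean proposedDefault(String fileName) {
        return !usesNonSplittableCompression(fileName);
    }

    // True only when the patch would change how a job treats this file.
    // A subclass that implements isSplitable itself never sees the default.
    static boolean behaviorChanges(String fileName, boolean subclassImplementsIsSplitable) {
        if (subclassImplementsIsSplitable) {
            return false;
        }
        return currentDefault(fileName) != proposedDefault(fileName);
    }

    public static void main(String[] args) {
        String[] files = {"data.txt", "data.bz2", "data.gz"};
        for (String file : files) {
            System.out.println(file
                    + " overridden=" + behaviorChanges(file, true)
                    + " inherited=" + behaviorChanges(file, false));
        }
    }
}
```

In this model the only case where behavior changes is a subclass that inherits the default and reads a gzip file, which is the case the message describes as producing different (correct) output.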