Date: Wed, 25 Oct 2017 12:03:00 +0000 (UTC)
From: "Steve Loughran (JIRA)"
To: common-issues@hadoop.apache.org
Subject: [jira] [Updated] (HADOOP-14965) s3a input stream "normal" fadvise mode to be adaptive

[ https://issues.apache.org/jira/browse/HADOOP-14965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-14965:
------------------------------------
    Status: Patch Available  (was: Open)

> s3a input stream "normal" fadvise mode to be adaptive
> -----------------------------------------------------
>
>                 Key: HADOOP-14965
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14965
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.1
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-14965-001.patch
>
>
> HADOOP-14535 added seek optimisation to wasb, but rather than requiring the caller to declare sequential vs. random IO, it works this out for itself:
> # it defaults to sequential IO with lazy seek;
> # if the caller ever seeks backwards, it switches to random IO.
> This means that with the access pattern of columnar stores (go to the end of the file, read the summary, then go to the columns and work forwards), the stream switches to random IO after that first backward seek, at a cost of one aborted HTTP connection.
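A minimal Java sketch of the adaptive behaviour described above, under a deliberately simplified model of the stream. The class AdaptiveSeekPolicy, its Fadvise enum and every method name are hypothetical illustrations, not the S3A or wasb input stream API. Seeks are lazy (only recorded until the next read), and a backward seek while the stream is still reading sequentially is counted as one aborted HTTP connection; real streams have further costs (forward skips, ranged GETs) that this model ignores.

```java
/**
 * Hypothetical model of an adaptive seek policy: NOT the real S3A/wasb code.
 * NORMAL starts sequential and flips to random IO on the first backward seek;
 * SEQUENTIAL never flips; RANDOM uses random IO from the outset.
 */
final class AdaptiveSeekPolicy {

    /** Hypothetical stand-ins for the "normal", "sequential" and "random" fadvise options. */
    enum Fadvise { NORMAL, SEQUENTIAL, RANDOM }

    private final Fadvise requested;
    private boolean randomIO;        // effective IO mode right now
    private long pos;                // stream position after the last read
    private long nextReadPos;        // seek target, recorded lazily
    private int abortedConnections;  // backward seeks taken while reading sequentially

    AdaptiveSeekPolicy(Fadvise requested) {
        this.requested = requested;
        // Only "random" starts in random IO; "normal" and "sequential" start sequential.
        this.randomIO = requested == Fadvise.RANDOM;
    }

    /** Lazy seek: just remember where the next read should start. */
    void seek(long target) {
        if (target < 0) {
            throw new IllegalArgumentException("negative seek: " + target);
        }
        nextReadPos = target;
    }

    /** Read len bytes at the pending position, applying the lazy seek first. */
    void read(int len) {
        boolean backwards = nextReadPos < pos;
        if (backwards && !randomIO) {
            // Simplification: a backward seek on an open sequential GET costs one aborted connection.
            abortedConnections++;
            if (requested == Fadvise.NORMAL) {
                // Adaptive switch: stay in random IO for the rest of the stream's life.
                randomIO = true;
            }
        }
        pos = nextReadPos + len;
        nextReadPos = pos;
    }

    boolean isRandomIO() { return randomIO; }

    int abortedConnections() { return abortedConnections; }
}
```

Replaying the columnar pattern (seek to byte 990 of a 1000-byte file, read the 10-byte footer, seek back to byte 100 and work forwards) flips a NORMAL stream to random IO after exactly one aborted connection, whereas in this model a SEQUENTIAL stream pays that cost again on every later backward seek.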
> Where this should benefit most is in downstream apps which work with different data sources in the same object store and run off the same app config, but have different read patterns. I'm seeing exactly this in some of my Spark tests, where it's nearly impossible to set things up so that .gz files are read sequentially but ORC data is read with random IO.
> I propose that the "normal" fadvise mode become adaptive, "sequential" always stay sequential, and "random" use random IO from the outset.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)