Date: Mon, 18 May 2015 11:26:00 +0000 (UTC)
From: "J.Andreina (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-8234) DistributedFileSystem and Globber should apply PathFilter early

[ https://issues.apache.org/jira/browse/HDFS-8234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J.Andreina updated HDFS-8234:
-----------------------------
    Attachment: HDFS-8234.1.patch

Attached an initial patch. Please review.
> DistributedFileSystem and Globber should apply PathFilter early
> ---------------------------------------------------------------
>
>                 Key: HDFS-8234
>                 URL: https://issues.apache.org/jira/browse/HDFS-8234
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>            Assignee: J.Andreina
>              Labels: newbie
>         Attachments: HDFS-8234.1.patch
>
>
> HDFS-985 added partial listing to listStatus so that the entries of a
> large directory are not all listed in one go. However, when
> listStatus(Path p, PathFilter f) is called, the filter is applied only
> after all entries have been fetched, so a large list is built on the
> client side. It would be more efficient for
> DistributedFileSystem.listStatusInternal() to apply the PathFilter.
> DistributedFileSystem should therefore override
> listStatus(Path f, PathFilter filter) and apply the PathFilter early.
> Globber.java also applies the filter only after calling listStatus. It
> should call listStatus with the PathFilter instead.
> {code}
> FileStatus[] children = listStatus(candidate.getPath());
> .........
> for (FileStatus child : children) {
>   // Set the child path based on the parent path.
>   child.setPath(new Path(candidate.getPath(),
>       child.getPath().getName()));
>   if (globFilter.accept(child.getPath())) {
>     newCandidates.add(child);
>   }
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
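A minimal, self-contained Java sketch of the difference the issue describes between filtering after the full listing and filtering while paging. It does not use the real HDFS classes: String paths stand in for org.apache.hadoop.fs.Path, java.util.function.Predicate stands in for PathFilter, and the hypothetical PagedDirectory class stands in for the partial (batched) listing that HDFS-985 added. Both methods return the same entries; listWithFilter drops rejected entries as each batch arrives, while listThenFilter first accumulates the whole directory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class EarlyFilterSketch {

    // Hypothetical stand-in for a directory that is listed in batches.
    // This is NOT an HDFS class; it only simulates partial listing.
    static final class PagedDirectory {
        private final List<String> entries;
        private final int batchSize;

        PagedDirectory(List<String> entries, int batchSize) {
            this.entries = entries;
            this.batchSize = batchSize;
        }

        // Returns the batch starting at index 'from'; empty once exhausted.
        List<String> nextBatch(int from) {
            int to = Math.min(from + batchSize, entries.size());
            return entries.subList(from, to);
        }
    }

    // Late filtering: materialize the whole listing, then filter it.
    static List<String> listThenFilter(PagedDirectory dir, Predicate<String> filter) {
        List<String> all = new ArrayList<>();
        int from = 0;
        List<String> batch;
        while (!(batch = dir.nextBatch(from)).isEmpty()) {
            all.addAll(batch);
            from += batch.size();
        }
        List<String> result = new ArrayList<>();
        for (String path : all) {
            if (filter.test(path)) {
                result.add(path);
            }
        }
        return result;
    }

    // Early filtering: test each entry as its batch arrives, so rejected
    // entries are never accumulated into one large list.
    static List<String> listWithFilter(PagedDirectory dir, Predicate<String> filter) {
        List<String> result = new ArrayList<>();
        int from = 0;
        List<String> batch;
        while (!(batch = dir.nextBatch(from)).isEmpty()) {
            for (String path : batch) {
                if (filter.test(path)) {
                    result.add(path);
                }
            }
            from += batch.size();
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            names.add("/data/part-" + i);
        }
        names.add("/data/_SUCCESS");
        PagedDirectory dir = new PagedDirectory(names, 4);

        // Hypothetical filter: skip entries whose name starts with '_'.
        Predicate<String> notHidden =
            p -> !p.substring(p.lastIndexOf('/') + 1).startsWith("_");

        System.out.println(listWithFilter(dir, notHidden).size()); // prints 10
        System.out.println(
            listWithFilter(dir, notHidden).equals(listThenFilter(dir, notHidden))); // prints true
    }
}
```

In this sketch the filter still runs on the client, as in the issue's proposal for listStatusInternal(); the saving is that the client holds at most one batch plus the accepted entries, instead of the full directory listing.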