Date: Mon, 4 Dec 2017 10:04:39 +0800
From: Andrew Gaul
To: user@jclouds.apache.org
Subject: Re: BlobStore list() and PageSet
Message-ID: <20171204020439.GA4545@sherlock>
In-Reply-To: <939145863.993776.1512316058386.JavaMail.zimbra@tech-advantage.com>

Unfortunately both the transient and filesystem blobstores have
inefficient implementations which enumerate the entire
blobstore even when returning only a subset of keys.  LocalBlobStore.list
calls getBlobKeysInsideContainer and filters on the output.  Instead it
should call into the storage strategy with the marker, limit, prefix,
and delimiter so that the strategy can enumerate the keys more
efficiently.  This is easy for the transient store if we change to a
sorted map but requires some more work for filesystem since we have to
issue readdir selectively.

Could you open a JIRA issue for this and perhaps submit a pull request
to fix it?

On Sun, Dec 03, 2017 at 04:47:38PM +0100, GARDAIS Ionel wrote:
> Hi,
>
> I have a question regarding BlobStore listing.
> We are currently using the FileSystem storage implementation.
>
> private long containerSize(String containerName) {
>     long containerSize = 0;
>     String marker = null;
>     PageSet<? extends StorageMetadata> containerSML = null;
>     ListContainerOptions lcoRecursive = new ListContainerOptions().recursive();
>
> (1) containerSML = blobStore.list(containerName, lcoRecursive);
>     containerSize += pageSize(containerSML);
>     marker = containerSML.getNextMarker();
>
>     while (marker != null) {
>         lcoRecursive.afterMarker(marker);
> (2)     containerSML = blobStore.list(containerName, lcoRecursive);
>         containerSize += pageSize(containerSML);
>         marker = containerSML.getNextMarker();
>     }
>
>     log.debug("container:{} size:{}", containerName, containerSize);
>     return containerSize;
> }
>
> private long pageSize(PageSet<? extends StorageMetadata> containerStorageMetadataList) {
>     long size = 0;
>     for (StorageMetadata containerStorageMetadata : containerStorageMetadataList) {
>         if (containerStorageMetadata.getType() == StorageType.BLOB) {
>             log.debug("name:{} size:{}", containerStorageMetadata.getName(), containerStorageMetadata.getSize());
>             size += containerStorageMetadata.getSize();
>         } else {
>             log.debug("other in container: {}", containerStorageMetadata.getType());
>         }
>     }
>     return size;
> }
>
> By calling list() with ListContainerOptions().recursive() on both (1)
> and (2), all of the container’s blobs are opened
> for every PageSet.
> I tried calling list() with recursive() on (1) only and then iterating
> over the remaining PageSets without the recursive flag set, but I don’t
> get all blobs.
>
> Is there a way to iterate over all blobs in a container in a more
> effective way?
>
> Thanks,
> Ionel

-- 
Andrew Gaul
http://gaul.org/
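
The sorted-map approach suggested in the reply for the transient store
can be sketched roughly as follows.  This is only an illustration under
stated assumptions: SortedKeyListing, Page, put, and list are
hypothetical names, not the jclouds LocalBlobStore or storage strategy
API, and delimiter handling is left out.  The point is that with keys
held in a TreeMap, a listing can seek directly to the prefix or marker
and stop after limit keys instead of enumerating every key and
filtering afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical sketch; NOT the jclouds LocalBlobStore or storage strategy API. */
final class SortedKeyListing {
    /** One page of keys plus the marker for the next call (null when done). */
    static final class Page {
        final List<String> keys;
        final String nextMarker;

        Page(List<String> keys, String nextMarker) {
            this.keys = keys;
            this.nextMarker = nextMarker;
        }
    }

    // Sorted map of blob key -> blob size in bytes.
    private final NavigableMap<String, Long> blobs = new TreeMap<>();

    void put(String key, long size) {
        blobs.put(key, size);
    }

    /** Lists up to limit keys that start with prefix and sort strictly after marker. */
    Page list(String prefix, String marker, int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        String p = prefix == null ? "" : prefix;

        // Seek to the first candidate key: just after the marker if the
        // marker is at or beyond the prefix, otherwise the prefix itself.
        NavigableMap<String, Long> tail;
        if (marker != null && marker.compareTo(p) >= 0) {
            tail = blobs.tailMap(marker, false);
        } else {
            tail = blobs.tailMap(p, true);
        }

        List<String> keys = new ArrayList<>();
        String nextMarker = null;
        for (String key : tail.keySet()) {
            if (!key.startsWith(p)) {
                break; // keys are sorted, so no later key can match the prefix
            }
            if (keys.size() == limit) {
                nextMarker = keys.get(limit - 1); // at least one more key remains
                break;
            }
            keys.add(key);
        }
        return new Page(keys, nextMarker);
    }
}
```

A caller would page through a container by passing each Page's
nextMarker back into list() until it comes back null, so each call
touches only the keys it returns plus one look-ahead key.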