Subject: Re: Python Plasma Store Best Practices
From: Sam Shleifer <sshleifer@gmail.com>
To: user@arrow.apache.org, Micah Kornfield
Date: Tue, 02 Mar 2021 17:49:41 +0000

Thanks, had no idea!

On Tue, Mar 02, 2021 at 12:00 PM, Micah Kornfield <emkornfield@gmail.com> wrote:

> Hi Sam,
>
> I think the lack of responses might be because Plasma is not being
> actively maintained. The original authors have forked it into the Ray
> project.
>
> I'm sorry I don't have the expertise to answer your questions.
>
> -Micah
>
> On Mon, Mar 1, 2021 at 6:48 PM Sam Shleifer <sshleifer@gmail.com> wrote:
>
>> Partial answers are super helpful!
>>
>> I'm happy to break this up if it's too much for 1 question @moderators
>>
>> Sam
>>
>> On Sat, Feb 27, 2021 at 1:27 PM, Sam Shleifer <sshleifer@gmail.com> wrote:
>>
>>> Hi!
>>>
>>> I am trying to use the plasma store to reduce the memory usage of a
>>> PyTorch dataset/dataloader combination, and I have 4 questions. I don't
>>> think any of them require PyTorch knowledge. If you prefer to comment
>>> inline, there is a quip with identical content and prettier formatting
>>> here: https://quip.com/3mwGAJ9KR2HT
>>>
>>> *1)* My script starts the plasma-store from Python with 200 GB:
>>>
>>>     nbytes = (1024 ** 3) * 200
>>>     _server = subprocess.Popen(["plasma_store", "-m", str(nbytes), "-s", path])
>>>
>>> where nbytes is chosen arbitrarily. From my experiments it seems that one
>>> should start the store as large as possible within the limits of /dev/shm.
>>> I wanted to verify whether this is actually the best practice (it would be
>>> hard for my app to know its storage needs up front), and also whether
>>> there is an automated way to figure out how much storage to allocate.
>>>
>>> *2)* Does the plasma store support simultaneous reads? My code, which has
>>> multiple clients all asking for the 6 arrays from the plasma-store
>>> thousands of times, was segfaulting with different errors, e.g.
>>>
>>>     Check failed: RemoveFromClientObjectIds(object_id, entry, client) == 1
>>>
>>> until I added a lock around my client.get:
>>>
>>>     if self.use_lock:  # Fix segfault
>>>         with FileLock("/tmp/plasma_lock"):
>>>             ret = self.client.get(self.object_id)
>>>     else:
>>>         ret = self.client.get(self.object_id)
>>>
>>> which fixes it.
>>>
>>> Here is a full traceback of the failure without the lock:
>>> https://gist.github.com/sshleifer/75145ba828fcb4e998d5e34c46ce13fc
>>>
>>> Is this expected behavior?
>>>
>>> *3)* Is there a simple way to add many objects to the plasma store at
>>> once? Right now, we are considering changing
>>>
>>>     oid = client.put(array)
>>>
>>> to
>>>
>>>     oids = [client.put(x) for x in array]
>>>
>>> so that we can fetch one entry at a time, but the writes are much slower.
>>>
>>> * 3a) Is there a lower-level interface for bulk writes?
>>>
>>> * 3b) Or is it recommended to chunk the array and have different Python
>>> processes write simultaneously to make this faster?
>>>
>>> *4)* Is there a way to save/load the contents of the plasma-store to disk
>>> without loading everything into memory and then saving it to some other
>>> format?
>>>
>>> Replication
>>>
>>> Setup instructions for fairseq + replicating the segfault:
>>> https://gist.github.com/sshleifer/bd6982b3f632f1d4bcefc9feceb30b1a
>>>
>>> My code is here: https://github.com/pytorch/fairseq/pull/3287
>>>
>>> Thanks!
>>>
>>> Sam
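On question 1, one alternative to hard-coding 200 GB is to size the store from the space actually free under /dev/shm at startup, since the store is shm-backed. A sketch (the helper names and the 0.8 safety fraction are my own assumptions, not guidance from the Arrow docs):

```python
import shutil
import subprocess


def shm_budget_bytes(fraction=0.8, shm_path="/dev/shm"):
    """Return a store size: a fraction of the space currently free
    under shm_path. The 0.8 default is an arbitrary safety margin."""
    free = shutil.disk_usage(shm_path).free
    return int(free * fraction)


def start_plasma_store(socket_path, nbytes):
    """Launch plasma_store as a child process, as in the original script.
    The caller must keep the returned handle alive (and terminate it)."""
    return subprocess.Popen(
        ["plasma_store", "-m", str(nbytes), "-s", socket_path]
    )
```

Usage would mirror the original script: `start_plasma_store(path, shm_budget_bytes())` instead of a fixed `(1024 ** 3) * 200`. This still over-allocates if the dataset is small; it only avoids asking for more than /dev/shm can hold.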
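For question 2, a stdlib-only variant of the FileLock workaround, using fcntl.flock so multiple reader processes serialize their client.get calls (Unix-only; `guarded_get` and the lock path are illustrative names, and whether Plasma really requires this serialization is exactly the open question):

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def file_lock(path):
    """Exclusive cross-process advisory lock via fcntl.flock (Unix only)."""
    with open(path, "w") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            yield
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


def guarded_get(client, object_id, lock_path="/tmp/plasma_lock"):
    """Serialize client.get across processes, mirroring the FileLock fix."""
    with file_lock(lock_path):
        return client.get(object_id)
```

This trades read concurrency for stability, so if Plasma is supposed to support simultaneous reads, the lock is papering over a bug rather than fixing one.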

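On question 3, one middle ground between the two layouts (one huge put vs. one put per entry) is a single flattened object plus an offsets index: one write, but readers can still slice out one entry at a time. A sketch with plain lists (`pack_rows`/`get_row` are hypothetical helpers, not Plasma API; with real arrays the same idea applies to a concatenated buffer):

```python
from itertools import accumulate


def pack_rows(rows):
    """Flatten variable-length rows into (flat, offsets). Putting this
    pair into the store once replaces len(rows) small client.put calls."""
    offsets = [0] + list(accumulate(len(r) for r in rows))
    flat = [x for row in rows for x in row]
    return flat, offsets


def get_row(flat, offsets, i):
    """Recover row i by slicing, without touching the other rows."""
    return flat[offsets[i]:offsets[i + 1]]
```

Readers fetch the one packed object (or keep it mapped) and call `get_row` per entry, which avoids both the per-entry write overhead and re-deserializing the whole dataset on every access.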
