From: Wes McKinney
Date: Sat, 7 Mar 2020 09:54:38 -0600
Subject: Re: Question about memoryviews and array construction
To: user@arrow.apache.org

There's a couple places to start

* Add PyMemoryView type check to internal::IsPyBinary
  https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/helpers.h#L80.
  I think this is all that's needed to take care of type inference
* Make sure PyMemoryView is handled in the PyBytesView helper in
  https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/common.h#L193

On Sat, Mar 7, 2020 at 9:35 AM Daniel Nugent wrote:
>
> Great!
>
> If you could provide a smidgen of guidance about where to start making this change, I would be happy to give it a shot.
>
> Thanks,
>
> -Dan Nugent
> On Mar 7, 2020, 09:18 -0500, Wes McKinney wrote:
>
> hi Dan,
>
> Yes, we should support constructing StringArray directly from
> memoryview as we do with bytes and unicode -- you're the first person
> to ask about this so far. I opened
> https://issues.apache.org/jira/browse/ARROW-8026. This should not be a
> huge amount of work so would be a good first contribution to the
> project
>
> Thanks
>
> Wes
>
> On Fri, Mar 6, 2020 at 8:29 PM Nugent, Daniel wrote:
>
> > Hi,
> >
> > I have a short program which I’m wondering about the sensibility of.
> > Could anyone let me know if this is reasonable or not:
> >
> > >>> import pyarrow as pa, third_party_library
> > >>> memory_views = third_party_library.get_strings()
> > >>> memory_views
> > [<memory at 0x...>, <memory at 0x...>, <memory at 0x...>, <memory at 0x...>]
> > >>> pa.array(memory_views,pa.string())
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> >   File "pyarrow/array.pxi", line 269, in pyarrow.lib.array
> >   File "pyarrow/array.pxi", line 38, in pyarrow.lib._sequence_to_array
> >   File "pyarrow/error.pxi", line 107, in pyarrow.lib.check_status
> > pyarrow.lib.ArrowTypeError: Expected a string or bytes object, got a 'memoryview' object
> > >>> pa.array(map(bytes,memory_views),pa.string())
> > [
> >   "this",
> >   "is",
> >   "a",
> >   "sample"
> > ]
> >
> > I have a big list of byte sequences being provided to me as memoryviews from a third party library. I’d like to create an Arrow StringArray from them as efficiently as possible. Having to map and consequently copy them through a bytes constructor seems not great (and the memoryview tobytes function appears to just call the bytes constructor, afaict).
> >
> > To me, it seemed like pa.array should be able to use the memoryview objects directly in order to construct the StringArray, but it seems like Arrow wants them copied into fresh byte objects first. I don’t know if I understand why and was ultimately wondering if it’s a reasonable thing to desire.
> >
> > Thanks in advance,
> >
> > -Dan Nugent
> >
> > ######################################################################
> > The information contained in this communication is confidential and
> > may contain information that is privileged or exempt from disclosure
> > under applicable law. If you are not a named addressee, please notify
> > the sender immediately and delete this email from your system.
> > If you have received this communication, and are not a named
> > recipient, you are hereby notified that any dissemination,
> > distribution or copying of this communication is strictly prohibited.
> > ######################################################################
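
For anyone who needs a workaround before memoryview support lands, the copy concern in the original question can be illustrated with a rough sketch. Arrow's string layout keeps all values in one contiguous data buffer plus an offsets buffer, so each value probably has to be copied once in any case; what can be avoided is the intermediate bytes object per element. The helper names pack_memoryviews and to_string_array are made up for this example and are not part of pyarrow. The sketch assumes C-contiguous views, little-endian int32 offsets, no nulls, and (for the optional pyarrow step) that every value is valid UTF-8, which nothing here checks.

```python
import struct

try:
    import pyarrow as pa  # optional; only needed by to_string_array()
except ImportError:
    pa = None


def pack_memoryviews(views):
    """Pack bytes-like objects into (offsets, data) buffers.

    Assumed layout: n + 1 little-endian int32 offsets and one
    contiguous data buffer, as used by variable-length string arrays.
    """
    offsets = [0]
    data = bytearray()
    for view in views:
        # Appends straight from the view's buffer: one copy per value,
        # with no temporary bytes object. Requires a C-contiguous view.
        data += view
        offsets.append(len(data))
    return struct.pack("<%di" % len(offsets), *offsets), data


def to_string_array(views):
    """Hypothetical helper: wrap the packed buffers as a StringArray.

    Assumes pyarrow is installed and the data is valid UTF-8.
    """
    offsets, data = pack_memoryviews(views)
    length = len(offsets) // 4 - 1  # n + 1 int32 offsets describe n values
    return pa.StringArray.from_buffers(
        length, pa.py_buffer(offsets), pa.py_buffer(data)
    )
```

For the four sample values in the thread ("this", "is", "a", "sample"), pack_memoryviews yields the offsets 0, 4, 6, 7, 13 and the data buffer b"thisisasample". Whether this is actually faster than map(bytes, ...) for a given workload would need measuring.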