From: Joris Van den Bossche
Date: Tue, 28 Jan 2020 08:43:17 +0100
Subject: Re: Indexing, encoding, transformations and processing with PyArrow - GitHub 6284
To: user@arrow.apache.org


On Tue, 28 Jan 2020 at 08:36, Athanassios I. Hatzis <athanassios@healis.eu> wrote:

> There was also the following question in my email that was not answered.
>
> > > I also noticed that there is NumPy integration and you can convert
> > > easily from NumPy to Arrow, but the reverse direction has several
> > > limitations. For example, I cannot create a view for StringArray
> > > (NotImplementedError: NumPy array view is only supported for primitive
> > > types). But string() (utf8) is in the list of your primitive types.
> > > Any plans for supporting this type with NumPy soon?
>
> Could you please suggest or point to a piece of code on how to convert
> arrow.StringArray to numpy for further processing? Do I have to forget the
> view with the to_numpy() method and make a copy in order to process and
> modify it in NumPy?

You can "convert" a pyarrow string array to a numpy array (e.g. with
np.array(pyarrow_array), which will give you an object-dtype numpy array), but
you cannot create a numpy view on the array. A pyarrow string array holds
variable-length strings, something that numpy does not support. So in this
case you will always need to make a copy (and convert to object dtype).

Joris