From: Wes McKinney
Date: Mon, 1 Jul 2019 14:37:34 -0500
Subject: Re: RecordBatch with Tensors/Arrays
To: user@arrow.apache.org, dev@arrow.apache.org

hi Andrew,

I'm copying dev@ just so more folks are in the loop.

On Wed, Jun 19, 2019 at 9:13 AM Andrew Spott wrote:
>
> I was told to post this here, rather than as an issue on Github.
>
> ====
>
> I'm looking to serialize data that looks something like this:
>
> ```
> record = { "predicted": ,
>            "truth": ,
>            "loss": ,
>            "index": }
>
> data = [
>     pa.array([record, record, record]),
>     pa.array([, , ]),
>     pa.array([, , ])
> ]
>
> batch = pa.RecordBatch.from_arrays(data, ['f0', 'f1', 'f2'])
> ```
>
> But I'm not sure how to do that, or even if what I'm trying to do is the right way to do it.

We don't support tensors/ndarrays as first-class value types in the Python or C++ libraries. This could be done hypothetically using the new ExtensionType facility. Tensor values would be embedded in an Arrow Binary column. There is already ARROW-1614 open for this. I also opened ARROW-5819 about implementing the Python-side plumbing around this.

Another possible option is to infer list<...> types from ndarrays (e.g. list<list<float64>> from an ndarray of ndim=2 and dtype=float64), but this has not been implemented.

>
> What is the difference between `pa.array` and `pa.list_`? This formulation is an array of structs, but is the struct of arrays formulation of this possible? i.e.:
>

* The return value of pa.array is an Array object, which wraps the C++ arrow::Array type, the base class for value sequences. It's data, not metadata.
* pa.list_ returns an instance of ListType, which is a DataType subclass. It's metadata, not data.

> ```
> data = [
>     pa.array([ , , ]),
>     pa.array([ , , ]),
>     pa.array([, , ]),
>     ...
> ]
> ```
>
> Which doesn't currently work. It seems that there is a separation between '1d arraylike' datatypes and 'pythonlike' datatypes (and 'nd arraylike' datatypes), so I can't have a struct of an array.
>

Right. ndarrays as array cell values are not natively part of the Arrow columnar format. But they could be supported through extensions. This would be a nice project for someone to take on in the future.

- Wes

> -Andrew