From user-return-304-archive-asf-public=cust-asf.ponee.io@arrow.apache.org Wed Jan 29 16:57:20 2020
Mailing-List: contact user-help@arrow.apache.org; run by ezmlm
Reply-To: user@arrow.apache.org
MIME-Version: 1.0
References: <52ef95685f3848d2ab5452e372f11398@xbktrading.com>
In-Reply-To:
 <52ef95685f3848d2ab5452e372f11398@xbktrading.com>
From: Wes McKinney
Date: Wed, 29 Jan 2020 10:49:19 -0600
Subject: Re: Converting clickhouse column to arrow array
To: user@arrow.apache.org
Content-Type: text/plain; charset="UTF-8"

On Wed, Jan 29, 2020 at 9:55 AM Calder, Matthew wrote:
>
> I managed to get conversion from CH to arrow using a CHToArrowType<> inter-type traits concept. However, I am still trying to crack the use of:
>
> arrow::VisitArrayInline

Here's a minimal example of VisitArrayInline:

    struct ArrayVisitor {
      Status Visit(const Array& arr) { return Status::OK(); }
    };

    Status VisitArrayInlineExample(const Array& arr) {
      ArrayVisitor visitor;
      return VisitArrayInline(arr, &visitor);
    }

You can add different Visit functions to match specific Array subclasses or groups of types (e.g. integers, floating point, etc.). std::enable_if is helpful here, as are the various helper templates in arrow/type_traits.h.

>
> and
>
> arrow::ArrayDataVisitor

Here's an example (I didn't compile this, but hopefully it gives the idea):

    struct BooleanValueVisitor {
      int64_t num_true = 0;
      int64_t num_null = 0;

      Status VisitNull() {
        ++num_null;
        return Status::OK();
      }

      Status VisitValue(bool value) {
        if (value) ++num_true;
        return Status::OK();
      }
    };

    Status VisitBooleanValues(const Array& arr) {
      BooleanValueVisitor visitor;
      return ArrayDataVisitor<BooleanType>::Visit(*arr.data(), &visitor);
    }

If you have a type-parameterized visitor, then you could have:

    template <typename T>
    Status VisitArrayValues(const Array& arr) {
      MyValueVisitor<T> visitor;
      return ArrayDataVisitor<T>::Visit(*arr.data(), &visitor);
    }

(FWIW, we developed ArrayDataVisitor primarily for internal library use and not as a public API.)

I would personally try to use VisitArrayInline first if at all possible, since it is simpler.

>
> I have a struct:
>
> struct AnArrayUser
> {
>     template <typename T> arrow::Status Visit(const T &a)
>     {
>         // How to invoke ArrayDataVisitor?
>     }
>
>     void Use(const arrow::Array &a) {arrow::VisitArrayInline(a, this);}
>
>
>     arrow::Status VisitNull() {return arrow::Status::OK();}
>     template <typename T> arrow::Status VisitValue(T val) {return arrow::Status::OK();}
> };
>
> Which appears to have its "Use" method called appropriately. But inside of the Visit method I have so far been unable to find the incantation to make a call through the ArrayDataVisitor. I've tried several variations of:
>
> arrow::ArrayDataVisitor::Visit(*(array.data()), this);
>
> at the "// How to ..." line above but can't seem to get it to work. I'm sure I just have some fundamental misunderstanding of how this is supposed to work. Can someone give me some guidance?
>
> Matt
>
>
>
> -----Original Message-----
> From: Wes McKinney
> Sent: Wednesday, January 22, 2020 12:03 PM
> To: user@arrow.apache.org
> Subject: Re: Converting clickhouse column to arrow array
>
> If you search for "VisitTypeInline" or "VisitArrayInline" in the C++ codebase you can find numerous examples of where this is used.
>
> On Wed, Jan 22, 2020 at 10:58 AM Thomas Buhrmann wrote:
> >
> > Hi,
> > I was looking for something similar, but didn't find a good example in the docs or the source code showing how to use the visitor pattern. It would be great, e.g., to have an example similar to the "Row to columnar conversion", showing a templated way to read arrow columns into C++ vectors using the visitor pattern, and without implementing a separate reader function for each arrow type. Would that be possible?
> >
> > Many thanks,
> > Thomas
> >
> > On Wed, 22 Jan 2020 at 17:13, Wes McKinney wrote:
> >>
> >> hi Matt,
> >>
> >> I recommend you use the visitor pattern combined with the
> >> arrow::TypeTraits that we provide
> >>
> >> https://github.com/apache/arrow/blob/master/cpp/src/arrow/type_traits.h
> >>
> >> You'll need to provide a compile-time mapping from Clickhouse types
> >> to Arrow types, but then you can statically access the correct
> >> builder type at compile time
> >>
> >> using ArrowType = typename CHToArrowType<T>::ArrowType;
> >> using BuilderType = typename TypeTraits<ArrowType>::BuilderType;
> >>
> >> ...
> >>
> >> or similar. In cases where the exported Clickhouse data does not have
> >> an associated AppendValues method in Arrow you may have to write a
> >> special case (please open JIRA issues if you think there should be
> >> more AppendValues methods)
> >>
> >> Thanks
> >>
> >> On Wed, Jan 22, 2020 at 7:44 AM Calder, Matthew wrote:
> >> >
> >> > Hi,
> >> >
> >> > I am interfacing arrow to a Clickhouse database using their C++ client. Both arrow and CH have generic array-like classes with the element data type internalized. Ideally, I would like to be able to write something like:
> >> >
> >> > arrow::Array a = SomeConversionInvocation(clickhouse::Column c);
> >> >
> >> > Where the array and column have the same element type (int, double, string, …) but the code is generic to the specific type.
> >> >
> >> > I can do this by explicitly handling specific types through template specialization but I thought that since arrow already has pretty generic type handling through its templates, and clickhouse also has similar capability there ought to be a more seamless way to do the conversion.
> >> > Zero copy would probably be a lot to ask, but something short of template specializations for every type is what I am aiming for.
> >> >
> >> > I currently do explicit type specialization. For example I have functions like:
> >> >
> >> > inline std::shared_ptr<arrow::Array> makeArray(const std::vector<double> &v)
> >> > {
> >> >     arrow::DoubleBuilder builder;
> >> >     builder.AppendValues(v);
> >> >     std::shared_ptr<arrow::Array> array;
> >> >     builder.Finish(&array);
> >> >     return array;
> >> > }
> >> >
> >> > inline std::shared_ptr<arrow::Array> makeArray(const std::vector<int32_t> &v)
> >> > {
> >> >     arrow::Int32Builder builder;
> >> >     builder.AppendValues(v);
> >> >     std::shared_ptr<arrow::Array> array;
> >> >     builder.Finish(&array);
> >> >     return array;
> >> > }
> >> >
> >> > Which I suspect is unnecessarily explicit. Is there a more generic way of handling the variety of underlying array element data types when constructing arrow::Array objects? And can someone point me to examples that interface arrow to another similarly generically typed library (doesn’t have to be clickhouse). Thanks for any guidance.
> >> >
> >> > Matt
> >> >
> >> > The information contained in this e-mail may be confidential and is intended solely for the use of the named addressee.
> >> >
> >> > Access, copying or re-use of the e-mail or any information contained therein by any other person is not authorized.
> >> >
> >> > If you are not the intended recipient please notify us immediately by returning the e-mail to the originator.
> >> >
> >> > Disclaimer Version MB.US.1
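
For readers of the archive, the compile-time mapping suggested in the thread (element type to builder type, then one generic makeArray instead of one overload per type) can be sketched as below. This is only an illustration under stated assumptions: SketchArray, SketchDoubleBuilder, SketchInt32Builder and SketchTraits are hypothetical stand-ins written so the example compiles without Arrow installed, and they are not Arrow classes. With the real library the mapping would come from arrow/type_traits.h (TypeTraits<ArrowType>::BuilderType, as mentioned in the thread), and the builder's AppendValues and Finish return a Status that should be checked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a finished array: records a type name and a length.
struct SketchArray {
  std::string type_name;
  std::size_t length;
};

// Hypothetical stand-ins for per-type builders. They only mimic the
// AppendValues/Finish shape; they are not Arrow's builders.
struct SketchDoubleBuilder {
  std::vector<double> values;
  void AppendValues(const std::vector<double>& v) {
    values.insert(values.end(), v.begin(), v.end());
  }
  SketchArray Finish() const { return {"double", values.size()}; }
};

struct SketchInt32Builder {
  std::vector<std::int32_t> values;
  void AppendValues(const std::vector<std::int32_t>& v) {
    values.insert(values.end(), v.begin(), v.end());
  }
  SketchArray Finish() const { return {"int32", values.size()}; }
};

// Compile-time mapping from element type to builder type. The primary
// template is declared but not defined, so an unmapped type fails to compile.
template <typename T>
struct SketchTraits;

template <>
struct SketchTraits<double> {
  using BuilderType = SketchDoubleBuilder;
};

template <>
struct SketchTraits<std::int32_t> {
  using BuilderType = SketchInt32Builder;
};

// One generic function replaces the per-type makeArray overloads.
template <typename T>
SketchArray makeArray(const std::vector<T>& v) {
  typename SketchTraits<T>::BuilderType builder;  // chosen at compile time
  builder.AppendValues(v);
  SketchArray out = builder.Finish();
  assert(out.length == v.size());  // every input element was appended
  return out;
}
```

Calling makeArray(std::vector<double>{1.0, 2.5}) selects SketchDoubleBuilder, and a std::vector<std::int32_t> argument selects SketchInt32Builder. Supporting another element type means adding one SketchTraits specialization rather than another copy of the function.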