From: Thomas Buhrmann
Date: Wed, 22 Jan 2020 17:58:05 +0100
Subject: Re: Converting clickhouse column to arrow array
To: user@arrow.apache.org

Hi,
I was looking for something similar, but didn't find a good example in the docs or the source code showing how to use the visitor pattern. It would be great, e.g., to have an example similar to the "Row to columnar conversion", showing a templated way to read arrow columns into C++ vectors using the visitor pattern, and without implementing a separate reader function for each arrow type. Would that be possible?

Many thanks,
Thomas
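
The kind of example asked for above might look roughly like the following. This is only a sketch of the visitor technique using stand-in classes: `Column`, `ColumnVisitor`, `Int32Column`, `DoubleColumn`, `StringColumn`, `ToVector` and `ReadColumn` are all hypothetical names invented for illustration and are not Arrow's API. The point it tries to show is that the per-type `Visit` overloads can all forward to one member template, so no separate reader body is written for each column type.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>
#include <vector>

// Stand-in column hierarchy (hypothetical; NOT Arrow's classes).
struct Int32Column;
struct DoubleColumn;
struct StringColumn;

// Classic visitor interface: one Visit overload per concrete column type.
struct ColumnVisitor {
  virtual ~ColumnVisitor() = default;
  virtual void Visit(const Int32Column&) = 0;
  virtual void Visit(const DoubleColumn&) = 0;
  virtual void Visit(const StringColumn&) = 0;
};

struct Column {
  virtual ~Column() = default;
  virtual void Accept(ColumnVisitor& visitor) const = 0;
};

// CRTP helper so Accept is written once and dispatches on the concrete type.
template <typename Derived, typename T>
struct TypedColumn : Column {
  using value_type = T;
  std::vector<T> values;
  void Accept(ColumnVisitor& visitor) const override {
    visitor.Visit(static_cast<const Derived&>(*this));
  }
};

struct Int32Column : TypedColumn<Int32Column, std::int32_t> {};
struct DoubleColumn : TypedColumn<DoubleColumn, double> {};
struct StringColumn : TypedColumn<StringColumn, std::string> {};

// Visitor that copies a column into std::vector<Out>. Every Visit overload
// forwards to the same member template Read, so there is a single reader body.
template <typename Out>
class ToVector : public ColumnVisitor {
 public:
  void Visit(const Int32Column& c) override { Read(c); }
  void Visit(const DoubleColumn& c) override { Read(c); }
  void Visit(const StringColumn& c) override { Read(c); }

  bool ok() const { return ok_; }
  const std::vector<Out>& result() const { return out_; }

 private:
  template <typename ColumnT>
  void Read(const ColumnT& c) {
    // Tag dispatch: copy only if the element type converts to Out.
    ReadImpl(c, std::is_convertible<typename ColumnT::value_type, Out>());
  }
  template <typename ColumnT>
  void ReadImpl(const ColumnT& c, std::true_type) {
    out_.assign(c.values.begin(), c.values.end());
  }
  template <typename ColumnT>
  void ReadImpl(const ColumnT&, std::false_type) {
    ok_ = false;  // element type cannot be converted to Out
  }

  std::vector<Out> out_;
  bool ok_ = true;
};

// Convenience wrapper: returns false if the column type does not convert.
template <typename Out>
bool ReadColumn(const Column& column, std::vector<Out>* out) {
  assert(out != nullptr);
  ToVector<Out> visitor;
  column.Accept(visitor);
  if (!visitor.ok()) return false;
  *out = visitor.result();
  return true;
}
```

With these stand-ins, a call such as `ReadColumn(column, &vec)` with `std::vector<double> vec` would succeed for the integer and double columns and return false for the string column. A real Arrow version would need to use Arrow's own array and visitor types instead.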

On Wed, 22 Jan 2020 at 17:13, Wes McKinney <wesmckinn@gmail.com> wrote:
hi Matt,

I recommend you use the visitor pattern combined with the
arrow::TypeTraits that we provide

https://github.com/apache/arrow/blob/master/cpp/src/arrow/type_traits.h

You'll need to provide a compile-time mapping from Clickhouse types to
Arrow types, but then you can statically access the correct builder
type at compile time

using ArrowType = typename CHToArrowType<CHType>::ArrowType;
using BuilderType = typename TypeTraits<ArrowType>::BuilderType;
...

or similar. In cases where the exported Clickhouse data does not have
an associated AppendValues method in Arrow you may have to write a
special case (please open JIRA issues if you think there should be
more AppendValues methods)

Thanks
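
The compile-time mapping described above can be sketched as follows. To keep the example self-contained it does not include Arrow: `Array`, `TypedArray`, `Builder`, the type tags and this local `TypeTraits` are simplified stand-ins that only imitate the shape of the builder interface used in this thread, and `CHToArrowType` is keyed on plain C++ element types as an assumption (a real version would key on the Clickhouse column types and would check the status values that Arrow's builders return).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// --- Stand-ins for the target side (hypothetical; NOT the real Arrow classes) ---
struct Array {
  virtual ~Array() = default;
  virtual std::size_t length() const = 0;
};

template <typename T>
struct TypedArray : Array {
  std::vector<T> values;
  std::size_t length() const override { return values.size(); }
};

// Type tags playing the role of the library's data type classes.
struct Int32Type {};
struct DoubleType {};
struct StringType {};

// Minimal builder offering the two operations used in the thread.
template <typename T>
class Builder {
 public:
  void AppendValues(const std::vector<T>& v) {
    values_.insert(values_.end(), v.begin(), v.end());
  }
  void Finish(std::shared_ptr<Array>* out) {
    assert(out != nullptr);
    auto arr = std::make_shared<TypedArray<T>>();
    arr->values = std::move(values_);
    values_.clear();  // leave the builder empty and reusable
    *out = arr;
  }

 private:
  std::vector<T> values_;
};

// Plays the role of TypeTraits<ArrowType>::BuilderType.
template <typename ArrowType> struct TypeTraits;
template <> struct TypeTraits<Int32Type> { using BuilderType = Builder<std::int32_t>; };
template <> struct TypeTraits<DoubleType> { using BuilderType = Builder<double>; };
template <> struct TypeTraits<StringType> { using BuilderType = Builder<std::string>; };

// --- Source side: compile-time mapping from source element type to type tag ---
template <typename CHType> struct CHToArrowType;
template <> struct CHToArrowType<std::int32_t> { using ArrowType = Int32Type; };
template <> struct CHToArrowType<double> { using ArrowType = DoubleType; };
template <> struct CHToArrowType<std::string> { using ArrowType = StringType; };

// One generic function instead of one overload per element type.
template <typename CHType>
std::shared_ptr<Array> makeArray(const std::vector<CHType>& v) {
  using ArrowType = typename CHToArrowType<CHType>::ArrowType;
  using BuilderType = typename TypeTraits<ArrowType>::BuilderType;
  BuilderType builder;
  builder.AppendValues(v);
  std::shared_ptr<Array> array;
  builder.Finish(&array);
  return array;
}
```

Supporting a new element type then means adding one `CHToArrowType` specialization rather than another copy of the function body. An unmapped type fails at compile time because the primary template is left undefined.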

On Wed, Jan 22, 2020 at 7:44 AM Calder, Matthew <mcalder@xbktrading.com> wrote:
>
> Hi,
>
>
>
> I am interfacing arrow to a Clickhouse database using their c++ client. Both arrow and CH have generic array-like classes with the element data type internalized. Ideally, I would like to be able to write something like:
>
>
>
> arrow::Array a = SomeConversionInvocation(clickhouse::Column c);
>
>
>
> Where the array and column have the same element type (int, double, string, …) but the code is generic to the specific type.
>
>
>
> I can do this by explicitly handling specific types through template specialization but I thought that since arrow already has pretty generic type handling through its templates, and clickhouse also has similar capability there ought to be a more seamless way to do the conversion. Zero copy would probably be a lot to ask, but something short of template specializations for every type is what I am aiming for.
>
>
>
> I currently do explicit type specialization. For example I have functions like:
>
>
>
> inline std::shared_ptr<arrow::Array> makeArray(const std::vector<double> &v)
> {
>     arrow::DoubleBuilder builder;
>     builder.AppendValues(v);
>     std::shared_ptr<arrow::Array> array;
>     builder.Finish(&array);
>     return array;
> }
>
>
>
> inline std::shared_ptr<arrow::Array> makeArray(const std::vector<int> &v)
> {
>     arrow::Int32Builder builder;
>     builder.AppendValues(v);
>     std::shared_ptr<arrow::Array> array;
>     builder.Finish(&array);
>     return array;
> }
>
>
>
> Which I suspect is unnecessarily explicit. Is there a more generic way of handling the variety of underlying array element data types when constructing arrow::Array objects? And can someone point me to examples that interface arrow to another similarly generically typed library (doesn’t have to be clickhouse). Thanks for any guidance.
>
>
>
> Matt