From: Micah Kornfield
Reply-To: emkornfield@gmail.com
Date: Thu, 30 Jan 2020 13:43:29 -0800
Subject: Re: Converting clickhouse column to arrow array
To: user@arrow.apache.org
References: <52ef95685f3848d2ab5452e372f11398@xbktrading.com>

> (FWIW, we developed ArrayDataVisitor primarily for internal library
> use and not as a public API)
> I would personally try to first use VisitArrayInline if at all
> possible since it is simpler

Is VisitArrayInline meant for public use? visitor_inline.h still has the disclaimer "Private header, not to be exported".

Thanks,
Micah

On Wed, Jan 29, 2020 at 8:57 AM Wes McKinney <wesmckinn@gmail.com> wrote:
On Wed, Jan 29, 2020 at 9:55 AM Calder, Matthew <mcalder@xbktrading.com> wrote:
>
> I managed to get conversion from CH to arrow using a CHToArrowType<> inter-type traits concept. However, I am still trying to crack the use of:
>
> arrow::VisitArrayInline

Here's a minimal example of VisitArrayInline

struct ArrayVisitor {
  Status Visit(const Array& arr) {
    return Status::OK();
  }
};

Status VisitArrayInlineExample(const Array& arr) {
  ArrayVisitor visitor;
  return VisitArrayInline(arr, &visitor);
}

You can add different Visit functions to match different specific
Array subclasses or groups of types (e.g. integers, floating point,
etc.). std::enable_if is helpful (and the various helper templates in
arrow/type_traits.h)
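
As a rough illustration of the overload idea above, here is a self-contained sketch that does not use Arrow at all. `Array`, `Int32Array`, `DoubleArray`, `StringArray`, `TypeId`, `VisitInline` and `KindVisitor` are hypothetical stand-ins invented for this example, and the switch-and-downcast dispatcher is only an assumption about the general shape such a helper can take. What it shows is how `std::enable_if` lets one `Visit` template cover a whole group of types while a plain overload handles one specific class.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical stand-ins for an array class hierarchy (NOT Arrow's classes).
enum class TypeId { INT32, DOUBLE, STRING };

struct Array {
  virtual ~Array() = default;
  virtual TypeId id() const = 0;
};
struct Int32Array : Array {
  using value_type = std::int32_t;
  TypeId id() const override { return TypeId::INT32; }
};
struct DoubleArray : Array {
  using value_type = double;
  TypeId id() const override { return TypeId::DOUBLE; }
};
struct StringArray : Array {
  using value_type = std::string;
  TypeId id() const override { return TypeId::STRING; }
};

// Hypothetical dispatcher: switch on the runtime type id, downcast, and let
// overload resolution pick the matching Visit().
template <typename Visitor>
bool VisitInline(const Array& arr, Visitor* visitor) {
  switch (arr.id()) {
    case TypeId::INT32:
      return visitor->Visit(static_cast<const Int32Array&>(arr));
    case TypeId::DOUBLE:
      return visitor->Visit(static_cast<const DoubleArray&>(arr));
    case TypeId::STRING:
      return visitor->Visit(static_cast<const StringArray&>(arr));
  }
  return false;  // unknown type id
}

struct KindVisitor {
  std::string kind;

  // Selected for every array whose value_type is an integer type.
  template <typename ArrayT>
  typename std::enable_if<
      std::is_integral<typename ArrayT::value_type>::value, bool>::type
  Visit(const ArrayT&) { kind = "integer"; return true; }

  // Selected for every array whose value_type is a floating-point type.
  template <typename ArrayT>
  typename std::enable_if<
      std::is_floating_point<typename ArrayT::value_type>::value, bool>::type
  Visit(const ArrayT&) { kind = "floating"; return true; }

  // Plain overload for one specific array class.
  bool Visit(const StringArray&) { kind = "string"; return true; }
};
```

Calling `VisitInline(some_array, &visitor)` on each of the three stand-in classes sets `visitor.kind` to "integer", "floating" and "string" respectively. Adding another integer-valued array class would need a new case in the dispatcher but no new `Visit` function.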

>
> and
>
> arrow::ArrayDataVisitor

Here's an example (didn't compile this, but hopefully this gives the idea)

struct BooleanValueVisitor {
  int64_t num_true = 0;
  int64_t num_null = 0;

  Status VisitNull() {
    ++num_null;
    return Status::OK();
  }

  Status VisitValue(bool value) {
    if (value) ++num_true;
    return Status::OK();
  }
};

Status VisitBooleanValues(const Array& arr) {
  BooleanValueVisitor visitor;
  return ArrayDataVisitor<BooleanType>::Visit(*arr.data(), &visitor);
}

If you have a type-parameterized visitor, then you could have

template <typename ArrowType>
Status VisitArrayValues(const Array& arr) {
  MyValueVisitor<ArrowType> visitor;
  return ArrayDataVisitor<ArrowType>::Visit(*arr.data(), &visitor);
}

(FWIW, we developed ArrayDataVisitor primarily for internal library
use and not as a public API)

I would personally try to first use VisitArrayInline if at all
possible since it is simpler
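
To make the VisitNull/VisitValue contract concrete, here is a minimal self-contained sketch of a type-parameterized value visitor. It is not Arrow code: `Column`, `CollectingVisitor` and `VisitValues` are hypothetical names made up for this example, and the validity vector is only a simplified stand-in for a null bitmap. The driver calls `VisitValue` for each valid slot and `VisitNull` for each null slot, which is also one way to read a column into a C++ vector without writing a reader per element type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a nullable column (NOT an Arrow class):
// values[i] is only meaningful when valid[i] is true.
template <typename T>
struct Column {
  std::vector<T> values;
  std::vector<bool> valid;
};

// Type-parameterized value visitor: collects non-null values, counts nulls.
template <typename T>
struct CollectingVisitor {
  std::vector<T> out;
  std::int64_t num_null = 0;

  bool VisitNull() { ++num_null; return true; }
  bool VisitValue(const T& value) { out.push_back(value); return true; }
};

// Hypothetical driver: walk the slots in order and stop at the first failure.
template <typename T, typename Visitor>
bool VisitValues(const Column<T>& col, Visitor* visitor) {
  for (std::size_t i = 0; i < col.values.size(); ++i) {
    const bool ok = col.valid[i] ? visitor->VisitValue(col.values[i])
                                 : visitor->VisitNull();
    if (!ok) return false;
  }
  return true;
}
```

For a `Column<double>` holding the values 1.5, 0.0, 2.5 with the middle slot marked null, the visitor ends up with two collected values and a null count of one. The same two templates work unchanged for any other element type.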

>
> I have a struct:
>
> struct AnArrayUser
> {
>     template <typename T> arrow::Status Visit(const T &a)
>     {
>         // How to invoke ArrayDataVisitor?
>     }
>
>     void Use(const arrow::Array &a) {arrow::VisitArrayInline(a, this);}
>
>     arrow::Status VisitNull() {return arrow::Status::OK();}
>     template <class T> arrow::Status VisitValue(T val) {return arrow::Status::OK();}
> };
>
> Which appears to have its "Use" method called appropriately. But inside of the Visit method I have so far been unable to find the incantation to make a call through the ArrayDataVisitor. I've tried several variations of:
>
> arrow::ArrayDataVisitor<typename T::TypeClass>::Visit(*(array.data()), this);
>
> at the // How to .. line above but can't seem to get it to work. I'm sure I just have some fundamental misunderstanding of how this is supposed to work. Can someone give me some guidance?
>
> Matt
>
>
>
> -----Original Message-----
> From: Wes McKinney <wesmckinn@gmail.com>
> Sent: Wednesday, January 22, 2020 12:03 PM
> To: user@arrow.apache.org
> Subject: Re: Converting clickhouse column to arrow array
>
> If you search for "VisitTypeInline" or "VisitArrayInline" in the C++ codebase you can find numerous examples of where this is used
>
> On Wed, Jan 22, 2020 at 10:58 AM Thomas Buhrmann <thomas.buehrmann@gmail.com> wrote:
> >
> > Hi,
> > I was looking for something similar, but didn't find a good example in the docs or the source code showing how to use the visitor pattern. It would be great, e.g., to have an example similar to the "Row to columnar conversion", showing a templated way to read arrow columns into C++ vectors using the visitor pattern, and without implementing a separate reader function for each arrow type. Would that be possible?
> >
> > Many thanks,
> > Thomas
> >
> > On Wed, 22 Jan 2020 at 17:13, Wes McKinney <wesmckinn@gmail.com> wrote:
> >>
> >> hi Matt,
> >>
> >> I recommend you use the visitor pattern combined with the
> >> arrow::TypeTraits that we provide
> >>
> >> https://clicktime.symantec.com/38JEFUTGByJzrxbCs1aM2Mn7Vc?u=https%3A%2F%2Fgithub.com%2Fapache%2Farrow%2Fblob%2Fmaster%2Fcpp%2Fsrc%2Farrow%2Ftype_traits.h
> >>
> >> You'll need to provide a compile-time mapping from Clickhouse types
> >> to Arrow types, but then you can statically access the correct
> >> builder type at compile time
> >>
> >> using ArrowType = typename CHToArrowType<CHType>::ArrowType;
> >> using BuilderType = typename TypeTraits<ArrowType>::BuilderType;
> >>
> >> ...
> >>
> >> or similar. In cases where the exported Clickhouse data does not have
> >> an associated AppendValues method in Arrow you may have to write a
> >> special case (please open JIRA issues if you think there should be
> >> more AppendValues methods)
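
The two-step trait lookup described above can be sketched without either library. Everything in the block below is a stand-in written for illustration: `FakeInt32Column`, `FakeFloat64Column`, `SimpleBuilder`, `Convert`, and the local `Int32Type`, `DoubleType`, `Int32Builder`, `DoubleBuilder` and `TypeTraits` only borrow names from the thread and are not the real Arrow or Clickhouse classes. The point is the shape of the mapping: source column type to logical type, logical type to builder type, so that one function template replaces a hand-written overload per element type.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>
#include <vector>

// --- Stand-ins for the "Arrow side" (hypothetical, NOT Arrow's classes) ---
struct Int32Type {};
struct DoubleType {};

template <typename CType>
struct SimpleBuilder {
  using ArrayType = std::vector<CType>;  // stand-in for a finished array
  ArrayType data;
  void AppendValues(const std::vector<CType>& v) {
    data.insert(data.end(), v.begin(), v.end());
  }
  ArrayType Finish() { return std::move(data); }
};
struct Int32Builder : SimpleBuilder<std::int32_t> {};
struct DoubleBuilder : SimpleBuilder<double> {};

// Logical type -> builder type.
template <typename ArrowType> struct TypeTraits;
template <> struct TypeTraits<Int32Type> { using BuilderType = Int32Builder; };
template <> struct TypeTraits<DoubleType> { using BuilderType = DoubleBuilder; };

// --- Stand-ins for the "Clickhouse side" (hypothetical) ---
struct FakeInt32Column { std::vector<std::int32_t> values; };
struct FakeFloat64Column { std::vector<double> values; };

// Source column type -> logical type.
template <typename CHType> struct CHToArrowType;
template <> struct CHToArrowType<FakeInt32Column> { using ArrowType = Int32Type; };
template <> struct CHToArrowType<FakeFloat64Column> { using ArrowType = DoubleType; };

// One generic conversion: the builder is selected entirely at compile time.
template <typename CHType>
typename TypeTraits<typename CHToArrowType<CHType>::ArrowType>::BuilderType::ArrayType
Convert(const CHType& column) {
  using ArrowType = typename CHToArrowType<CHType>::ArrowType;
  using BuilderType = typename TypeTraits<ArrowType>::BuilderType;
  BuilderType builder;
  builder.AppendValues(column.values);
  return builder.Finish();
}
```

Supporting another column type in this sketch means adding one `CHToArrowType` specialization, with `Convert` left untouched. A column type with no specialization fails to compile, which is where a hand-written special case would go.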
> >>
> >> Thanks
> >>
> >> On Wed, Jan 22, 2020 at 7:44 AM Calder, Matthew <mcalder@xbktrading.com> wrote:
> >> >
> >> > Hi,
> >> >
> >> > I am interfacing arrow to a Clickhouse database using their c++ client. Both arrow and CH have generic array-like classes with the element data type internalized. Ideally, I would like to be able to write something like:
> >> >
> >> > arrow::Array a = SomeConversionInvocation(clickhouse::Column c);
> >> >
> >> > Where the array and column have the same element type (int, double, string, …) but the code is generic to the specific type.
> >> >
> >> > I can do this by explicitly handling specific types through template specialization but I thought that since arrow already has pretty generic type handling through its templates, and clickhouse also has similar capability there ought to be a more seamless way to do the conversion. Zero copy would probably be a lot to ask, but something short of template specializations for every type is what I am aiming for.
> >> >
> >> > I currently do explicit type specialization. For example I have functions like:
> >> >
> >> > inline std::shared_ptr<arrow::Array> makeArray(const std::vector<double> &v)
> >> > {
> >> >     arrow::DoubleBuilder builder;
> >> >     builder.AppendValues(v);
> >> >     std::shared_ptr<arrow::Array> array;
> >> >     builder.Finish(&array);
> >> >     return array;
> >> > }
> >> >
> >> > inline std::shared_ptr<arrow::Array> makeArray(const std::vector<int> &v)
> >> > {
> >> >     arrow::Int32Builder builder;
> >> >     builder.AppendValues(v);
> >> >     std::shared_ptr<arrow::Array> array;
> >> >     builder.Finish(&array);
> >> >     return array;
> >> > }
> >> >
> >> > Which I suspect is unnecessarily explicit. Is there a more generic way of handling the variety of underlying array element data types when constructing arrow::Array objects? And can someone point me to examples that interface arrow to another similarly generically typed library (doesn’t have to be clickhouse). Thanks for any guidance.
> >> >
> >> > Matt
> >> >
> >> > The information contained in this e-mail may be confidential and is intended solely for the use of the named addressee.
> >> > Access, copying or re-use of the e-mail or any information contained therein by any other person is not authorized.
> >> > If you are not the intended recipient please notify us immediately by returning the e-mail to the originator.
> >> > Disclaimer Version MB.US.1