arrow-user mailing list archives

From Micah Kornfield <emkornfi...@gmail.com>
Subject Re: Converting clickhouse column to arrow array
Date Thu, 30 Jan 2020 21:43:29 GMT
>
> (FWIW, we developed ArrayDataVisitor primarily for internal library
> use and not as a public API)
> I would personally try to first use VisitArrayInline if at all
> possible since it is simpler


Is VisitArrayInline meant to be for public use?  visitor_inline.h still has
the disclaimer "Private header, not to be exported".

Thanks,
Micah

On Wed, Jan 29, 2020 at 8:57 AM Wes McKinney <wesmckinn@gmail.com> wrote:

> On Wed, Jan 29, 2020 at 9:55 AM Calder, Matthew <mcalder@xbktrading.com>
> wrote:
> >
> > I managed to get conversion from CH to arrow using a CHToArrowType<>
> inter-type traits concept. However, I am still trying to crack the use of:
> >
> >  arrow::VisitArrayInline
>
> Here's a minimal example of VisitArrayInline
>
> struct ArrayVisitor {
>   Status Visit(const Array& arr) {
>     return Status::OK();
>   }
> };
>
> Status VisitArrayInlineExample(const Array& arr) {
>   ArrayVisitor visitor;
>   return VisitArrayInline(arr, &visitor);
> }
>
> You can add different Visit functions to match different specific
> Array subclasses or groups of types (e.g. integers, floating point,
> etc.). std::enable_if is helpful (and the various helper templates in
> arrow/type_traits.h)
>
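The overload-selection idea described above (one `Visit` per group of types, chosen with `std::enable_if`) can be sketched without Arrow itself. The following is a minimal illustration under stated assumptions: `ArrayBase`, `TypedArray`, `VisitInline`, and `SumVisitor` are hypothetical stand-ins written for this example, not Arrow classes, and `bool` stands in for `arrow::Status`. With the real library the dispatcher would be `VisitArrayInline`, and the type predicates would come from `arrow/type_traits.h`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Stand-in array hierarchy (hypothetical, NOT Arrow's): a runtime tag plus typed storage.
struct ArrayBase {
  enum class Kind { Int32, Double, String };
  virtual ~ArrayBase() = default;
  virtual Kind kind() const = 0;
};

template <typename T, ArrayBase::Kind K>
struct TypedArray : ArrayBase {
  using value_type = T;
  std::vector<T> values;
  explicit TypedArray(std::vector<T> v) : values(std::move(v)) {}
  Kind kind() const override { return K; }
};

using Int32Array = TypedArray<int32_t, ArrayBase::Kind::Int32>;
using DoubleArray = TypedArray<double, ArrayBase::Kind::Double>;
using StringArray = TypedArray<std::string, ArrayBase::Kind::String>;

// Hypothetical analogue of VisitArrayInline: switch on the runtime tag,
// downcast to the concrete type, and let overload resolution pick Visit().
template <typename Visitor>
bool VisitInline(const ArrayBase& arr, Visitor* visitor) {
  switch (arr.kind()) {
    case ArrayBase::Kind::Int32:
      return visitor->Visit(static_cast<const Int32Array&>(arr));
    case ArrayBase::Kind::Double:
      return visitor->Visit(static_cast<const DoubleArray&>(arr));
    case ArrayBase::Kind::String:
      return visitor->Visit(static_cast<const StringArray&>(arr));
  }
  return false;
}

struct SumVisitor {
  double numeric_sum = 0;
  std::size_t non_numeric_seen = 0;

  // Selected for every array whose element type is arithmetic (ints, floats).
  template <typename ArrayT>
  typename std::enable_if<
      std::is_arithmetic<typename ArrayT::value_type>::value, bool>::type
  Visit(const ArrayT& arr) {
    for (const auto& v : arr.values) numeric_sum += static_cast<double>(v);
    return true;
  }

  // Selected for everything else (here: strings).
  template <typename ArrayT>
  typename std::enable_if<
      !std::is_arithmetic<typename ArrayT::value_type>::value, bool>::type
  Visit(const ArrayT&) {
    ++non_numeric_seen;
    return true;
  }
};
```

The caller only ever holds a `const ArrayBase&`. The switch recovers the concrete type once, and the two `enable_if` overloads then cover whole families of types without a per-type function.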
> >
> > and
> >
> > arrow::ArrayDataVisitor
>
> Here's an example (didn't compile this, but hopefully this gives the idea)
>
> struct BooleanValueVisitor {
>   int64_t num_true = 0;
>   int64_t num_null = 0;
>
>   Status VisitNull() {
>     ++num_null;
>     return Status::OK();
>   }
>
>   Status VisitValue(bool value) {
>     if (value) ++num_true;
>     return Status::OK();
>   }
> };
>
>
> Status VisitBooleanValues(const Array& arr) {
>   BooleanValueVisitor visitor;
>   return ArrayDataVisitor<BooleanType>::Visit(*arr.data(), &visitor);
> }
>
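The `VisitNull` / `VisitValue` contract in the example above can be tried out in isolation. This is only a sketch under assumptions: `BoolColumn` and `VisitBoolColumn` are hypothetical stand-ins for `arrow::ArrayData` and `ArrayDataVisitor<BooleanType>::Visit`, a plain `std::vector<bool>` replaces Arrow's validity bitmap, and `bool` replaces `arrow::Status`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for a nullable boolean column (hypothetical, NOT arrow::ArrayData).
struct BoolColumn {
  std::vector<bool> values;
  std::vector<bool> valid;  // valid[i] == false means slot i is null
};

// Hypothetical analogue of ArrayDataVisitor<BooleanType>::Visit:
// call VisitNull() for null slots and VisitValue(v) for the rest,
// stopping at the first callback that reports failure.
template <typename Visitor>
bool VisitBoolColumn(const BoolColumn& col, Visitor* visitor) {
  for (std::size_t i = 0; i < col.values.size(); ++i) {
    const bool ok = col.valid[i] ? visitor->VisitValue(col.values[i])
                                 : visitor->VisitNull();
    if (!ok) return false;
  }
  return true;
}

// Same shape as the visitor in the message above, with bool as the status type.
struct BooleanValueVisitor {
  int64_t num_true = 0;
  int64_t num_null = 0;

  bool VisitNull() {
    ++num_null;
    return true;
  }

  bool VisitValue(bool value) {
    if (value) ++num_true;
    return true;
  }
};
```

The visitor never sees the validity data directly. The driver decides per slot which callback to invoke, which is why the visitor needs exactly those two member functions.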
> If you have a type-parameterized visitor, then you could have
>
> template <typename ArrowType>
> Status VisitArrayValues(const Array& arr) {
>   MyValueVisitor<ArrowType> visitor;
>   return ArrayDataVisitor<ArrowType>::Visit(*arr.data(), &visitor);
> }
>
> (FWIW, we developed ArrayDataVisitor primarily for internal library
> use and not as a public API)
>
> I would personally try to first use VisitArrayInline if at all
> possible since it is simpler
>
> >
> > I have a struct:
> >
> > struct AnArrayUser
> > {
> >      template <typename T> arrow::Status Visit(const T &a)
> >      {
> >            // How to invoke ArrayDataVisitor?
> >      }
> >
> >      void Use(const arrow::Array &a) {arrow::VisitArrayInline(a, this);}
> >
> >
> >      arrow::Status VisitNull() {return arrow::Status::OK();}
> >      template <class T> arrow::Status VisitValue(T val) {return arrow::Status::OK();}
> > };
> >
> > Which appears to have its "Use" method called appropriately. But inside
> of the Visit method I have so far been unable to find the incantation to
> make a call through the ArrayDataVisitor. I've tried several variations of:
> >
> > arrow::ArrayDataVisitor<typename T::TypeClass>::Visit(*(array.data()), this);
> >
> > at the // How to .. line above but can't seem to get it to work. I'm
> sure I just have some fundamental misunderstanding of how this is supposed
> to work. Can someone give me some guidance?
> >
> > Matt
> >
> >
> >
> > -----Original Message-----
> > From: Wes McKinney <wesmckinn@gmail.com>
> > Sent: Wednesday, January 22, 2020 12:03 PM
> > To: user@arrow.apache.org
> > Subject: Re: Converting clickhouse column to arrow array
> >
> > If you search for "VisitTypeInline" or "VisitArrayInline" in the C++
> codebase you can find numerous examples of where this is used
> >
> > On Wed, Jan 22, 2020 at 10:58 AM Thomas Buhrmann <
> thomas.buehrmann@gmail.com> wrote:
> > >
> > > Hi,
> > > I was looking for something similar, but didn't find a good example in
> the docs or the source code showing how to use the visitor pattern. It
> would be great, e.g., to have an example similar to the "Row to columnar
> conversion", showing a templated way to read arrow columns into C++ vectors
> using the visitor pattern, and without implementing a separate reader
> function for each arrow type. Would that be possible?
> > >
> > > Many thanks,
> > > Thomas
> > >
> > > On Wed, 22 Jan 2020 at 17:13, Wes McKinney <wesmckinn@gmail.com>
> wrote:
> > >>
> > >> hi Matt,
> > >>
> > >> I recommend you use the visitor pattern combined with the
> > >> arrow::TypeTraits that we provide
> > >>
> > >> https://github.com/apache/arrow/blob/master/cpp/src/arrow/type_traits.h
> > >>
> > >> You'll need to provide a compile-time mapping from Clickhouse types
> > >> to Arrow types, but then you can statically access the correct
> > >> builder type at compile time
> > >>
> > >> using ArrowType = typename CHToArrowType<CHType>::ArrowType;
> > >> using BuilderType = typename TypeTraits<ArrowType>::BuilderType;
> > >>
> > >> ...
> > >>
> > >> or similar. In cases where the exported Clickhouse data does not have
> > >> an associated AppendValues method in Arrow you may have to write a
> > >> special case (please open JIRA issues if you think there should be
> > >> more AppendValues methods)
> > >>
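The two-step compile-time lookup described above (source type to destination type, then destination type to builder) can be sketched with plain traits structs. Everything here is an assumption made for illustration: `SrcInt32Column`, `SrcFloat64Column`, `SimpleBuilder`, `SrcToTag`, and `TagTraits` are hypothetical stand-ins for the Clickhouse column classes, the Arrow builders, `CHToArrowType`, and `arrow::TypeTraits` respectively, and none of them is real library API.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>
#include <vector>

// Stand-in source columns (hypothetical, not the Clickhouse client API).
struct SrcInt32Column { std::vector<int32_t> data; };
struct SrcFloat64Column { std::vector<double> data; };

// Stand-in destination type tags (hypothetical, not Arrow DataType classes).
struct Int32Tag {};
struct DoubleTag {};

// Stand-in builder with an AppendValues/Finish shape.
template <typename T>
struct SimpleBuilder {
  using ArrayType = std::vector<T>;
  ArrayType out;
  void AppendValues(const std::vector<T>& v) {
    out.insert(out.end(), v.begin(), v.end());
  }
  ArrayType Finish() { return std::move(out); }
};

// Analogue of CHToArrowType<CHType>::ArrowType: source column -> tag.
template <typename Src> struct SrcToTag;
template <> struct SrcToTag<SrcInt32Column> { using Tag = Int32Tag; };
template <> struct SrcToTag<SrcFloat64Column> { using Tag = DoubleTag; };

// Analogue of TypeTraits<ArrowType>::BuilderType: tag -> builder.
template <typename Tag> struct TagTraits;
template <> struct TagTraits<Int32Tag> { using BuilderType = SimpleBuilder<int32_t>; };
template <> struct TagTraits<DoubleTag> { using BuilderType = SimpleBuilder<double>; };

// Chains the two lookups so Convert() below stays readable.
template <typename SrcColumn>
struct Conversion {
  using Tag = typename SrcToTag<SrcColumn>::Tag;
  using BuilderType = typename TagTraits<Tag>::BuilderType;
  using ArrayType = typename BuilderType::ArrayType;
};

// One generic function instead of one overload per element type.
template <typename SrcColumn>
typename Conversion<SrcColumn>::ArrayType Convert(const SrcColumn& col) {
  typename Conversion<SrcColumn>::BuilderType builder;
  builder.AppendValues(col.data);
  return builder.Finish();
}
```

Supporting a new element type then means adding one `SrcToTag` specialization (and a `TagTraits` entry if the destination type is new). A source type with no mapping fails at compile time, because the primary templates are left undefined.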
> > >> Thanks
> > >>
> > >> On Wed, Jan 22, 2020 at 7:44 AM Calder, Matthew <
> mcalder@xbktrading.com> wrote:
> > >> >
> > >> > Hi,
> > >> >
> > >> >
> > >> >
> > >> > I am interfacing arrow to a Clickhouse database using their c++
> client. Both arrow and CH have generic array-like classes with the element
> data type internalized. Ideally, I would like to be able to write something
> like:
> > >> >
> > >> >
> > >> >
> > >> > arrow::Array a = SomeConversionInvocation(clickhouse::Column c);
> > >> >
> > >> >
> > >> >
> > >> > Where the array and column have the same element type (int, double,
> string, …) but the code is generic to the specific type.
> > >> >
> > >> >
> > >> >
> > >> > I can do this by explicitly handling specific types through
> template specialization but I thought that since arrow already has pretty
> generic type handling through its templates, and clickhouse also has
> similar capability there ought to be a more seamless way to do the
> conversion. Zero copy would probably be a lot to ask, but something short
> of template specializations for every type is what I am aiming for.
> > >> >
> > >> >
> > >> >
> > >> > I currently do explicit type specialization. For example I have
> functions like:
> > >> >
> > >> >
> > >> >
> > >> > inline std::shared_ptr<arrow::Array> makeArray(const std::vector<double> &v)
> > >> > {
> > >> >     arrow::DoubleBuilder builder;
> > >> >     builder.AppendValues(v);
> > >> >     std::shared_ptr<arrow::Array> array;
> > >> >     builder.Finish(&array);
> > >> >     return array;
> > >> > }
> > >> >
> > >> >
> > >> >
> > >> > inline std::shared_ptr<arrow::Array> makeArray(const std::vector<int> &v)
> > >> > {
> > >> >     arrow::Int32Builder builder;
> > >> >     builder.AppendValues(v);
> > >> >     std::shared_ptr<arrow::Array> array;
> > >> >     builder.Finish(&array);
> > >> >     return array;
> > >> > }
> > >> >
> > >> >
> > >> >
> > >> > Which I suspect is unnecessarily explicit. Is there a more generic
> way of handling the variety of underlying array element data types when
> constructing arrow::Array objects? And can someone point me to examples
> that interface arrow to another similarly generically typed library
> (doesn’t have to be clickhouse). Thanks for any guidance.
> > >> >
> > >> >
> > >> >
> > >> > Matt
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > The information contained in this e-mail may be confidential and is
> intended solely for the use of the named addressee.
> > >> >
> > >> > Access, copying or re-use of the e-mail or any information
> contained therein by any other person is not authorized.
> > >> >
> > >> > If you are not the intended recipient please notify us immediately
> by returning the e-mail to the originator.
> > >> >
> > >> > Disclaimer Version MB.US.1
> >
>
