arrow-user mailing list archives

From "Calder, Matthew" <mcal...@xbktrading.com>
Subject RE: Converting clickhouse column to arrow array
Date Fri, 31 Jan 2020 11:18:17 GMT
Thanks for the help, I am using VisitArrayInline to good effect. Once the development solidifies
a bit more I'll post here and to CH's github page. If anyone is interested in collaboration
on CH <-> arrow tooling, I'm happy to help. 

Matt

-----Original Message-----
From: Wes McKinney <wesmckinn@gmail.com> 
Sent: Thursday, January 30, 2020 5:19 PM
To: user@arrow.apache.org; Micah Kornfield <emkornfield@gmail.com>
Subject: Re: Converting clickhouse column to arrow array

On Thu, Jan 30, 2020 at 3:43 PM Micah Kornfield <emkornfield@gmail.com> wrote:
>>
>> (FWIW, we developed ArrayDataVisitor primarily for internal library 
>> use and not as a public API) I would personally try to first use 
>> VisitArrayInline if at all possible since it is simpler
>
>
> Is VisitArrayInline meant to be for public use?  visitor_inline.h still has the disclaimer
> "Private header, not to be exported".

I think it should be fine for public use -- we should amend the documentation. Using the "inline"
version is simpler in many ways than the virtual Visitor when you have a templated Visit function
that matches many type cases.
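
As an illustration of the point above (not code from this thread), here is a minimal
sketch of an "inline" visitor whose templated Visit overloads each match a group of
types via std::enable_if. Everything in it is a hypothetical stand-in: TypeId, Array,
Int32Array, DoubleArray, StringArray, VisitInline and KindVisitor are simplified
mock-ups written for this example and are not the Arrow classes or Arrow's
implementation of VisitArrayInline.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical stand-ins for array classes; these are NOT the Arrow types.
enum class TypeId { Int32, Double, String };

struct Array {
  explicit Array(TypeId id) : type_id(id) {}
  virtual ~Array() = default;
  TypeId type_id;
};
struct Int32Array : Array {
  using value_type = std::int32_t;
  Int32Array() : Array(TypeId::Int32) {}
};
struct DoubleArray : Array {
  using value_type = double;
  DoubleArray() : Array(TypeId::Double) {}
};
struct StringArray : Array {
  using value_type = std::string;
  StringArray() : Array(TypeId::String) {}
};

// Switch on the runtime type id once, then call Visit with the concrete type.
// The visitor needs no virtual functions; overload resolution picks the match.
template <typename Visitor>
bool VisitInline(const Array& arr, Visitor* visitor) {
  switch (arr.type_id) {
    case TypeId::Int32:
      return visitor->Visit(static_cast<const Int32Array&>(arr));
    case TypeId::Double:
      return visitor->Visit(static_cast<const DoubleArray&>(arr));
    case TypeId::String:
      return visitor->Visit(static_cast<const StringArray&>(arr));
  }
  return false;
}

struct KindVisitor {
  std::string kind;

  // One overload covers every array class whose value type is arithmetic.
  template <typename T>
  typename std::enable_if<std::is_arithmetic<typename T::value_type>::value, bool>::type
  Visit(const T&) {
    kind = "numeric";
    return true;
  }

  // A second overload covers all remaining array classes.
  template <typename T>
  typename std::enable_if<!std::is_arithmetic<typename T::value_type>::value, bool>::type
  Visit(const T&) {
    kind = "other";
    return true;
  }
};

// Usage: classify an array without writing one Visit per concrete class.
std::string KindOf(const Array& arr) {
  KindVisitor visitor;
  VisitInline(arr, &visitor);
  return visitor.kind;
}
```

With real Arrow code the switch would be supplied by the library and the enable_if
conditions would presumably come from the helper templates in arrow/type_traits.h.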

> Thanks,
> Micah
>
> On Wed, Jan 29, 2020 at 8:57 AM Wes McKinney <wesmckinn@gmail.com> wrote:
>>
>> On Wed, Jan 29, 2020 at 9:55 AM Calder, Matthew <mcalder@xbktrading.com> wrote:
>> >
>> > I managed to get conversion from CH to arrow using a CHToArrowType<> inter-type traits
>> > concept. However, I am still trying to crack the use of:
>> >
>> >  arrow::VisitArrayInline
>>
>> Here's a minimal example of VisitArrayInline
>>
>> struct ArrayVisitor {
>>   Status Visit(const Array& arr) {
>>     return Status::OK();
>>   }
>> };
>>
>> Status VisitArrayInlineExample(const Array& arr) {
>>   ArrayVisitor visitor;
>>   return VisitArrayInline(arr, &visitor);
>> }
>>
>> You can add different Visit functions to match different specific 
>> Array subclasses or groups of types (e.g. integers, floating point, 
>> etc.). std::enable_if is helpful (and the various helper templates in
>> arrow/type_traits.h)
>>
>> >
>> > and
>> >
>> > arrow::ArrayDataVisitor
>>
>> Here's an example (didn't compile this, but hopefully this gives the 
>> idea)
>>
>> struct BooleanValueVisitor {
>>   int64_t num_true = 0;
>>   int64_t num_null = 0;
>>
>>   Status VisitNull() {
>>     ++num_null;
>>     return Status::OK();
>>   }
>>
>>   Status VisitValue(bool value) {
>>     if (value) ++num_true;
>>     return Status::OK();
>>   }
>> };
>>
>>
>> Status VisitBooleanValues(const Array& arr) {
>>   BooleanValueVisitor visitor;
>>   return ArrayDataVisitor<BooleanType>::Visit(*arr.data(), &visitor); 
>> }
>>
>> If you have a type-parameterized visitor, then you could have
>>
>> template <typename ArrowType>
>> Status VisitArrayValues(const Array& arr) {
>>   MyValueVisitor<ArrowType> visitor;
>>   return ArrayDataVisitor<ArrowType>::Visit(*arr.data(), &visitor);
>> }
>>
>> (FWIW, we developed ArrayDataVisitor primarily for internal library 
>> use and not as a public API)
>>
>> I would personally try to first use VisitArrayInline if at all 
>> possible since it is simpler
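
To make the VisitNull / VisitValue protocol described above concrete, here is a
self-contained sketch (not code from this thread) of a value-level visit loop.
NullableColumn, VisitValues and CountingVisitor are hypothetical stand-ins invented
for this example. The sketch does not reproduce ArrayDataVisitor itself, and it keeps
validity in a std::vector<bool> where Arrow uses a packed validity bitmap.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a nullable column; NOT an Arrow class.
template <typename T>
struct NullableColumn {
  std::vector<T> values;
  std::vector<bool> valid;  // valid[i] == false means slot i is null
};

// Walk every slot and call VisitNull() or VisitValue(v) on the visitor.
template <typename T, typename Visitor>
void VisitValues(const NullableColumn<T>& col, Visitor* visitor) {
  for (std::size_t i = 0; i < col.values.size(); ++i) {
    if (col.valid[i]) {
      visitor->VisitValue(col.values[i]);
    } else {
      visitor->VisitNull();
    }
  }
}

// Counts true values and nulls, mirroring the BooleanValueVisitor idea.
struct CountingVisitor {
  std::int64_t num_true = 0;
  std::int64_t num_null = 0;

  void VisitNull() { ++num_null; }
  void VisitValue(bool value) {
    if (value) ++num_true;
  }
};

// Usage: run the visitor over a boolean column and return the counts.
CountingVisitor CountBooleans(const NullableColumn<bool>& col) {
  CountingVisitor visitor;
  VisitValues(col, &visitor);
  return visitor;
}
```

The visitor only sees one value or one null at a time, so the same visitor struct can
be reused for any column type whose values convert to its VisitValue parameter.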
>>
>> >
>> > I have a struct:
>> >
>> > struct AnArrayUser
>> > {
>> >      template <typename T> arrow::Status Visit(const T &a)
>> >      {
>> >            // How to invoke ArrayDataVisitor?
>> >      }
>> >
>> >      void Use(const arrow::Array &a) {arrow::VisitArrayInline(a, this);}
>> >
>> >      arrow::Status VisitNull() {return arrow::Status::OK();}
>> >      template <class T> arrow::Status VisitValue(T val) {return arrow::Status::OK();}
>> > };
>> >
>> > Which appears to have its "Use" method called appropriately. But inside of the Visit
>> > method I have so far been unable to find the incantation to make a call through the
>> > ArrayDataVisitor. I've tried several variations of:
>> >
>> > arrow::ArrayDataVisitor<typename T::TypeClass>::Visit(*(array.data()), this);
>> >
>> > at the // How to .. line above but can't seem to get it to work. I'm sure I just have
>> > some fundamental misunderstanding of how this is supposed to work. Can someone give me
>> > some guidance?
>> >
>> > Matt
>> >
>> >
>> >
>> > -----Original Message-----
>> > From: Wes McKinney <wesmckinn@gmail.com>
>> > Sent: Wednesday, January 22, 2020 12:03 PM
>> > To: user@arrow.apache.org
>> > Subject: Re: Converting clickhouse column to arrow array
>> >
>> > If you search for "VisitTypeInline" or "VisitArrayInline" in the 
>> > C++ codebase you can find numerous examples of where this is used
>> >
>> > On Wed, Jan 22, 2020 at 10:58 AM Thomas Buhrmann <thomas.buehrmann@gmail.com> wrote:
>> > >
>> > > Hi,
>> > > I was looking for something similar, but didn't find a good example in the docs or
>> > > the source code showing how to use the visitor pattern. It would be great, e.g., to
>> > > have an example similar to the "Row to columnar conversion", showing a templated way
>> > > to read arrow columns into C++ vectors using the visitor pattern, and without
>> > > implementing a separate reader function for each arrow type. Would that be possible?
>> > >
>> > > Many thanks,
>> > > Thomas
>> > >
>> > > On Wed, 22 Jan 2020 at 17:13, Wes McKinney <wesmckinn@gmail.com> wrote:
>> > >>
>> > >> hi Matt,
>> > >>
>> > >> I recommend you use the visitor pattern combined with the 
>> > >> arrow::TypeTraits that we provide
>> > >>
>> > >> https://github.com/apache/arrow/blob/master/cpp/src/arrow/type_traits.h
>> > >>
>> > >> You'll need to provide a compile-time mapping from Clickhouse 
>> > >> types to Arrow types, but then you can statically access the 
>> > >> correct builder type at compile time
>> > >>
>> > >> using ArrowType = typename CHToArrowType<CHType>::ArrowType;
>> > >> using BuilderType = typename TypeTraits<ArrowType>::BuilderType;
>> > >>
>> > >> ...
>> > >>
>> > >> or similar. In cases where the exported Clickhouse data does not 
>> > >> have an associated AppendValues method in Arrow you may have to 
>> > >> write a special case (please open JIRA issues if you think there 
>> > >> should be more AppendValues methods)
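
The compile-time mapping suggested above can be sketched in a self-contained way as
follows (not code from this thread). All names here are hypothetical stand-ins:
CHInt32, CHFloat64, Int32Type, DoubleType, SimpleBuilder and this TypeTraits only
mimic the shape of the idea and are not the ClickHouse client or Arrow classes. A real
version would presumably use arrow::TypeTraits and the Arrow builders instead.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical source-side type tags (stand-ins for ClickHouse column types).
struct CHInt32 { using CType = std::int32_t; };
struct CHFloat64 { using CType = double; };

// Hypothetical target-side type tags (stand-ins for Arrow data types).
struct Int32Type { using c_type = std::int32_t; };
struct DoubleType { using c_type = double; };

// Stand-in builder: collects values and hands them back on Finish().
template <typename ArrowType>
class SimpleBuilder {
 public:
  using value_type = typename ArrowType::c_type;

  void AppendValues(const std::vector<value_type>& v) {
    data_.insert(data_.end(), v.begin(), v.end());
  }
  std::vector<value_type> Finish() { return std::move(data_); }

 private:
  std::vector<value_type> data_;
};

// Source type -> target type: one small specialization per supported type.
template <typename CHType>
struct CHToArrowType;
template <>
struct CHToArrowType<CHInt32> { using ArrowType = Int32Type; };
template <>
struct CHToArrowType<CHFloat64> { using ArrowType = DoubleType; };

// Target type -> builder type, resolved at compile time.
template <typename ArrowType>
struct TypeTraits { using BuilderType = SimpleBuilder<ArrowType>; };

// One generic conversion routine instead of one function per element type.
template <typename CHType>
std::vector<typename CHToArrowType<CHType>::ArrowType::c_type> Convert(
    const std::vector<typename CHType::CType>& column) {
  using ArrowType = typename CHToArrowType<CHType>::ArrowType;
  using BuilderType = typename TypeTraits<ArrowType>::BuilderType;
  BuilderType builder;
  builder.AppendValues(column);
  return builder.Finish();
}
```

Supporting another element type then means adding one CHToArrowType specialization
rather than another copy of the conversion function.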
>> > >>
>> > >> Thanks
>> > >>
>> > >> On Wed, Jan 22, 2020 at 7:44 AM Calder, Matthew <mcalder@xbktrading.com> wrote:
>> > >> >
>> > >> > Hi,
>> > >> >
>> > >> > I am interfacing arrow to a Clickhouse database using their C++ client. Both arrow
>> > >> > and CH have generic array-like classes with the element data type internalized.
>> > >> > Ideally, I would like to be able to write something like:
>> > >> >
>> > >> > arrow::Array a = SomeConversionInvocation(clickhouse::Column c);
>> > >> >
>> > >> > Where the array and column have the same element type (int, double, string, …) but
>> > >> > the code is generic to the specific type.
>> > >> >
>> > >> > I can do this by explicitly handling specific types through template specialization
>> > >> > but I thought that since arrow already has pretty generic type handling through its
>> > >> > templates, and clickhouse also has similar capability, there ought to be a more
>> > >> > seamless way to do the conversion. Zero copy would probably be a lot to ask, but
>> > >> > something short of template specializations for every type is what I am aiming for.
>> > >> >
>> > >> > I currently do explicit type specialization. For example I have functions like:
>> > >> >
>> > >> > inline std::shared_ptr<arrow::Array> makeArray(const std::vector<double> &v)
>> > >> > {
>> > >> >     arrow::DoubleBuilder builder;
>> > >> >     builder.AppendValues(v);
>> > >> >     std::shared_ptr<arrow::Array> array;
>> > >> >     builder.Finish(&array);
>> > >> >     return array;
>> > >> > }
>> > >> >
>> > >> > inline std::shared_ptr<arrow::Array> makeArray(const std::vector<int> &v)
>> > >> > {
>> > >> >     arrow::Int32Builder builder;
>> > >> >     builder.AppendValues(v);
>> > >> >     std::shared_ptr<arrow::Array> array;
>> > >> >     builder.Finish(&array);
>> > >> >     return array;
>> > >> > }
>> > >> >
>> > >> > Which I suspect is unnecessarily explicit. Is there a more generic way of handling
>> > >> > the variety of underlying array element data types when constructing arrow::Array
>> > >> > objects? And can someone point me to examples that interface arrow to another
>> > >> > similarly generically typed library (doesn’t have to be clickhouse)? Thanks for any
>> > >> > guidance.
>> > >> >
>> > >> > Matt
>> > >> >
>> > >> > The information contained in this e-mail may be confidential and is intended solely
>> > >> > for the use of the named addressee.
>> > >> >
>> > >> > Access, copying or re-use of the e-mail or any information contained therein by any
>> > >> > other person is not authorized.
>> > >> >
>> > >> > If you are not the intended recipient please notify us immediately by returning the
>> > >> > e-mail to the originator.
>> > >> >
>> > >> > Disclaimer Version MB.US.1
