arrow-user mailing list archives

From Wes McKinney <wesmck...@gmail.com>
Subject Re: Converting clickhouse column to arrow array
Date Wed, 22 Jan 2020 16:14:04 GMT
For the record, other people in the Arrow community have discussed
building an adapter for ClickHouse (CH):

https://issues.apache.org/jira/browse/ARROW-3156

It might be advisable to find others in the CH community who are
interested and build a shared solution -- this work would be welcome
inside Apache Arrow IMHO (as would interfaces to other databases).

On Wed, Jan 22, 2020 at 10:12 AM Wes McKinney <wesmckinn@gmail.com> wrote:
>
> hi Matt,
>
> I recommend you use the visitor pattern combined with the
> arrow::TypeTraits that we provide:
>
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/type_traits.h
>
> You'll need to provide a compile-time mapping from ClickHouse types to
> Arrow types, but then you can statically access the correct builder
> type at compile time:
>
> using ArrowType = typename CHToArrowType<CHType>::ArrowType;
> using BuilderType = typename TypeTraits<ArrowType>::BuilderType;
>
> ...
>
> or similar. In cases where the exported ClickHouse data does not have
> an associated AppendValues method in Arrow, you may have to write a
> special case (please open JIRA issues if you think there should be
> more AppendValues methods).
>
> Thanks
>
> On Wed, Jan 22, 2020 at 7:44 AM Calder, Matthew <mcalder@xbktrading.com> wrote:
> >
> > Hi,
> >
> > I am interfacing arrow to a Clickhouse database using their c++ client.
> > Both arrow and CH have generic array-like classes with the element data
> > type internalized. Ideally, I would like to be able to write something
> > like:
> >
> > arrow::Array a = SomeConversionInvocation(clickhouse::Column c);
> >
> > Where the array and column have the same element type (int, double,
> > string, …) but the code is generic to the specific type.
> >
> > I can do this by explicitly handling specific types through template
> > specialization but I thought that since arrow already has pretty generic
> > type handling through its templates, and clickhouse also has similar
> > capability there ought to be a more seamless way to do the conversion.
> > Zero copy would probably be a lot to ask, but something short of template
> > specializations for every type is what I am aiming for.
> >
> > I currently do explicit type specialization. For example I have functions
> > like:
> >
> > inline std::shared_ptr<arrow::Array> makeArray(const std::vector<double> &v)
> > {
> >     arrow::DoubleBuilder builder;
> >     builder.AppendValues(v);
> >     std::shared_ptr<arrow::Array> array;
> >     builder.Finish(&array);
> >     return array;
> > }
> >
> > inline std::shared_ptr<arrow::Array> makeArray(const std::vector<int> &v)
> > {
> >     arrow::Int32Builder builder;
> >     builder.AppendValues(v);
> >     std::shared_ptr<arrow::Array> array;
> >     builder.Finish(&array);
> >     return array;
> > }
> >
> > Which I suspect is unnecessarily explicit. Is there a more generic way of
> > handling the variety of underlying array element data types when
> > constructing arrow::Array objects? And can someone point me to examples
> > that interface arrow to another similarly generically typed library
> > (doesn’t have to be clickhouse). Thanks for any guidance.
> >
> > Matt
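
A minimal sketch of the compile-time mapping suggested in the reply above, for readers who want to see the shape of it. So that it builds with only the standard library, everything in the `mock_arrow` and `mock_ch` namespaces is a hypothetical stand-in, not the real Arrow or ClickHouse client API. `CHToArrowType` is the name used in the reply; its specializations here, and the `ConvertColumn` helper, are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for Arrow's type classes, builders and TypeTraits.
namespace mock_arrow {
struct DoubleType { using c_type = double; };
struct Int32Type  { using c_type = std::int32_t; };
struct StringType { using c_type = std::string; };

// Toy builder: collects values and hands them back from Finish().
template <typename T>
class Builder {
 public:
  void AppendValues(const std::vector<typename T::c_type>& v) {
    values_.insert(values_.end(), v.begin(), v.end());
  }
  std::vector<typename T::c_type> Finish() { return std::move(values_); }

 private:
  std::vector<typename T::c_type> values_;
};

// Plays the role of arrow::TypeTraits<T>::BuilderType.
template <typename T>
struct TypeTraits { using BuilderType = Builder<T>; };
}  // namespace mock_arrow

// Hypothetical stand-ins for typed ClickHouse column classes.
namespace mock_ch {
template <typename V>
struct Column { std::vector<V> data; };
using ColumnFloat64 = Column<double>;
using ColumnInt32   = Column<std::int32_t>;
using ColumnString  = Column<std::string>;
}  // namespace mock_ch

// Compile-time mapping from a column type to an Arrow type.
// The primary template is left undefined, so an unmapped column type
// fails to compile instead of failing at run time.
template <typename CHType> struct CHToArrowType;
template <> struct CHToArrowType<mock_ch::ColumnFloat64> {
  using ArrowType = mock_arrow::DoubleType;
};
template <> struct CHToArrowType<mock_ch::ColumnInt32> {
  using ArrowType = mock_arrow::Int32Type;
};
template <> struct CHToArrowType<mock_ch::ColumnString> {
  using ArrowType = mock_arrow::StringType;
};

// One generic conversion routine replaces the per-type makeArray overloads.
template <typename CHType>
auto ConvertColumn(const CHType& column) {
  using ArrowType = typename CHToArrowType<CHType>::ArrowType;
  using BuilderType = typename mock_arrow::TypeTraits<ArrowType>::BuilderType;
  BuilderType builder;
  builder.AppendValues(column.data);
  return builder.Finish();
}
```

With the real library, the `mock_arrow` pieces would be replaced by the Arrow type classes and `arrow::TypeTraits` from type_traits.h. The real builders' `AppendValues` and `Finish` return an `arrow::Status` that should be checked, and `Finish` fills a `std::shared_ptr<arrow::Array>` rather than returning a vector. Only the mapping specializations grow as more column types are supported; the conversion routine itself stays the same.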
