arrow-user mailing list archives

From: Wes McKinney <wesmck...@gmail.com>
Subject: Re: Converting clickhouse column to arrow array
Date: Wed, 22 Jan 2020 16:12:34 GMT

Hi Matt,

I recommend you use the visitor pattern combined with the
arrow::TypeTraits that we provide:

https://github.com/apache/arrow/blob/master/cpp/src/arrow/type_traits.h

You'll need to provide a compile-time mapping from Clickhouse types to
Arrow types, but once you have that you can look up the correct builder
type statically:

using ArrowType = typename CHToArrowType<CHType>::ArrowType;
using BuilderType = typename TypeTraits<ArrowType>::BuilderType;

...

or similar. In cases where the exported Clickhouse data does not have
an associated AppendValues method in Arrow, you may have to write a
special case (please open JIRA issues if you think there should be
more AppendValues methods).
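
To make the two-step lookup above concrete, here is a minimal sketch that
compiles with only the standard library. Everything in it is a stand-in and
an assumption for illustration: DoubleType, Int32Type, SimpleBuilder and the
local TypeTraits mimic the shape of Arrow's classes but are not Arrow's,
ColumnFloat64 and ColumnInt32 are hypothetical column types rather than the
Clickhouse client's, and ConvertColumn is a made-up name. With the real
libraries you would drop the stand-ins, use arrow::TypeTraits directly, and
check the Status values that Arrow's builders return.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Stand-ins for Arrow type tags (in Arrow: arrow::DoubleType, arrow::Int32Type).
struct DoubleType {};
struct Int32Type {};

// Stand-in for an Arrow builder: accumulates values, then hands them over.
// Real Arrow builders return arrow::Status from these calls.
template <typename CType>
class SimpleBuilder {
 public:
  void AppendValues(const std::vector<CType>& values) {
    data_.insert(data_.end(), values.begin(), values.end());
  }
  std::shared_ptr<std::vector<CType>> Finish() {
    auto out = std::make_shared<std::vector<CType>>(std::move(data_));
    data_.clear();  // leave the builder empty and reusable
    return out;
  }

 private:
  std::vector<CType> data_;
};

// Stand-in for arrow::TypeTraits: maps a type tag to its builder.
template <typename ArrowType> struct TypeTraits;
template <> struct TypeTraits<DoubleType> {
  using BuilderType = SimpleBuilder<double>;
};
template <> struct TypeTraits<Int32Type> {
  using BuilderType = SimpleBuilder<std::int32_t>;
};

// Hypothetical Clickhouse-side column types (not the client's real classes).
struct ColumnFloat64 { std::vector<double> values; };
struct ColumnInt32 { std::vector<std::int32_t> values; };

// The mapping you write yourself: one short specialization per column type.
template <typename CHType> struct CHToArrowType;
template <> struct CHToArrowType<ColumnFloat64> { using ArrowType = DoubleType; };
template <> struct CHToArrowType<ColumnInt32> { using ArrowType = Int32Type; };

// One generic conversion; the builder is chosen entirely at compile time.
template <typename CHType>
auto ConvertColumn(const CHType& column) {
  using ArrowType = typename CHToArrowType<CHType>::ArrowType;
  using BuilderType = typename TypeTraits<ArrowType>::BuilderType;
  BuilderType builder;
  builder.AppendValues(column.values);
  return builder.Finish();
}

// Usage: both calls go through the same function template.
inline void Example() {
  auto doubles = ConvertColumn(ColumnFloat64{{1.5, 2.5}});
  auto ints = ConvertColumn(ColumnInt32{{1, 2, 3}});
  assert(doubles->size() == 2);
  assert(ints->size() == 3);
}
```

The per-type work shrinks to one line in the mapping, and a column type
with no mapping fails to compile instead of failing at runtime.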

Thanks

On Wed, Jan 22, 2020 at 7:44 AM Calder, Matthew <mcalder@xbktrading.com> wrote:
>
> Hi,
>
> I am interfacing Arrow to a Clickhouse database using their C++ client.
> Both Arrow and CH have generic array-like classes with the element data
> type internalized. Ideally, I would like to be able to write something
> like:
>
> arrow::Array a = SomeConversionInvocation(clickhouse::Column c);
>
> Where the array and column have the same element type (int, double,
> string, …) but the code is generic over the specific type.
>
> I can do this by explicitly handling specific types through template
> specialization, but I thought that since Arrow already has pretty generic
> type handling through its templates, and Clickhouse has a similar
> capability, there ought to be a more seamless way to do the conversion.
> Zero copy would probably be a lot to ask, but something short of template
> specializations for every type is what I am aiming for.
>
> I currently do explicit type specialization. For example, I have
> functions like:
>
> inline std::shared_ptr<arrow::Array> makeArray(const std::vector<double> &v)
> {
>     arrow::DoubleBuilder builder;
>     builder.AppendValues(v);
>     std::shared_ptr<arrow::Array> array;
>     builder.Finish(&array);
>     return array;
> }
>
> inline std::shared_ptr<arrow::Array> makeArray(const std::vector<int> &v)
> {
>     arrow::Int32Builder builder;
>     builder.AppendValues(v);
>     std::shared_ptr<arrow::Array> array;
>     builder.Finish(&array);
>     return array;
> }
>
> Which I suspect is unnecessarily explicit. Is there a more generic way of
> handling the variety of underlying array element data types when
> constructing arrow::Array objects? And can someone point me to examples
> that interface Arrow to another similarly generically typed library
> (doesn’t have to be Clickhouse)? Thanks for any guidance.
>
> Matt
