arrow-user mailing list archives

From Thomas Buhrmann <thomas.buehrm...@gmail.com>
Subject Re: Converting clickhouse column to arrow array
Date Wed, 22 Jan 2020 16:58:05 GMT
Hi,
I was looking for something similar, but didn't find a good example in the
docs or the source code showing how to use the visitor pattern. It would be
great, e.g., to have an example similar to the "Row to columnar
conversion", showing a templated way to read arrow columns into C++ vectors
using the visitor pattern, and without implementing a separate reader
function for each arrow type. Would that be possible?

Many thanks,
Thomas
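
A minimal sketch of the kind of example requested above: one templated Visit method that reads any numeric column into a C++ vector, with a single runtime switch recovering the concrete array type. To keep it compilable without Arrow, everything in it is a hypothetical stand-in (TypeId, Array, NumericArray, VisitArray, ToDoubleVector, ReadColumn); none of these names are Arrow's API, and VisitArray only stands in for whatever dispatch helper the real library provides. Widening every value to double is a simplification chosen for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

namespace visitor_sketch {

// Hypothetical stand-ins for a type-erased array hierarchy (not Arrow's classes).
enum class TypeId { kInt32, kDouble };

struct Array {
  explicit Array(TypeId id) : type_id(id) {}
  virtual ~Array() = default;
  TypeId type_id;
};

template <typename T, TypeId Id>
struct NumericArray : Array {
  explicit NumericArray(std::vector<T> v) : Array(Id), values(std::move(v)) {}
  std::vector<T> values;
};

using Int32Array = NumericArray<std::int32_t, TypeId::kInt32>;
using DoubleArray = NumericArray<double, TypeId::kDouble>;

// The only runtime switch: recover the concrete array type, then hand it
// to the visitor so that everything after this point is statically typed.
template <typename Visitor>
void VisitArray(const Array& array, Visitor* visitor) {
  switch (array.type_id) {
    case TypeId::kInt32:
      visitor->Visit(static_cast<const Int32Array&>(array));
      return;
    case TypeId::kDouble:
      visitor->Visit(static_cast<const DoubleArray&>(array));
      return;
  }
  throw std::invalid_argument("unsupported array type");
}

// One templated Visit covers every numeric array type, so no per-type
// reader function is needed. Values are widened to double.
struct ToDoubleVector {
  std::vector<double> out;

  template <typename ArrayType>
  void Visit(const ArrayType& array) {
    out.reserve(out.size() + array.values.size());
    for (const auto& v : array.values) out.push_back(static_cast<double>(v));
  }
};

inline std::vector<double> ReadColumn(const Array& array) {
  ToDoubleVector visitor;
  VisitArray(array, &visitor);
  return visitor.out;
}

// Call-site example using the stand-in arrays above.
inline void ReadColumnExample() {
  Int32Array ints(std::vector<std::int32_t>{1, 2, 3});
  std::vector<double> values = ReadColumn(ints);
  assert(values.size() == 3 && values[0] == 1.0);
}

}  // namespace visitor_sketch
```

Supporting another numeric type in this sketch means adding one enum value, one alias and one case label; the templated Visit body stays unchanged.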

On Wed, 22 Jan 2020 at 17:13, Wes McKinney <wesmckinn@gmail.com> wrote:

> hi Matt,
>
> I recommend you use the visitor pattern combined with the
> arrow::TypeTraits that we provide
>
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/type_traits.h
>
> You'll need to provide a compile-time mapping from Clickhouse types to
> Arrow types, but then you can statically access the correct builder
> type at compile time
>
> using ArrowType = typename CHToArrowType<CHType>::ArrowType;
> using BuilderType = typename TypeTraits<ArrowType>::BuilderType;
>
> ...
>
> or similar. In cases where the exported Clickhouse data does not have
> an associated AppendValues method in Arrow you may have to write a
> special case (please open JIRA issues if you think there should be
> more AppendValues methods)
>
> Thanks
>
> On Wed, Jan 22, 2020 at 7:44 AM Calder, Matthew <mcalder@xbktrading.com>
> wrote:
> >
> > Hi,
> >
> > I am interfacing Arrow with a Clickhouse database using their C++ client.
> > Both Arrow and CH have generic array-like classes with the element data
> > type internalized. Ideally, I would like to be able to write something
> > like:
> >
> > arrow::Array a = SomeConversionInvocation(clickhouse::Column c);
> >
> > where the array and column have the same element type (int, double,
> > string, …) but the code is generic over that type.
> >
> > I can do this by explicitly handling specific types through template
> > specialization, but since Arrow already has fairly generic type handling
> > through its templates, and Clickhouse has a similar capability, I thought
> > there ought to be a more seamless way to do the conversion. Zero copy
> > would probably be a lot to ask, but I am aiming for something short of a
> > template specialization for every type.
> >
> > I currently do explicit type specialization. For example, I have
> > functions like:
> >
> > inline std::shared_ptr<arrow::Array> makeArray(const std::vector<double> &v)
> > {
> >     arrow::DoubleBuilder builder;
> >     builder.AppendValues(v);
> >     std::shared_ptr<arrow::Array> array;
> >     builder.Finish(&array);
> >     return array;
> > }
> >
> > inline std::shared_ptr<arrow::Array> makeArray(const std::vector<int> &v)
> > {
> >     arrow::Int32Builder builder;
> >     builder.AppendValues(v);
> >     std::shared_ptr<arrow::Array> array;
> >     builder.Finish(&array);
> >     return array;
> > }
> >
> > which I suspect is unnecessarily explicit. Is there a more generic way
> > of handling the variety of underlying array element data types when
> > constructing arrow::Array objects? And can someone point me to examples
> > that interface Arrow with another similarly generically typed library
> > (it doesn't have to be Clickhouse)? Thanks for any guidance.
> >
> > Matt
>
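
The compile-time mapping suggested in the thread (a CHToArrowType trait feeding TypeTraits<ArrowType>::BuilderType) can be sketched as below. This is an illustration under stated assumptions, not the Arrow or Clickhouse API: so that it compiles without either library, DoubleType, Int32Type and TypeTraits are local stand-ins for the Arrow names they mimic, and CHColumnFloat64, CHColumnInt32, VectorBuilder and ConvertColumn are hypothetical names introduced here. The stand-in builder returns a std::vector where a real builder would produce an array.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

namespace traits_sketch {

// Stand-in tags for Arrow data types (hypothetical, not arrow::DoubleType etc.).
struct DoubleType {};
struct Int32Type {};

// Stand-in builder with the AppendValues/Finish shape used in the thread.
template <typename T>
class VectorBuilder {
 public:
  void AppendValues(const std::vector<T>& values) {
    data_.insert(data_.end(), values.begin(), values.end());
  }
  std::vector<T> Finish() { return std::move(data_); }

 private:
  std::vector<T> data_;
};

// Stand-in for the TypeTraits idea: data type -> builder type.
template <typename ArrowType> struct TypeTraits;
template <> struct TypeTraits<DoubleType> { using BuilderType = VectorBuilder<double>; };
template <> struct TypeTraits<Int32Type> { using BuilderType = VectorBuilder<std::int32_t>; };

// Hypothetical stand-ins for typed Clickhouse columns.
struct CHColumnFloat64 { std::vector<double> values; };
struct CHColumnInt32 { std::vector<std::int32_t> values; };

// The user-supplied compile-time mapping: column type -> data type.
template <typename CHType> struct CHToArrowType;
template <> struct CHToArrowType<CHColumnFloat64> { using ArrowType = DoubleType; };
template <> struct CHToArrowType<CHColumnInt32> { using ArrowType = Int32Type; };

// One generic conversion; the builder is selected statically from the traits.
template <typename CHType>
auto ConvertColumn(const CHType& column) {
  using ArrowType = typename CHToArrowType<CHType>::ArrowType;
  using BuilderType = typename TypeTraits<ArrowType>::BuilderType;
  BuilderType builder;
  builder.AppendValues(column.values);
  return builder.Finish();
}

// Call-site example using the stand-in columns above.
inline void ConvertColumnExample() {
  CHColumnFloat64 prices{{1.5, 2.5}};
  std::vector<double> out = ConvertColumn(prices);
  assert(out.size() == 2 && out[1] == 2.5);
}

}  // namespace traits_sketch
```

With this layout, supporting another column type costs one CHToArrowType specialization rather than another copy of the conversion function. A column type with no mapping fails at compile time because the primary template is left undefined.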
