From: Wes McKinney
Date: Wed, 22 Jan 2020 10:14:04 -0600
Subject: Re: Converting clickhouse column to arrow array
To: user@arrow.apache.org

For the record, other people in the Arrow community have discussed
building an adapter for CH

https://issues.apache.org/jira/browse/ARROW-3156

It might be advisable to find others in the CH community who are
interested and build a shared solution -- this work would be welcome
inside Apache Arrow IMHO (and other database interfaces, too).

On Wed, Jan 22, 2020 at 10:12 AM Wes McKinney wrote:
>
> hi Matt,
>
> I recommend you use the visitor pattern combined with the
> arrow::TypeTraits that we provide
>
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/type_traits.h
>
> You'll need to provide a compile-time mapping from Clickhouse types to
> Arrow types, but then you can statically access the correct builder
> type at compile time
>
> using ArrowType = typename CHToArrowType<T>::ArrowType;
> using BuilderType = typename TypeTraits<ArrowType>::BuilderType;
>
> ...
>
> or similar. In cases where the exported Clickhouse data does not have
> an associated AppendValues method in Arrow you may have to write a
> special case (please open JIRA issues if you think there should be
> more AppendValues methods)
>
> Thanks
>
> On Wed, Jan 22, 2020 at 7:44 AM Calder, Matthew wrote:
> >
> > Hi,
> >
> > I am interfacing arrow to a Clickhouse database using their c++
> > client. Both arrow and CH have generic array-like classes with the
> > element data type internalized. Ideally, I would like to be able to
> > write something like:
> >
> > arrow::Array a = SomeConversionInvocation(clickhouse::Column c);
> >
> > Where the array and column have the same element type (int, double,
> > string, …) but the code is generic to the specific type.
> >
> > I can do this by explicitly handling specific types through template
> > specialization but I thought that since arrow already has pretty
> > generic type handling through its templates, and clickhouse also has
> > similar capability there ought to be a more seamless way to do the
> > conversion. Zero copy would probably be a lot to ask, but something
> > short of template specializations for every type is what I am aiming
> > for.
> >
> > I currently do explicit type specialization. For example I have
> > functions like:
> >
> > inline std::shared_ptr<arrow::Array> makeArray(const std::vector<double> &v)
> > {
> >     arrow::DoubleBuilder builder;
> >     builder.AppendValues(v);
> >     std::shared_ptr<arrow::Array> array;
> >     builder.Finish(&array);
> >     return array;
> > }
> >
> > inline std::shared_ptr<arrow::Array> makeArray(const std::vector<int32_t> &v)
> > {
> >     arrow::Int32Builder builder;
> >     builder.AppendValues(v);
> >     std::shared_ptr<arrow::Array> array;
> >     builder.Finish(&array);
> >     return array;
> > }
> >
> > Which I suspect is unnecessarily explicit. Is there a more generic
> > way of handling the variety of underlying array element data types
> > when constructing arrow::Array objects? And can someone point me to
> > examples that interface arrow to another similarly generically typed
> > library (doesn’t have to be clickhouse). Thanks for any guidance.
> >
> > Matt
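
The compile-time mapping suggested in the reply can be illustrated with a small, self-contained sketch. Everything prefixed with Fake is a hypothetical stand-in written only so the example compiles without Arrow or the ClickHouse client; none of it is either library's actual API. CHToArrowType is the user-supplied mapping named in the thread, and FakeTypeTraits plays the role that arrow::TypeTraits<ArrowType>::BuilderType from arrow/type_traits.h would play in real code. The sketch copies element by element, so it is not zero copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// --- Hypothetical stand-ins for the two libraries ---------------------------
// A ClickHouse-like typed column exposing Size() and At(i).
template <typename T>
struct FakeColumn {
  std::vector<T> data;
  std::size_t Size() const { return data.size(); }
  const T& At(std::size_t i) const { return data[i]; }
};
using FakeColumnFloat64 = FakeColumn<double>;
using FakeColumnInt32 = FakeColumn<std::int32_t>;

// Arrow-like type tags, type-erased array base, typed array, and builder.
struct FakeDoubleType { using c_type = double; };
struct FakeInt32Type { using c_type = std::int32_t; };

struct FakeArray {
  virtual ~FakeArray() = default;
  virtual std::size_t length() const = 0;
};

template <typename ArrowType>
struct FakeTypedArray : FakeArray {
  std::vector<typename ArrowType::c_type> values;
  std::size_t length() const override { return values.size(); }
};

template <typename ArrowType>
class FakeBuilder {
 public:
  void Append(typename ArrowType::c_type v) { values_.push_back(v); }
  // Hands the accumulated values to a new array and resets the builder.
  std::shared_ptr<FakeArray> Finish() {
    auto out = std::make_shared<FakeTypedArray<ArrowType>>();
    out->values = std::move(values_);
    values_.clear();
    return out;
  }

 private:
  std::vector<typename ArrowType::c_type> values_;
};

// Stand-in for the builder lookup that arrow::TypeTraits provides.
template <typename ArrowType>
struct FakeTypeTraits { using BuilderType = FakeBuilder<ArrowType>; };

// --- The compile-time mapping described in the reply ------------------------
// No primary definition: an unmapped column type fails to compile.
template <typename CHColumn> struct CHToArrowType;
template <> struct CHToArrowType<FakeColumnFloat64> { using ArrowType = FakeDoubleType; };
template <> struct CHToArrowType<FakeColumnInt32> { using ArrowType = FakeInt32Type; };

// One generic conversion in place of one makeArray overload per element type.
template <typename CHColumn>
std::shared_ptr<FakeArray> MakeArray(const CHColumn& column) {
  using ArrowType = typename CHToArrowType<CHColumn>::ArrowType;
  using BuilderType = typename FakeTypeTraits<ArrowType>::BuilderType;
  BuilderType builder;
  for (std::size_t i = 0; i < column.Size(); ++i) {
    builder.Append(column.At(i));
  }
  auto out = builder.Finish();
  assert(out->length() == column.Size());
  return out;
}
```

Supporting another element type in this sketch takes one more CHToArrowType specialization rather than another copy of the conversion function. Types whose data cannot be appended value by value (the reply mentions cases without a matching AppendValues method) would still need a special case.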