From: Wes McKinney
Date: Fri, 15 Jan 2021 09:37:53 -0600
Subject: Re: compute::Take & ChunkedArrays
To: user@arrow.apache.org

You can do that, but note that the implementation is currently not efficient; see

https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/vector_selection.cc#L1909

Rather than pre-concatenating the chunks (which can easily fail) and then invoking Take on the resulting concatenated Array, it would be better to do an O(N log K) take on the chunks directly, where N is the number of take indices and K is the number of chunks.

For example, if you have chunks of size

    10 50 100 20

then the algorithm computes the following offset table:

    0 10 60 160 180

Indices relative to the whole ChunkedArray are translated to (chunk number, intrachunk index) pairs. For example, a take with [5, 40, 100, 170] is translated, by doing binary searches in the offset table, to:

    (chunk=0, relative_index=5)
    (1, 30)
    (2, 40)
    (3, 10)

Consecutive indices from the same chunk are batched together, and then Take is invoked on the respective chunk (with bounds checking disabled) to select a chunk for the resulting output ChunkedArray.

It might be helpful to copy this to the appropriate Jira (I'm sure there is one already) to assist the person who implements this.

Thanks,
Wes

On Mon, Jan 11, 2021 at 10:01 AM Niranda Perera wrote:
>
> Hi all,
>
> I was wondering how the Take API works with ChunkedArrays?
> e.g., if we have a ChunkedArray[100] made of Array1[50] and Array2[50],
> and I want an element from each array, can I pass something like [10, 60] as the indices?
>
> --
> Niranda Perera
> @n1r44
> +1 812 558 8884 / +94 71 554 8430
> https://www.linkedin.com/in/niranda
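
The offset-table translation described in the reply above can be sketched in a few lines. This is only an illustration in plain Python using the standard-library bisect module, not Arrow's implementation: the chunks are ordinary lists, and the helper names (build_offsets, resolve, batch_by_chunk, take_chunked) are hypothetical ones made up for this sketch. The per-chunk list comprehension stands in for invoking Take on each chunk.

```python
from bisect import bisect_right
from itertools import accumulate


def build_offsets(chunk_lengths):
    """Cumulative start offsets, e.g. [10, 50, 100, 20] -> [0, 10, 60, 160, 180]."""
    return [0] + list(accumulate(chunk_lengths))


def resolve(offsets, index):
    """Translate a logical index into (chunk number, intrachunk index)
    with one binary search over the offset table: O(log K)."""
    if not 0 <= index < offsets[-1]:
        raise IndexError(f"index {index} out of bounds for length {offsets[-1]}")
    # bisect_right finds the first offset strictly greater than index;
    # the chunk containing index starts one slot before that.
    chunk = bisect_right(offsets, index) - 1
    return chunk, index - offsets[chunk]


def batch_by_chunk(offsets, indices):
    """Group consecutive indices that fall in the same chunk."""
    batches = []
    for index in indices:
        chunk, rel = resolve(offsets, index)
        if batches and batches[-1][0] == chunk:
            batches[-1][1].append(rel)
        else:
            batches.append((chunk, [rel]))
    return batches


def take_chunked(chunks, indices):
    """Hypothetical stand-in for a chunk-wise Take: one output chunk per batch.
    Bounds were already checked in resolve(), so the per-chunk lookup
    does not need to check them again."""
    offsets = build_offsets([len(c) for c in chunks])
    return [[chunks[c][r] for r in rels]
            for c, rels in batch_by_chunk(offsets, indices)]


offsets = build_offsets([10, 50, 100, 20])
print(offsets)                                          # [0, 10, 60, 160, 180]
print([resolve(offsets, i) for i in [5, 40, 100, 170]])
# [(0, 5), (1, 30), (2, 40), (3, 10)]
```

Applied to the layout in the original question (two chunks of 50 elements, indices [10, 60]), resolve gives (0, 10) and (1, 10), i.e. one element from each chunk. Each lookup costs one binary search over K + 1 offsets, which is where the O(N log K) bound comes from.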