Date: Sat, 2 Nov 2019 02:49:00 +0000 (UTC)
From: "Yogesh Tewari (Jira)"
To: dev@arrow.apache.org
Subject: [jira] [Created] (ARROW-7048) [Java] Support for combining multiple vectors under VectorSchemaRoot

Yogesh Tewari created ARROW-7048:
------------------------------------

             Summary: [Java] Support for combining multiple vectors under VectorSchemaRoot
                 Key: ARROW-7048
                 URL: https://issues.apache.org/jira/browse/ARROW-7048
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Java
            Reporter: Yogesh Tewari

Hi,

pyarrow.Table.combine_chunks provides a nice way to combine multiple record batches under a single pyarrow.Table.

I am currently working on a downstream application that reads data from BigQuery. The BigQuery Storage API supports output in Arrow format, but it streams the data in many batches of 1024 rows or fewer.

It would be really nice if the Arrow Java API provided this functionality under an abstraction like VectorSchemaRoot.

After getting guidance from [~emkornfield@gmail.com], I tried to write my own implementation by copying data vector by vector using TransferPair's copyValueSafe. But, unless I am missing something obvious, it turns out that it only copies one value at a time. That means a lot of looping, calling copyValueSafe for millions of rows from a source vector index to a target vector index. Ideally I would want to concatenate/link the underlying buffers rather than copying one cell at a time.

E.g., if I have:

{code:java}
List<VectorSchemaRoot> batchList = new ArrayList<>();
try (ArrowStreamReader reader = new ArrowStreamReader(new ByteArrayInputStream(out.toByteArray()), allocator)) {
    Schema schema = reader.getVectorSchemaRoot().getSchema();
    for (int i = 0; i < 5; i++) {
        // This will be loaded with new values on every call to loadNextBatch
        VectorSchemaRoot readBatch = reader.getVectorSchemaRoot();
        reader.loadNextBatch();
        batchList.add(readBatch);
    }
}
//VectorSchemaRoot.combineChunks(batchList, newVectorSchemaRoot);
{code}

Could there be a method like VectorSchemaRoot.combineChunks(List<VectorSchemaRoot>)?

I did read the VectorSchemaRoot discussion on https://issues.apache.org/jira/browse/ARROW-6896 and am not sure if it's the right thing to use here.

PS. Feel free to update the title of this feature request to more appropriate wording.
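
The difference between copying one cell at a time and copying whole buffers can be sketched in plain Java. This is only an illustration under stated assumptions: it uses no Arrow classes, each int[] stands in for the data buffer of one batch of a fixed-width vector, and the names ChunkCombineSketch, combinePerValue and combineBulk are hypothetical. A real implementation would also have to deal with validity bitmaps and variable-width offset buffers, which this sketch ignores.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-in: each int[] plays the role of one batch's data buffer.
// None of these names are Arrow APIs; they are hypothetical.
final class ChunkCombineSketch {

    // Per-value copy, analogous to looping over copyValueSafe for every row.
    static int[] combinePerValue(List<int[]> chunks) {
        int[] target = new int[totalLength(chunks)];
        int targetIndex = 0;
        for (int[] chunk : chunks) {
            for (int sourceIndex = 0; sourceIndex < chunk.length; sourceIndex++) {
                target[targetIndex++] = chunk[sourceIndex];
            }
        }
        return target;
    }

    // Bulk copy: one System.arraycopy per chunk instead of one assignment per value.
    static int[] combineBulk(List<int[]> chunks) {
        int[] target = new int[totalLength(chunks)];
        int offset = 0;
        for (int[] chunk : chunks) {
            System.arraycopy(chunk, 0, target, offset, chunk.length);
            offset += chunk.length;
        }
        return target;
    }

    // Sum of all chunk lengths, so the target is allocated exactly once.
    private static int totalLength(List<int[]> chunks) {
        int total = 0;
        for (int[] chunk : chunks) {
            total += chunk.length;
        }
        return total;
    }

    public static void main(String[] args) {
        List<int[]> chunks = Arrays.asList(new int[] {1, 2, 3}, new int[] {4, 5}, new int[] {6});
        System.out.println(Arrays.toString(combineBulk(chunks)));
        System.out.println(Arrays.equals(combinePerValue(chunks), combineBulk(chunks)));
    }
}
```

Both methods return the same combined array. The bulk version does one block copy per batch, which is the kind of behaviour the requested combineChunks would ideally offer.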

Cheers,
Yogesh

--
This message was sent by Atlassian Jira
(v8.3.4#803005)