From: Wes McKinney
Date: Mon, 24 Jun 2019 20:35:48 -0500
Subject: Re: Does arrow streaming support splitting big string/binary data?
To: user@arrow.apache.org

hi Ivan,

Currently all implementations of Arrow treat record batch protocol messages as atomic entities, in the sense that IPC protocol readers expect to have access to the entire message in virtual address space. If Arrow protocol payloads need to be split on the wire, that is usually handled by the underlying transport layer. For example, in Flight (which uses gRPC as its default transport), gRPC breaks large messages into smaller buffers internally.

- Wes

On Mon, Jun 24, 2019 at 8:29 PM Ivan Popivanov wrote:
>
> Hello,
>
> Looking at these examples and the documentation, it seems that a record batch cannot span multiple messages. Is my understanding correct?
>
> Here is the scenario I am considering: two columns, an int and a string. Let's assume that we want the maximum message size to be 64K. If there is a row with a string value of, let's say, 70K, it has to span multiple batches. Does the current message format support this?
>
> If it doesn't, then another layer is needed to create the messages when a column size is a multiple of the message size.
>
> Thanks
> Ivan
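To make the "split below Arrow" answer concrete: in this arrangement the IPC message stays atomic, and only the transport cuts it into pieces and glues them back together before the Arrow reader sees it. The following is a minimal sketch under stated assumptions. The frame layout (4-byte little-endian length plus a 1-byte "last frame" flag) is made up for illustration and is neither Arrow's IPC framing nor gRPC's wire format. `split_message` and `reassemble` are hypothetical helpers, and the message is treated as opaque bytes.

```python
import struct
from typing import Iterable, Iterator, List

# Hypothetical frame layout (NOT part of the Arrow spec or gRPC):
# 4-byte little-endian payload length, 1-byte "last frame" flag, then payload.
HEADER = struct.Struct("<IB")
MAX_FRAME_PAYLOAD = 64 * 1024  # the 64K limit from Ivan's scenario


def split_message(message: bytes, max_payload: int = MAX_FRAME_PAYLOAD) -> Iterator[bytes]:
    """Split one opaque IPC message into frames carrying at most max_payload bytes each."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    # max(len, 1) guarantees at least one frame, so an empty message still round-trips.
    for start in range(0, max(len(message), 1), max_payload):
        chunk = message[start:start + max_payload]
        is_last = start + max_payload >= len(message)
        yield HEADER.pack(len(chunk), int(is_last)) + chunk


def reassemble(frames: Iterable[bytes]) -> bytes:
    """Concatenate frame payloads up to the last-frame flag, yielding the whole message."""
    parts: List[bytes] = []
    for frame in frames:
        length, is_last = HEADER.unpack_from(frame, 0)
        payload = frame[HEADER.size:HEADER.size + length]
        if len(payload) != length:
            raise ValueError("truncated frame")
        parts.append(payload)
        if is_last:
            # Only now is the complete message handed to the (Arrow) reader.
            return b"".join(parts)
    raise ValueError("stream ended before the last frame")


if __name__ == "__main__":
    # Stand-in for a serialized record batch whose string column holds a 70K value.
    message = b"x" * (70 * 1024)
    frames = list(split_message(message))
    print(len(frames), reassemble(frames) == message)
```

With the 70K example this produces two frames (64K and 6K of payload), and the receiver rebuilds the full message before decoding. The record batch itself never spans multiple Arrow messages; the size limit is enforced purely at the framing layer, which matches the division of labour described above for gRPC.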