Date: Tue, 6 Jun 2017 21:15:18 +0000 (UTC)
From: "Wes McKinney (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Commented] (SPARK-20960) make ColumnVector public


    [ https://issues.apache.org/jira/browse/SPARK-20960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16039648#comment-16039648 ]

Wes McKinney commented on SPARK-20960:
--------------------------------------

[~cloud_fan] this will be very exciting to have as a supported public API for more efficient UDF execution. We're ready to help with improvements to Arrow (like in-memory encodings / compression a la ARROW-300) to help with these use cases.

cc [~jnadeau] [~julienledem]

> make ColumnVector public
> ------------------------
>
>                 Key: SPARK-20960
>                 URL: https://issues.apache.org/jira/browse/SPARK-20960
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Wenchen Fan
>
> ColumnVector is an internal interface in Spark SQL, which is only used for vectorized parquet reader to represent the in-memory columnar format.
> In Spark 2.3 we want to make ColumnVector public, so that we can provide a more efficient way for data exchanges between Spark and external systems.
> For example, we can use ColumnVector to build the columnar read API in the data source framework, to build a more efficient UDF API, etc.
> We also want to introduce a new ColumnVector implementation based on Apache Arrow (basically just a wrapper over Arrow), so that external systems (like Python Pandas DataFrame) can build ColumnVector very easily.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org