Subject: Indexing, encoding, transformations and processing with PyArrow - GitHub 6284
From: "Athanassios I. Hatzis"
To: user@arrow.apache.org
Date: Mon, 27 Jan 2020 16:55:15 +0200
Organization: HEALIS.EU

Hi,

I recently started experimenting with PyArrow for my TRIADB project. Kudos to Wes and his team for leading one of the best open-source projects in data engineering. Continuing the success story of Pandas on the right track was definitely a wise decision!

At this stage I am trying to make a new release of TRIADB that will handle metadata management and fast ingestion of data into memory for transformations and basic query operations. Secondary indexes, dictionary encoding and adjacency lists are a core part of the TRIADB project, which is why I posted the issue about the Array.dictionary_encode method (https://github.com/apache/arrow/issues/6284). Are my example and description not clear? What exactly would you like me to elaborate on?

I also noticed that there is NumPy integration: converting from NumPy to Arrow is easy, but the reverse direction has several limitations. For example, I cannot create a view of a StringArray (NotImplementedError: NumPy array view is only supported for primitive types), yet string() (utf8) is in the list of your primitive types. Are there any plans to support this type with NumPy soon?

Kind regards,
Athanassios
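
For readers unfamiliar with the terms used above, dictionary encoding replaces each value in a column with an integer index into a list of distinct values, and a secondary index can then map each distinct value to the rows where it occurs. The following is a minimal pure-Python sketch of that idea only. It is not PyArrow's Array.dictionary_encode, not TRIADB's implementation, and not the example from the GitHub issue; the function names, the sample data, and the choice to number values in order of first appearance are all assumptions made for illustration.

```python
from collections import defaultdict


def dictionary_encode(values):
    """Return (indices, dictionary) for a list of hashable values.

    Hypothetical helper: distinct values are numbered in order of
    first appearance, which is an assumption of this sketch.
    """
    dictionary = []   # distinct values, in first-seen order
    code_of = {}      # value -> its position in `dictionary`
    indices = []      # one integer code per input row
    for value in values:
        if value not in code_of:
            code_of[value] = len(dictionary)
            dictionary.append(value)
        indices.append(code_of[value])
    return indices, dictionary


def secondary_index(indices):
    """Map each dictionary code to the list of row numbers that hold it."""
    rows_by_code = defaultdict(list)
    for row, code in enumerate(indices):
        rows_by_code[code].append(row)
    return dict(rows_by_code)


# Illustrative data, not taken from the original message.
colors = ["red", "green", "red", "blue", "green", "red"]

indices, dictionary = dictionary_encode(colors)
row_lookup = secondary_index(indices)

# Decoding is a lookup of every index in the dictionary.
decoded = [dictionary[i] for i in indices]
```

With this layout, finding every row that holds "red" is a lookup of its code in `row_lookup` rather than a scan over the original strings.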