Date: Thu, 9 Feb 2017 22:03:41 +0000 (UTC)
From: "Emilio Lahr-Vivaz (JIRA)"
To: dev@arrow.apache.org
Reply-To: dev@arrow.apache.org
Subject: [jira] [Commented] (ARROW-542) [Java] Implement dictionaries in stream/file encoding

    [ https://issues.apache.org/jira/browse/ARROW-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860292#comment-15860292 ]

Emilio Lahr-Vivaz commented on ARROW-542:
-----------------------------------------

Another blocker I'm hitting is that I don't see any way to determine the type of a dictionary block during read. DictionaryEncoding has an indexType, but that seems to refer to the ints used to reference the dictionary values:

https://github.com/apache/arrow/blob/b99d049c3d1894908b7e52774eb657675dc1f439/format/Message.fbs#L165

A dictionary-encoded vector currently has its type defined as the dictionary index type, but the type of the dictionary itself is not defined. That works when the data is in memory with the dictionary alongside it, but not when encoding to the file format.

Possibly the dictionary-encoded vector should specify the dictionary type? It seems like either that, or the message format needs another field for the dictionary type.
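To make the gap concrete, here is a minimal sketch in plain Java. It deliberately does not use the Arrow Java classes: `DictionaryTypeSketch`, its nested `DictionaryEncoding` and `Field` classes, the `valueType` field, and the `dictionaryValueType` helper are all hypothetical names made up for illustration, and the type names are plain strings. It only shows why a reader that knows the index type still cannot decode a dictionary block, and how an extra value-type field (one of the two options mentioned above) would close that gap.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified model for illustration only; these are NOT the Arrow Java classes.
final class DictionaryTypeSketch {

    /** Loosely mirrors DictionaryEncoding in Message.fbs: a dictionary id plus the index type. */
    static final class DictionaryEncoding {
        final long id;
        final String indexType;  // type of the ints that reference dictionary values, e.g. "Int32"
        final String valueType;  // ASSUMED extra field: type of the dictionary values; null if absent

        DictionaryEncoding(long id, String indexType, String valueType) {
            this.id = id;
            this.indexType = indexType;
            this.valueType = valueType;
        }
    }

    /** A schema field whose vector is dictionary encoded. */
    static final class Field {
        final String name;
        final DictionaryEncoding encoding;

        Field(String name, DictionaryEncoding encoding) {
            this.name = name;
            this.encoding = encoding;
        }
    }

    /**
     * What a reader can learn from the schema about how to decode the
     * dictionary block with the given id. Empty means the value type is unknown.
     */
    static Optional<String> dictionaryValueType(List<Field> schema, long dictionaryId) {
        for (Field field : schema) {
            if (field.encoding != null && field.encoding.id == dictionaryId) {
                return Optional.ofNullable(field.encoding.valueType);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Current situation: only the index type is recorded.
        List<Field> today = Arrays.asList(
                new Field("city", new DictionaryEncoding(1L, "Int32", null)));
        // Proposed: the value type travels with the encoding metadata.
        List<Field> proposed = Arrays.asList(
                new Field("city", new DictionaryEncoding(1L, "Int32", "Utf8")));

        System.out.println("today:    " + dictionaryValueType(today, 1L));
        System.out.println("proposed: " + dictionaryValueType(proposed, 1L));
    }
}
```

With only the index type recorded, the lookup for dictionary 1 comes back empty; once the value type is carried along, the same lookup returns `Utf8`. Whether that information belongs on the vector's field or in a new message-format field is exactly the open question above.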

> [Java] Implement dictionaries in stream/file encoding
> -----------------------------------------------------
>
>                 Key: ARROW-542
>                 URL: https://issues.apache.org/jira/browse/ARROW-542
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Java - Vectors
>            Reporter: Emilio Lahr-Vivaz
>            Assignee: Emilio Lahr-Vivaz
>

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)