Date: Mon, 28 Nov 2016 09:23:58 +0000 (UTC)
From: "kumar vishal (JIRA)"
To: issues@carbondata.incubator.apache.org
Reply-To: dev@carbondata.incubator.apache.org
Subject: [jira] [Created] (CARBONDATA-458) Improving carbon first time query performance

kumar vishal created CARBONDATA-458:
---------------------------------------

             Summary: Improving carbon first time query performance
                 Key: CARBONDATA-458
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-458
             Project: CarbonData
          Issue Type: Improvement
          Components: core, data-load, data-query
            Reporter: kumar vishal
            Assignee: kumar vishal

Improving carbon first time query performance

Reason:
1. When the file system cache has been cleared, file reads are slower because the data has to be read from disk and cached again.
2. On a first-time query, carbon has to read the footer from the data file to build the btree.
3. Carbon reads more footer data than is required (the data chunk).
4. Many random seeks happen in carbon because the column data (data page, rle, inverted index) is not stored together.

Solution:
1. Improve block loading time. This can be done by removing the data chunk from blockletInfo and storing only the offset and length of the data chunk.
2. Compress the presence meta bitset stored for null values of measure columns using snappy.
3. Store the metadata and data of a column together and read them together. This reduces random seeks and improves IO.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
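
Solutions 1 and 3 above can be illustrated with a small sketch. This is not CarbonData's actual file format or API: the layout, the struct formats, and the names write_blocklet, read_footer, and read_column are all hypothetical, chosen only to show a footer that stores just an (offset, length) pair per column chunk, with each column's metadata and data written contiguously so that one seek and one read fetch both.

```python
import io
import struct

# Hypothetical on-disk layout for one blocklet (NOT CarbonData's real format):
#   [col 0: metadata | data page][col 1: metadata | data page]...[footer][trailer]
# The footer holds only (offset, length) per column chunk, as in Solution 1.
ENTRY = struct.Struct("<QI")    # offset: uint64, length: uint32
META = struct.Struct("<I")      # per-column metadata: only the data page length here
TRAILER = struct.Struct("<QI")  # footer start position, number of columns


def write_blocklet(columns):
    """Serialize columns so each column's metadata and data are contiguous."""
    buf = io.BytesIO()
    entries = []
    for data in columns:
        offset = buf.tell()
        buf.write(META.pack(len(data)))  # metadata first ...
        buf.write(data)                  # ... then the data page right after it
        entries.append((offset, buf.tell() - offset))
    footer_start = buf.tell()
    for offset, length in entries:
        buf.write(ENTRY.pack(offset, length))
    buf.write(TRAILER.pack(footer_start, len(entries)))
    return buf.getvalue()


def read_footer(f):
    """Read only the small (offset, length) index, not the column chunks."""
    f.seek(-TRAILER.size, io.SEEK_END)
    footer_start, count = TRAILER.unpack(f.read(TRAILER.size))
    f.seek(footer_start)
    return [ENTRY.unpack(f.read(ENTRY.size)) for _ in range(count)]


def read_column(f, entry):
    """One seek and one read fetch a column's metadata and data together."""
    offset, length = entry
    f.seek(offset)
    chunk = f.read(length)
    (data_len,) = META.unpack_from(chunk)
    return chunk[META.size:META.size + data_len]


blob = write_blocklet([b"alpha", b"bravo-bravo", b"c"])
f = io.BytesIO(blob)
index = read_footer(f)             # cheap: 12 bytes per column
second = read_column(f, index[1])  # chunk is loaded only when it is needed
```

In this sketch the cost of loading the footer grows with the number of columns rather than with the size of the column chunks, and a query that touches one column issues a single seek for it. Whether the real format benefits to the same degree depends on details the message does not give.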