Date: Tue, 29 Nov 2016 17:26:58 +0000 (UTC)
From: "Ravindra Pesala (JIRA)"
To: issues@carbondata.incubator.apache.org
Subject: [jira] [Created] (CARBONDATA-470) Add unsafe offheap and on-heap sort in carbodata loading

Ravindra Pesala created CARBONDATA-470:
------------------------------------------

             Summary: Add unsafe offheap and on-heap sort in carbodata loading
                 Key: CARBONDATA-470
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-470
             Project: CarbonData
          Issue Type: Improvement
            Reporter: Ravindra Pesala

In the current CarbonData system, data loading performance is poor because the data must be sorted at the executor level during loading. CarbonData collects a batch of data, sorts it, and writes it to temporary files; it then merge-sorts those temporary files to finish the sort. This causes two major problems: disk I/O and GC pressure. Even though the batches are written to files, CarbonData still suffers heavy GC because each batch is sorted in memory before it is written out.

To solve these problems we can introduce Unsafe storage and Unsafe sort.

Unsafe Storage: The user can configure a memory limit for the amount of data kept in memory. Data within that limit is kept in a contiguous memory region, either off-heap or on-heap, using Unsafe. Once the configured limit is exceeded, the remaining data is spilled to disk.

Unsafe Sort: The data stored in memory using Unsafe can be sorted with an Unsafe sort.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
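
As a rough illustration of the proposal, the following Java sketch keeps a bounded number of sort keys in one contiguous off-heap region, sorts them in place without creating per-row objects, spills each full page to a temporary file, and finally merges the spilled runs with the page still in memory. It is only a sketch under stated assumptions, not CarbonData's implementation: the class and method names (`OffHeapSortSketch`, `sortWithSpill`, `pageCapacity`) are hypothetical, rows are reduced to single `long` keys, a direct `ByteBuffer` stands in for `sun.misc.Unsafe` so that the example runs without reflection, and insertion sort is used only to keep the code short.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: bounded off-heap page, sorted in place, spilled to disk when full. */
public class OffHeapSortSketch {

    /** Sorts the keys while holding at most pageCapacity of them in off-heap memory. */
    public static long[] sortWithSpill(long[] input, int pageCapacity) {
        if (pageCapacity < 1) throw new IllegalArgumentException("pageCapacity must be >= 1");
        // A direct buffer is one contiguous off-heap region (assumed stand-in for Unsafe).
        ByteBuffer page = ByteBuffer.allocateDirect(pageCapacity * Long.BYTES);
        List<Path> spills = new ArrayList<>();
        int count = 0;
        try {
            for (long key : input) {
                if (count == pageCapacity) { // configured limit reached: sort and spill the page
                    sortPage(page, count);
                    spills.add(spill(page, count));
                    count = 0;
                }
                page.putLong(count * Long.BYTES, key);
                count++;
            }
            sortPage(page, count); // the last page stays in memory
            return merge(page, count, spills, pageCapacity, input.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** In-place insertion sort over the off-heap page; no per-key objects are allocated. */
    static void sortPage(ByteBuffer page, int count) {
        for (int i = 1; i < count; i++) {
            long key = page.getLong(i * Long.BYTES);
            int j = i - 1;
            while (j >= 0 && page.getLong(j * Long.BYTES) > key) {
                page.putLong((j + 1) * Long.BYTES, page.getLong(j * Long.BYTES));
                j--;
            }
            page.putLong((j + 1) * Long.BYTES, key);
        }
    }

    /** Writes the first count keys of the (already sorted) page to a temporary file. */
    static Path spill(ByteBuffer page, int count) throws IOException {
        Path file = Files.createTempFile("sort-spill-", ".bin");
        try (DataOutputStream out =
                new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(file)))) {
            for (int i = 0; i < count; i++) out.writeLong(page.getLong(i * Long.BYTES));
        }
        return file;
    }

    /** K-way merge of the spilled runs (each holds pageCapacity keys) and the in-memory page. */
    static long[] merge(ByteBuffer page, int inMemory, List<Path> spills, int pageCapacity, int total)
            throws IOException {
        DataInputStream[] readers = new DataInputStream[spills.size()];
        int[] remaining = new int[spills.size() + 1]; // last run is the in-memory page
        long[] result = new long[total];
        // Heap entries are {key, runIndex}, ordered by key.
        PriorityQueue<long[]> heap = new PriorityQueue<long[]>((a, b) -> Long.compare(a[0], b[0]));
        try {
            for (int r = 0; r < readers.length; r++) {
                readers[r] = new DataInputStream(
                        new BufferedInputStream(Files.newInputStream(spills.get(r))));
                remaining[r] = pageCapacity;
            }
            remaining[readers.length] = inMemory;
            for (int r = 0; r < remaining.length; r++) {
                if (remaining[r] > 0) heap.add(new long[] {next(r, readers, page, inMemory, remaining), r});
            }
            int n = 0;
            while (!heap.isEmpty()) {
                long[] top = heap.poll();
                result[n++] = top[0];
                int run = (int) top[1];
                if (remaining[run] > 0) heap.add(new long[] {next(run, readers, page, inMemory, remaining), run});
            }
            return result;
        } finally {
            for (DataInputStream reader : readers) if (reader != null) reader.close();
            for (Path file : spills) Files.deleteIfExists(file);
        }
    }

    /** Reads the next key of a run, from its spill file or from the in-memory page. */
    private static long next(int run, DataInputStream[] readers, ByteBuffer page, int inMemory,
            int[] remaining) throws IOException {
        long value = run < readers.length
                ? readers[run].readLong()
                : page.getLong((inMemory - remaining[run]) * Long.BYTES);
        remaining[run]--;
        return value;
    }
}
```

For example, `OffHeapSortSketch.sortWithSpill(new long[] {5, 3, 9, 1, 7, 2, 8}, 3)` spills two sorted pages of three keys each and merges them with the one key left in memory. In a real loader the page size would come from the user-configured memory limit, and rows would be variable-length records addressed by offset rather than bare keys.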