From: Andrey Kornev
To: "dev@ignite.apache.org"
Subject: Re: Data compression in Ignite 2.0
Date: Wed, 27 Jul 2016 04:53:10 +0000

Dictionary compression requires some knowledge about the data being compressed. For example, for numeric types the range of values must be known so that the dictionary can be generated. For strings, the number of unique values in the column is the key input to dictionary generation.

SAP HANA is a column-based database system: it stores the fields of the data tuple individually, using the best compression for the given data type and the particular set of values. HANA was built specifically as a general-purpose database, rather than as an afterthought layer on top of an already existing distributed cache.

On the other hand, Ignite is a distributed cache implementation (a pretty good one!) that in general requires no schema and stores its data in a row-based fashion. Its current design doesn't lend itself readily to the kind of optimizations HANA provides out of the box.
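To make the point about dictionary compression concrete, here is a minimal sketch in Python of dictionary-encoding a single string column. It is only an illustration under simple assumptions: the whole column is known up front, and the function names (build_dictionary, encode, decode, bits_per_code) are made up for this example and are not part of Ignite or HANA.

```python
from typing import Dict, List, Sequence


def build_dictionary(column: Sequence[str]) -> Dict[str, int]:
    """Assign a small integer code to each distinct value, in sorted order."""
    return {value: code for code, value in enumerate(sorted(set(column)))}


def encode(column: Sequence[str], dictionary: Dict[str, int]) -> List[int]:
    """Replace every value with its dictionary code."""
    return [dictionary[value] for value in column]


def decode(codes: Sequence[int], dictionary: Dict[str, int]) -> List[str]:
    """Restore the original values from their codes."""
    # Sorting the keys by their code yields a list indexed by code.
    values_by_code = sorted(dictionary, key=dictionary.get)
    return [values_by_code[code] for code in codes]


def bits_per_code(dictionary: Dict[str, int]) -> int:
    """Bits needed per encoded value; depends only on the number of distinct values."""
    return max(1, (len(dictionary) - 1).bit_length())
```

The number of distinct values is what determines the code width here, which is why it has to be known (or at least bounded) before the dictionary can be generated. With a row-based, schema-less store there is no natural "column" to collect those values from.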
For the curious types among us, the implementation details of HANA are well documented in "In-memory Data Management", by Hasso Plattner & Alexander Zeier.

Cheers
Andrey

_____________________________
From: Alexey Kuznetsov
Sent: Tuesday, July 26, 2016 5:36 AM
Subject: Re: Data compression in Ignite 2.0
To:

Sergey Kozlov wrote:
>> For approach 1: Put a large object into a partition cache will force to update the dictionary placed on replication cache. It may be time-expense operation.

The dictionary will be built only once. And we could control what is put into the dictionary: for example, we could check the min and max size and decide whether to put the value into the dictionary or not.

>> Approach 2-3 are make sense for rare cases as Sergi commented.

But it is better to at least have the possibility to plug in user code for compression than not to have it at all.

>> Also I see a danger of OOM if we've got high compression level and try to restore original value in memory.

We could easily get an OOM with many other operations right now without compression. I think it is not an issue; we could add a NOTE to the documentation about this possibility.

Andrey Kornev wrote:
>> ... in general I think compression is a great data. The cleanest way to achieve that would be to just make it possible to chain the marshallers...

I think it is also a good idea. And it looks like it could be used for compression with some sort of ZIP algorithm, but how do we deal with compression by dictionary substitution? We need to build the dictionary first. Any ideas?

Nikita Ivanov wrote:
>> SAP Hana does the compression by 1) compressing SQL parameters before execution...

Looks interesting, but my initial point was about compression of cache data, not SQL queries. My idea was to make compression transparent to the SQL engine when it looks up data. But the idea of compressing SQL query results looks very interesting, because it is a known fact that the SQL engine can consume quite a lot of heap for storing result sets. I think this should be discussed in a separate thread.

Just for your information, in the first message I mentioned that DB2 has compression by dictionary, and according to them it is possible to compress usual data to 50-80%. I have some experience with DB2 and can confirm this.

--
Alexey Kuznetsov
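As a rough illustration of the "chain the marshallers" idea quoted in this thread, here is a minimal Python sketch. It is not Ignite's marshaller API: ChainedMarshaller and ZLIB_STAGE are hypothetical names, pickle stands in for the real serializer, and zlib stands in for "some sort of ZIP algorithm".

```python
import pickle
import zlib
from typing import Any, Callable, Sequence, Tuple

# A stage is a pair of byte transforms: (apply on write, undo on read).
Stage = Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]

# Hypothetical compression stage built on the standard zlib module.
ZLIB_STAGE: Stage = (zlib.compress, zlib.decompress)


class ChainedMarshaller:
    """Serializes an object, then passes the bytes through each stage in order."""

    def __init__(self, stages: Sequence[Stage]) -> None:
        self._stages = list(stages)

    def marshal(self, obj: Any) -> bytes:
        data = pickle.dumps(obj)
        for forward, _ in self._stages:
            data = forward(data)
        return data

    def unmarshal(self, data: bytes) -> Any:
        # Undo the stages in reverse order before deserializing.
        for _, backward in reversed(self._stages):
            data = backward(data)
        return pickle.loads(data)
```

A stage like this works on each value in isolation, so it fits ZIP-style compression well. It does not answer the dictionary question raised above, because a dictionary has to be built from many values before any single value can be encoded.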