From: Ilya Kasnacheev
Date: Tue, 5 Mar 2019 13:08:55 +0300
Subject: Re: Storing short/empty strings in Ignite
To: dev@ignite.apache.org

Hello!

If you can modify your code to store nulls instead of empty strings, nulls
seem to be much more compact.

Regards,
-- 
Ilya Kasnacheev

On Tue, 5 Mar 2019 at 10:12, Valentin Kulichenko <valentin.kulichenko@gmail.com> wrote:

> Hey folks,
>
> While working with Ignite users, I keep seeing data models where a single
> object (row) might contain many fields (100, 200, more...), and most of
> them are strings.
>
> Correct me if I'm wrong, but per my understanding, for every such field we
> store an integer value to represent its length. This is significant
> overhead: with 200 fields we spend 800 bytes on this alone.
>
> Now here is the catch: the vast majority of those strings are actually
> empty or very short (several chars), therefore we don't really need 4 bytes
> for their length.
>
> My suggestion is to introduce another data type, e.g. STRING_SHORT, use it
> for all strings that are 255 chars or less, and therefore use a single byte
> to encode length. We can go even further, and also introduce STRING_EMPTY,
> which obviously doesn't need any length information at all.
>
> What do you guys think?
>
> -Val
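
For reference, the arithmetic in the quoted proposal can be sketched roughly as follows. This is only an illustration under stated assumptions, not Ignite's actual binary format: the class and method names are hypothetical, per-field type tags are ignored, and lengths are counted in UTF-8 bytes rather than chars.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the proposal's arithmetic; NOT Ignite's real binary format.
// Per-field type tags are ignored; only the length prefix and the payload are counted.
final class LengthPrefixSketch {
    /** Bytes spent on length prefixes alone when every field carries a 4-byte int. */
    static int intPrefixOverhead(int fieldCount) {
        return fieldCount * Integer.BYTES;
    }

    /** Size of one string with a fixed 4-byte length prefix. */
    static int currentSize(String s) {
        return Integer.BYTES + s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Size of one string under the proposed STRING_EMPTY / STRING_SHORT scheme. */
    static int proposedSize(String s) {
        int payload = s.getBytes(StandardCharsets.UTF_8).length;
        if (payload == 0) {
            return 0;                    // STRING_EMPTY: no length information at all
        }
        if (payload <= 255) {
            return 1 + payload;          // STRING_SHORT: one unsigned byte for the length
        }
        return Integer.BYTES + payload;  // longer strings keep the 4-byte prefix
    }

    public static void main(String[] args) {
        // Assumed row shape for illustration: 200 string fields, 150 empty, 50 holding "abc".
        int current = 0;
        int proposed = 0;
        for (int i = 0; i < 200; i++) {
            String value = i < 150 ? "" : "abc";
            current += currentSize(value);
            proposed += proposedSize(value);
        }
        System.out.println("length-prefix overhead, 200 fields: " + intPrefixOverhead(200));
        System.out.println("row size with 4-byte prefixes:      " + current);
        System.out.println("row size under the proposal:        " + proposed);
    }
}
```

The 150/50 split is made up for the example. The point is only that the saving grows with the share of empty and short fields, which is the case described in the quoted message.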