From: Vladimir Ozerov
Date: Tue, 25 Jul 2017 17:36:56 +0000
Subject: Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)
To: dev@ignite.apache.org
Reply-To: dev@ignite.apache.org
References: <1500988415143-20024.post@n4.nabble.com>

Vyacheslav,

When we finish the varlen optimization for string lengths, I am afraid we
could end up with a very messy protocol, should we mix encoded length and
encoding.

Dima,

Encoding must be set on a per-field basis. This will give us the most
flexible solution at the cost of a 1-byte overhead.

On Tue, 25 Jul 2017 at 20:23, Dmitriy Setrakyan wrote:

> I don't understand why this encoding is done on per-object and not on
> per-cache level. Shouldn't the column-to-encoding mapping be defined at
> cache level configuration?
>
> On Tue, Jul 25, 2017 at 12:13 PM, Vladimir Ozerov wrote:
>
> > Andrey,
> >
> > You cannot have an optional part in the middle, as it will break
> > compatibility in a dangerous way, probably leading to a node crash.
> > Also, having an INT (4 bytes) looks like too much to me.
> >
> > Instead, I would add a new type, "encoded string":
> > 1 byte - type
> > 1 byte - encoding code, map frequently used encodings to some byte value;
> > also have a special value, meaning that the encoding will be written as a
> > string afterwards; this way we will support any encoding out of the box
> > [optional] encoding name
> > 4 bytes - string length
> > Finally - string bytes
> >
> > Vladimir.
> >
> > On Tue, 25 Jul 2017 at 18:24, Andrey Kuznetsov wrote:
> >
> > > I apologize for the damaged formatting. Below is my message as it
> > > should be.
> > >
> > > Hi Igniters,
> > >
> > > I'd like to discuss future changes related to
> > > https://issues.apache.org/jira/browse/IGNITE-5655.
> > >
> > > Is it really a good idea to introduce a new flag (ENCODED_STRING) for
> > > the existing String datatype? It's possible to use the existing STRING
> > > flag at negligible performance cost.
> > >
> > > Currently, a UTF-8-encoded string looks like
> > >
> > > byteFlag nonNegativeIntStrLen bytes
> > >
> > > This format can be extended in a backward-compatible way to
> > >
> > > byteFlag [negativeIntCharsetCode] nonNegativeIntStrLen bytes
> > >
> > > Next, I suggest adding a new BinaryConfiguration property for the
> > > encoding to use, instead of using a global property. It seems to be
> > > more convenient for the user.
> > >
> > > I'll appreciate your feedback.
> > >
> > > --
> > > Best regards,
> > > Andrey Kuznetsov.
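
For readers following the thread, the "encoded string" layout proposed above (type byte, encoding code byte, optional encoding name, 4-byte string length, string bytes) could be sketched roughly as below. This is only an illustration, not Ignite code: the type code, the code-to-encoding table, the "name follows" marker value, the little-endian byte order, and the way the optional encoding name is serialized (4-byte length plus ASCII bytes) are all assumptions made here, since the thread does not fix any of them.

```python
import struct

# Every numeric value here is a hypothetical placeholder, not an Ignite constant.
TYPE_ENCODED_STRING = 0x7F   # assumed type code for the new "encoded string" type
ENC_NAMED = 0xFF             # assumed special value: encoding name follows as a string
ENC_CODES = {"utf-8": 0, "windows-1251": 1, "utf-16le": 2}  # assumed mapping
ENC_NAMES = {code: name for name, code in ENC_CODES.items()}


def write_encoded_string(text, encoding):
    """Serialize text as: type, encoding code, [encoding name], length, bytes."""
    payload = text.encode(encoding)
    out = bytearray([TYPE_ENCODED_STRING])
    code = ENC_CODES.get(encoding.lower())
    if code is None:
        # Encoding without a short code: write the marker, then the name itself.
        name = encoding.encode("ascii")
        out.append(ENC_NAMED)
        out += struct.pack("<i", len(name)) + name
    else:
        out.append(code)
    # 4-byte length of the encoded payload (in bytes), then the payload.
    out += struct.pack("<i", len(payload)) + payload
    return bytes(out)


def read_encoded_string(buf):
    """Parse the layout above and return (text, encoding name)."""
    if buf[0] != TYPE_ENCODED_STRING:
        raise ValueError("not an encoded string")
    code = buf[1]
    pos = 2
    if code == ENC_NAMED:
        (name_len,) = struct.unpack_from("<i", buf, pos)
        pos += 4
        encoding = buf[pos:pos + name_len].decode("ascii")
        pos += name_len
    else:
        encoding = ENC_NAMES[code]
    (str_len,) = struct.unpack_from("<i", buf, pos)
    pos += 4
    return buf[pos:pos + str_len].decode(encoding), encoding
```

With these assumed values, a Cyrillic string written as windows-1251 costs one byte per character plus the 1-byte encoding code, which is the per-field overhead mentioned in the reply at the top of the thread.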