Date: Wed, 22 Jul 2015 15:44:24 +0530
Subject: Re: Unstructured object format.
From: Atri Sharma
To: dev@ignite.incubator.apache.org

+1 as well. I wonder if it makes sense to further enhance the header fields to keep more metadata readily available, so that we can support more JSONB-like operators much more efficiently.

On Wed, Jul 22, 2015 at 3:37 PM, Atri Sharma wrote:
> +1 from me too
>
> On Wed, Jul 22, 2015 at 3:34 PM, Sergi Vladykin wrote:
>
>> Ok, looks good to me.
>>
>> Sergi
>>
>> 2015-07-22 10:46 GMT+03:00 Dmitriy Setrakyan:
>>
>> > On Tue, Jul 21, 2015 at 10:15 PM, Atri Sharma wrote:
>> >
>> > > So does that mean that the local hash map is not guarded by all the heavy locks that are present around the cache?
>> > >
>> >
>> > Yes Atri, you are right. The data is stored in a local hash map to avoid touching the distributed cache every time an object is serialized.
>> >
>> >
>> > > On 22 Jul 2015 07:31, "Alexey Goncharuk" wrote:
>> > >
>> > > > Metadata cache access is backed by a local hash map, so the real cost is computing the String hash code (which is cached in the String object) plus a hash map lookup by an integer key.
>> > > >
>> > > > On the other hand, the marshaller is still pluggable, and once the ticket is completed it should be fairly easy to implement this approach and compare performance.
>> > > >
>> > > > --Alexey
>> > > >
>> > > > 2015-07-21 10:01 GMT-07:00 Sergi Vladykin:
>> > > >
>> > > > > I think O(N) reasoning does not really make sense here since N is always small; let's not fool ourselves.
>> > > > > To my mind, the cost of cache access (with all the busy locks...), hashCode/equals and similar operations has a much bigger impact here.
>> > > > > Do we still have a pluggable marshaller? Can my approach be implemented separately?
>> > > > >
>> > > > > Sergi
>> > > > >
>> > > > > 2015-07-21 9:14 GMT+03:00 Alexey Goncharuk <alexey.goncharuk@gmail.com>:
>> > > > >
>> > > > > > Currently an index-enabled serialized object form has the following layout (simplified):
>> > > > > >
>> > > > > > [object fields][field1Offset,field1Length,field2Offset,field2Length,...,fieldNOffset,fieldNLength]
>> > > > > >
>> > > > > > where the field order is determined on the first serialization of the object and stored in the metadata cache, which is available on all nodes. Thus, a field lookup is performed as follows:
>> > > > > >
>> > > > > > fieldName -> fieldIndex (metadata lookup, O(1)), fieldIndex -> fieldOffset in footer (O(1)), fieldOffset -> fieldValue (O(1)).
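To make the footer-based lookup described above concrete, here is a minimal Java sketch. It is not Ignite's marshaller: the class name `FooterLayoutSketch`, the 4-byte big-endian offset/length entries, the footer sitting at the very end of the buffer, and the plain `HashMap` standing in for the node-local map behind the metadata cache are all assumptions made for illustration. Field values are limited to UTF-8 strings to keep it short, and it needs Java 9 or later (`List.of`, `Map.of`).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of an "[object fields][offset,length footer]" layout.
final class FooterLayoutSketch {
    // Stand-in for the node-local hash map behind the metadata cache:
    // field name -> field index, fixed when the type is first serialized.
    private final Map<String, Integer> fieldIndex = new HashMap<>();
    private final int fieldCount;

    FooterLayoutSketch(List<String> fieldOrder) {
        for (int i = 0; i < fieldOrder.size(); i++)
            fieldIndex.put(fieldOrder.get(i), i);
        fieldCount = fieldOrder.size();
    }

    // Writes [field1 bytes]...[fieldN bytes][off1,len1,...,offN,lenN].
    // Assumes every field is present and every value is a string.
    byte[] serialize(Map<String, String> values) {
        byte[][] encoded = new byte[fieldCount][];
        int dataLen = 0;
        for (Map.Entry<String, Integer> e : fieldIndex.entrySet()) {
            byte[] b = values.get(e.getKey()).getBytes(StandardCharsets.UTF_8);
            encoded[e.getValue()] = b;
            dataLen += b.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(dataLen + fieldCount * 8);
        int[] offsets = new int[fieldCount];
        for (int i = 0; i < fieldCount; i++) {
            offsets[i] = buf.position();
            buf.put(encoded[i]);
        }
        // Footer: one fixed-size (offset, length) pair of ints per field.
        for (int i = 0; i < fieldCount; i++)
            buf.putInt(offsets[i]).putInt(encoded[i].length);
        return buf.array();
    }

    // fieldName -> fieldIndex -> footer entry -> value bytes.
    // The work per lookup does not depend on the number of fields.
    String readField(byte[] data, String name) {
        Integer idx = fieldIndex.get(name);                 // metadata lookup
        if (idx == null)
            return null;
        ByteBuffer buf = ByteBuffer.wrap(data);
        int entry = data.length - fieldCount * 8 + idx * 8; // footer slot
        int off = buf.getInt(entry);
        int len = buf.getInt(entry + 4);
        return new String(data, off, len, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        FooterLayoutSketch type = new FooterLayoutSketch(List.of("id", "name", "city"));
        byte[] data = type.serialize(Map.of("id", "42", "name", "Ignite", "city", "London"));
        System.out.println(type.readField(data, "name")); // prints Ignite
    }
}
```

Because the footer entries have a fixed size, the field index alone is enough to locate a value; the only per-lookup hashing is the name-to-index map lookup Alexey mentions.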
>> > > > > >
>> > > > > > BTW, I am finalizing the branch with the marshaller changes and will send it for a preliminary review soon.
>> > > > > >
>> > > > > > 2015-07-16 0:55 GMT-07:00 Atri Sharma:
>> > > > > >
>> > > > > > > Keep in mind that JSONB's performance comes from the fact that it uses the server encoding, has a binary representation and can have GIN indexes on top of it. I'm not sure whether Ignite's marshalling approach keeps those features as well.
>> > > > > > >
>> > > > > > > On Thu, Jul 16, 2015 at 1:20 PM, Sergi Vladykin <sergi.vladykin@gmail.com> wrote:
>> > > > > > >
>> > > > > > > > HSTORE and JSONB turned out to have a similar format in PostgreSQL (because they were developed by the same people). They mentioned that they had switched away from sorting by key length because they sometimes need lexicographical order, but that is irrelevant for us.
>> > > > > > > >
>> > > > > > > > Sergi
>> > > > > > > >
>> > > > > > > > 2015-07-16 10:43 GMT+03:00 Atri Sharma:
>> > > > > > > >
>> > > > > > > > > Are you referring to JSONB here?
>> > > > > > > > >
>> > > > > > > > > On Thu, Jul 16, 2015 at 1:10 PM, Sergi Vladykin <sergi.vladykin@gmail.com> wrote:
>> > > > > > > > >
>> > > > > > > > > > Guys, especially Alexey G.
>> > > > > > > > > >
>> > > > > > > > > > I attended a PostgreSQL conference, and there was a talk about an unstructured data format.
>> > > > > > > > > > They had an interesting idea for a serialized layout fairly close to ours. I'm not sure how much it differs from our approach or whether we can use some ideas from it, but anyway it looks really promising to me and I want to share it.
>> > > > > > > > > >
>> > > > > > > > > > The structure is basically the following:
>> > > > > > > > > >
>> > > > > > > > > > [key headers] [keys] [values]
>> > > > > > > > > >
>> > > > > > > > > > Key headers are [key offset, key length], so they have a fixed length.
>> > > > > > > > > >
>> > > > > > > > > > The cool idea here is that the keys, and correspondingly the key headers, are sorted by (key length, key), so a lookup can first quickly pick out keys of the needed length without looking at the keys themselves, and then pick the exact key. Both searches can be done with a fast scan when there is a small number of keys and with binary search when there are more.
>> > > > > > > > > >
>> > > > > > > > > > Alexey G., could you please compare this to the new marshalling approach you are about to merge?
>> > > > > > > > > > BTW, it would be nice if you described it in detail here.
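The (key length, key) ordering Sergi describes can be illustrated with a small Java sketch of the binary-search variant. This is not PostgreSQL's actual HSTORE or JSONB storage format: the leading 4-byte key count, the big-endian int headers, the class and method names, and leaving out the values section (the lookup stops at the key's slot index) are all assumptions made for illustration. It needs Java 9 or later (`List.of`, `Arrays.compareUnsigned`).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical layout: [N][off_1,len_1 ... off_N,len_N][key bytes]; values omitted.
final class LengthSortedKeysSketch {
    // Order by key length first, then by unsigned byte content.
    static int compareKeys(byte[] a, byte[] b) {
        if (a.length != b.length)
            return Integer.compare(a.length, b.length);
        return Arrays.compareUnsigned(a, b);
    }

    static byte[] build(List<String> keys) {
        List<byte[]> sorted = new ArrayList<>();
        for (String k : keys)
            sorted.add(k.getBytes(StandardCharsets.UTF_8));
        sorted.sort(LengthSortedKeysSketch::compareKeys);

        int n = sorted.size();
        int keysStart = 4 + n * 8; // key count + fixed-size headers
        int total = keysStart;
        for (byte[] k : sorted)
            total += k.length;

        ByteBuffer buf = ByteBuffer.allocate(total);
        buf.putInt(n);
        int off = keysStart;
        for (byte[] k : sorted) {
            buf.putInt(off).putInt(k.length);
            off += k.length;
        }
        for (byte[] k : sorted)
            buf.put(k);
        return buf.array();
    }

    // Binary search over the headers; returns the key's slot index or -1.
    static int find(byte[] data, String key) {
        byte[] target = key.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.wrap(data);
        int lo = 0, hi = buf.getInt(0) - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int header = 4 + mid * 8;
            int len = buf.getInt(header + 4);
            // The length comes from the fixed-size header, so key bytes
            // are read only when the candidate has the right length.
            int cmp = Integer.compare(len, target.length);
            if (cmp == 0) {
                int off = buf.getInt(header);
                cmp = Arrays.compareUnsigned(data, off, off + len, target, 0, target.length);
            }
            if (cmp < 0)
                lo = mid + 1;
            else if (cmp > 0)
                hi = mid - 1;
            else
                return mid;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = build(List.of("name", "id", "city", "zip", "country"));
        // Sorted order: id, zip, city, name, country
        System.out.println(find(data, "name")); // prints 3
        System.out.println(find(data, "age"));  // prints -1
    }
}
```

A single binary search with the composite comparison covers both steps Sergi mentions: candidates of the wrong length are rejected from the header alone, and key bytes are compared only within the run of equal-length keys. For a handful of keys, a linear scan over the headers with the same comparison would be the simpler option, as he notes.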
>> > > > > > > > > >
>> > > > > > > > > > Sergi

-- 
Regards,

Atri
*l'apprenant*