Mailing-List: contact dev-help@ignite.incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@ignite.incubator.apache.org
MIME-Version: 1.0
In-Reply-To: 
 <CAOeZVidNoYO0gfZ+0OEvUXQnfWwUE+nzE51o16nuQSQODLd=dw@mail.gmail.com>
References: 
 <CA+eZwrFsa9mSNV3YYCsX8CP4MdQ7yAjrjYQitE=MCt7jLVstSg@mail.gmail.com>
	<CAOeZVieWc-SMJ4d1ET7c9Wd3kwnPvz1AH6rJCpNU+mfvykeyaQ@mail.gmail.com>
	<CA+eZwrFM7a3eKan9scrKS8mETv0ZHFY_=4YVtY8F2xFRPWAH6A@mail.gmail.com>
	<CAOeZVid8rX7RnX-hsFU7E2PqsqFzGMfNjKe-vYTqQik2qhb80A@mail.gmail.com>
	<CABDss3hMh5bva5n4W_CuWK-N45uebnJM_KA8fnzEU=gXkEvU_g@mail.gmail.com>
	<CA+eZwrHwMNVR_WqF=pWJ+YHOhgaN5LrPGBcKR=5vp1MBDqc8Pw@mail.gmail.com>
	<CABDss3j1PWFXiQNUkiST8hsswX_aus9cK1D_eD+ARVSBcqktiA@mail.gmail.com>
	<CAOeZVicox7OzsWw0tNDvG=9aSJgs_Nuh+RRCJMN4R8edQBN_9A@mail.gmail.com>
	<CA+0=VoV3fs_ei3b1EgmbBvm67bAxPSJRDPPp1ZR=qYtRra3BgQ@mail.gmail.com>
	<CA+eZwrFMdEN53jyR57vrUtiMvhT3ELp7PSL1A4ogxgoDB+OPMA@mail.gmail.com>
	<CAOeZVidNoYO0gfZ+0OEvUXQnfWwUE+nzE51o16nuQSQODLd=dw@mail.gmail.com>
Date: Wed, 22 Jul 2015 15:44:24 +0530
Message-ID: 
 <CAOeZVidk4tGBvOfk6tXkykhEKO8TVe-UOH+qBHLfvdw+0CgKYQ@mail.gmail.com>
Subject: Re: Unstructured object format.
From: Atri Sharma <atri.jiit@gmail.com>
To: dev@ignite.incubator.apache.org
Content-Type: multipart/alternative; boundary=f46d0442887e026d64051b74093b

--f46d0442887e026d64051b74093b
Content-Type: text/plain; charset=UTF-8

+1 as well.

I wonder if it makes sense to further enhance the header fields so that
more metadata is readily available (that way we could support more
JSONB-like operators much more efficiently).
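
To make the idea concrete, here is a rough sketch in Python of the
[key headers] [keys] [values] layout discussed further down the thread, with
headers sorted by (key length, key). It is only an illustration, not Ignite
code: the function names (encode, find, has_key, get), the leading count
field, and the value offset/length stored in each header are my assumptions.
The thread only mentions [key offset, key length] per header.

```python
import struct

# Hypothetical fixed-size header: key offset, key length, value offset, value length.
# The value fields are an assumption added so a header alone locates the value.
HEADER = struct.Struct("<IIII")
COUNT = struct.Struct("<I")


def encode(fields):
    """Serialize {str: bytes} as [count][key headers][keys][values]."""
    # Sort by (key length, key) so a lookup can narrow by length first.
    items = sorted(((k.encode("utf-8"), v) for k, v in fields.items()),
                   key=lambda kv: (len(kv[0]), kv[0]))
    keys = b"".join(k for k, _ in items)
    values = b"".join(v for _, v in items)
    k_off = COUNT.size + HEADER.size * len(items)
    v_off = k_off + len(keys)
    out = bytearray(COUNT.pack(len(items)))
    for k, v in items:
        out += HEADER.pack(k_off, len(k), v_off, len(v))
        k_off += len(k)
        v_off += len(v)
    return bytes(out + keys + values)


def _header(buf, i):
    return HEADER.unpack_from(buf, COUNT.size + i * HEADER.size)


def find(buf, name):
    """Binary-search the headers; return the header index of name, or -1."""
    key = name.encode("utf-8")
    (count,) = COUNT.unpack_from(buf, 0)
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        k_off, k_len, _, _ = _header(buf, mid)
        if k_len != len(key):
            # Lengths differ: decide from the header alone, no key bytes read.
            less = k_len < len(key)
        else:
            less = buf[k_off:k_off + k_len] < key
        if less:
            lo = mid + 1
        else:
            hi = mid
    if lo < count:
        k_off, k_len, _, _ = _header(buf, lo)
        if k_len == len(key) and buf[k_off:k_off + k_len] == key:
            return lo
    return -1


def has_key(buf, name):
    """Key-existence check in the spirit of JSONB's `?`; values are never read."""
    return find(buf, name) >= 0


def get(buf, name):
    """Return the raw value bytes for name, or None if the key is absent."""
    i = find(buf, name)
    if i < 0:
        return None
    _, _, v_off, v_len = _header(buf, i)
    return buf[v_off:v_off + v_len]
```

For example, after buf = encode({"name": b"Ann", "id": b"\x07", "city": b"Oslo"}),
get(buf, "city") returns the bytes of "Oslo" and has_key(buf, "zip") is False.
The point is that an existence check needs only the headers and keys, which is
the kind of operator extra header metadata could make cheap.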

On Wed, Jul 22, 2015 at 3:37 PM, Atri Sharma <atri.jiit@gmail.com> wrote:

> +1 from me too
>
> On Wed, Jul 22, 2015 at 3:34 PM, Sergi Vladykin <sergi.vladykin@gmail.com>
> wrote:
>
>> Ok, looks good to me.
>>
>> Sergi
>>
>> 2015-07-22 10:46 GMT+03:00 Dmitriy Setrakyan <dsetrakyan@apache.org>:
>>
>> > On Tue, Jul 21, 2015 at 10:15 PM, Atri Sharma <atri.jiit@gmail.com> wrote:
>> >
>> > > So does that mean that local hashmap is not controlled with all the heavy
>> > > locks that are present around the cache?
>> > >
>> >
>> > Yes, Atri, you are right. The data is stored in a local hash map to avoid
>> > touching the distributed cache whenever objects are serialized.
>> >
>> >
>> > > On 22 Jul 2015 07:31, "Alexey Goncharuk" <alexey.goncharuk@gmail.com>
>> > > wrote:
>> > >
>> > > > Metadata cache access is backed by a local hash map, so the real cost is
>> > > > a String object hashcode, which is cached in the String object, and a
>> > > > hash map lookup by an integer key.
>> > > >
>> > > > On the other hand, the marshaller is still pluggable, and after the ticket
>> > > > is completed it should be fairly easy to implement this approach and
>> > > > compare performance.
>> > > >
>> > > > --Alexey
>> > > >
>> > > > 2015-07-21 10:01 GMT-07:00 Sergi Vladykin <sergi.vladykin@gmail.com>:
>> > > >
>> > > > > I think O(N) reasoning does not make real sense here, since N is always
>> > > > > small; let's not fool ourselves.
>> > > > > To my mind, the operation cost of cache access (with all the busy
>> > > > > locks...), hashCode/equals and the like has a much bigger impact here.
>> > > > > Do we still have a pluggable marshaller? Can my approach be implemented
>> > > > > separately?
>> > > > >
>> > > > > Sergi
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > 2015-07-21 9:14 GMT+03:00 Alexey Goncharuk <alexey.goncharuk@gmail.com>:
>> > > > >
>> > > > > > Currently, the index-enabled serialized object form has the following
>> > > > > > layout (simplified):
>> > > > > >
>> > > > > > [object fields][field1Offset,field1Length,
>> > > > > > field2Offset,field2Length,...,fieldNOffset,fieldNLength]
>> > > > > >
>> > > > > > where the field order is determined upon the first object serialization
>> > > > > > and stored in the metadata cache, which is available on all nodes. Thus,
>> > > > > > the field lookup is performed as follows:
>> > > > > >
>> > > > > > fieldName -> fieldIndex (metadata lookup, O(1)),
>> > > > > > fieldIndex -> fieldOffset in footer (O(1)),
>> > > > > > fieldOffset -> fieldValue (O(1)).
>> > > > > >
>> > > > > > BTW, I am finalizing the branch with marshaller changes and will send
>> > > > > > this for a preliminary review soon.
>> > > > > >
>> > > > > > 2015-07-16 0:55 GMT-07:00 Atri Sharma <atri.jiit@gmail.com>:
>> > > > > >
>> > > > > > > Keep in mind that JSONB's performance comes from the fact that it
>> > > > > > > uses server encoding, is stored in a binary representation, and can
>> > > > > > > have GIN indexes on top of it. Not sure if Ignite's marshalling
>> > > > > > > approach keeps those features as well.
>> > > > > > >
>> > > > > > > On Thu, Jul 16, 2015 at 1:20 PM, Sergi Vladykin <sergi.vladykin@gmail.com>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > HSTORE and JSONB turned out to have a similar format in PostgreSQL
>> > > > > > > > (because they were developed by the same people). They noted that
>> > > > > > > > they switched away from key-length sorting because they sometimes
>> > > > > > > > need lexicographical order, but this is irrelevant for us.
>> > > > > > > >
>> > > > > > > > Sergi
>> > > > > > > >
>> > > > > > > > 2015-07-16 10:43 GMT+03:00 Atri Sharma <atri.jiit@gmail.com>:
>> > > > > > > >
>> > > > > > > > > Are you referring to JSONB here?
>> > > > > > > > >
>> > > > > > > > > On Thu, Jul 16, 2015 at 1:10 PM, Sergi Vladykin <sergi.vladykin@gmail.com>
>> > > > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > Guys, especially Alexey G.
>> > > > > > > > > >
>> > > > > > > > > > I attended a PostgreSQL conference and there was a talk about an
>> > > > > > > > > > unstructured data format.
>> > > > > > > > > > They had an interesting idea for a serialized layout that is close
>> > > > > > > > > > to ours. I'm not sure how much it differs from our approach or
>> > > > > > > > > > whether we can use some ideas from it, but anyway it looks really
>> > > > > > > > > > promising to me and I want to share it.
>> > > > > > > > > >
>> > > > > > > > > > The structure basically is the following:
>> > > > > > > > > >
>> > > > > > > > > > [key headers] [keys] [values]
>> > > > > > > > > >
>> > > > > > > > > > Key headers are [key offset, key length], so they are of a fixed
>> > > > > > > > > > length.
>> > > > > > > > > >
>> > > > > > > > > > The cool idea here is that the keys and, correspondingly, the key
>> > > > > > > > > > headers are sorted by (key length, key), so a lookup can first
>> > > > > > > > > > quickly pick the keys of the needed length without looking at the
>> > > > > > > > > > keys at all and then pick the exact key. Both searches can be done
>> > > > > > > > > > with a fast scan if there is a small number of keys and with binary
>> > > > > > > > > > search for a larger number of keys.
>> > > > > > > > > >
>> > > > > > > > > > Alexey G., could you please compare this to the new marshalling
>> > > > > > > > > > approach you are about to merge?
>> > > > > > > > > > BTW, it would be nice if you described it in detail here.
>> > > > > > > > > >
>> > > > > > > > > > Sergi
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > --
>> > > > > > > > > Regards,
>> > > > > > > > >
>> > > > > > > > > Atri
>> > > > > > > > > *l'apprenant*
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > --
>> > > > > > > Regards,
>> > > > > > >
>> > > > > > > Atri
>> > > > > > > *l'apprenant*
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
>
> --
> Regards,
>
> Atri
> *l'apprenant*
>


-- 
Regards,

Atri
*l'apprenant*

--f46d0442887e026d64051b74093b--