ignite-user mailing list archives

From Raymond Wilson <raymond_wil...@trimble.com>
Subject RE: Accessing array elements within items cached in Ignite without deserialising the entire item
Date Thu, 31 Aug 2017 01:35:46 GMT
I agree on correctness always being the first priority :)



Is there documentation on when that copy is made? For instance, is a copy
made for every invocation of a BinarySerializer, or are there some
additional caching semantics that mean the copy is made once and stays
around for a while so subsequent invocations don’t have additional overhead
recopying the cached item from the unmanaged context?



The context here would be a cache item with potentially significant
internal structure where you might want to randomly access items within
that structure without deserialising the entire cached item.



*From:* Pavel Tupitsyn [mailto:ptupitsyn@apache.org]
*Sent:* Friday, August 4, 2017 10:10 PM
*To:* user@ignite.apache.org
*Subject:* Re: Accessing array elements within items cached in Ignite
without deserialising the entire item



>  git refused to clone the repo in GitExtensions

"git clone https://github.com/apache/ignite.git" in console should work



> Is the ‘data’ pointer actually a pointer to the unmanaged memory in the
> off heap cache containing this element

It is just a pointer; it can point to either managed or unmanaged memory.

Yes, in some cases we have a pointer to unmanaged memory that comes from
Java side, but it is always a copy of the actual cache data.

Otherwise it would be quite difficult to maintain atomicity and the like.



Generally, I don't think we should introduce pointers and other unsafe
stuff in the public API.

Performance is important, but correctness is always a priority.



On Fri, Aug 4, 2017 at 5:25 AM, Raymond Wilson <raymond_wilson@trimble.com>
wrote:

I had not seen that page yet – very useful.



There are a few moving parts to getting it working, so I’m not sure I will
get time to really dig into it, but I will have a look for sure.



I did pull a static copy of the source (after git refused to clone the repo
in GitExtensions) and started looking at the code. It does seem relatively
simple to add appropriate methods to the appropriate interface and
implementation classes.



Question: When I see a method like this in BinaryStreamBase.cs:



        /// <summary>
        /// Read byte array.
        /// </summary>
        /// <param name="cnt">Count.</param>
        /// <returns>
        /// Byte array.
        /// </returns>
        public abstract byte[] ReadByteArray(int cnt);

        protected static byte[] ReadByteArray0(int len, byte* data)
        {
            byte[] res = new byte[len];

            fixed (byte* res0 = res)
            {
                CopyMemory(data, res0, len);
            }

            return res;
        }



Is the ‘data’ pointer actually a pointer to the unmanaged memory in the off
heap cache containing this element? If so, would this permit ‘user defined’
operations to be performed, something like this [or does Ignite.Net Linq
already support this]?



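        /// <summary>
        /// Perform an action on a range of byte array elements with a delegate.
        /// </summary>
        /// <param name="index">Index of the first element.</param>
        /// <param name="len">Number of elements.</param>
        /// <param name="action">Action invoked for each element.</param>
        /// <param name="data">Pointer to the start of the array data.</param>
        protected static void PerformByteArrayOperation(int index, int len,
            Action<byte> action, byte* data)
        {
            // 'data' is already a raw pointer, so no 'fixed' block is needed.
            byte* cur = data + index;

            for (int i = 0; i < len; i++)
            {
                // Dereference the pointer: the delegate takes a byte, not a byte*.
                action(*cur++);
            }
        }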



There’s probably a nice way to genericize this across multiple array types,
but it’s useful as an example.



In this way you can operate on the data without the need to move it around
all the time between unmanaged and managed contexts.



Thanks,

Raymond.



*From:* Pavel Tupitsyn [mailto:ptupitsyn@apache.org]
*Sent:* Thursday, August 3, 2017 7:21 PM


*To:* user@ignite.apache.org
*Subject:* Re: Accessing array elements within items cached in Ignite
without deserialising the entire item



Great!



Here's .NET development page, in case you haven't seen it yet:
https://cwiki.apache.org/confluence/display/IGNITE/Ignite.NET+Development

Let me know if you need any assistance.



Pavel



On Thu, Aug 3, 2017 at 5:28 AM, Raymond Wilson <raymond_wilson@trimble.com>
wrote:

Hi Pavel,



Thanks for putting it on the plan.



I’ve been reading through the ‘how to contribute’ documentation to see
what’s required and have pulled a static download of the Git repository to
start looking at the code. I’ll see… :)



Thanks,

Raymond.



*From:* Pavel Tupitsyn [mailto:ptupitsyn@apache.org]
*Sent:* Wednesday, August 2, 2017 9:08 PM


*To:* user@ignite.apache.org
*Subject:* Re: Accessing array elements within items cached in Ignite
without deserialising the entire item



Actually, you are right, we can add this easily, because internal API
allows random stream access.

I've filed a ticket: https://issues.apache.org/jira/browse/IGNITE-5904



Thank you for a good suggestion!

And, by the way, everyone is welcome to contribute, this ticket can be a
perfect start!



Pavel



On Wed, Aug 2, 2017 at 12:46 AM, Raymond Wilson <raymond_wilson@trimble.com>
wrote:

Hi Pavel,



Thanks for the clarifications. I certainly appreciate that cross platform
protocols constrain what can be done…



Thanks for pointing out IBinaryRawReader.



Regarding random access into arrays, is this something that is on the books
for a future version?

Thanks,

Raymond.



*From:* Pavel Tupitsyn [mailto:ptupitsyn@apache.org]
*Sent:* Tuesday, August 1, 2017 11:31 PM
*To:* user@ignite.apache.org
*Subject:* Re: Accessing array elements within items cached in Ignite
without deserialising the entire item



Hi Raymond,



First of all, BinaryObject is a cross-platform concept, it exists in C#,
C++, Java.

From a C# point of view there are some inconsistencies (like nullable Guid
or non-generic collections),

but these things are dictated by the existing protocol, so we can't change
them.

In most cases you can just use WriteObject<>/ReadObject<> methods to avoid
these inconsistencies.



1. You can implement array pooling yourself using IBinaryRawReader methods.

   For example, a byte array can be written like this:
   rawWriter.WriteInt(arr.Length); for (...) rawWriter.WriteByte(arr[i]);

   I think an extension method would be easy to write.



2. See above, use WriteObject<>/ReadObject<> to avoid dealing with nullables



3. Random array access is not possible with current API.
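
A minimal sketch of the extension-method idea from point 1, assuming a byte array is stored as an int length followed by its elements and read back into a buffer rented from System.Buffers.ArrayPool. The interfaces IRawWriter and IRawReader, the class InMemoryRawStream and the method names WritePooledBytes and ReadPooledBytes are hypothetical stand-ins so that the sketch compiles without the Ignite package. With Ignite itself, the same extension methods would presumably target IBinaryRawWriter and IBinaryRawReader, which expose WriteInt, WriteByte, ReadInt and ReadByte.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;

// Hypothetical stand-ins for the subset of IBinaryRawWriter / IBinaryRawReader
// used below, so the sketch compiles without the Apache.Ignite package.
public interface IRawWriter
{
    void WriteInt(int val);
    void WriteByte(byte val);
}

public interface IRawReader
{
    int ReadInt();
    byte ReadByte();
}

public static class PooledByteArrayExtensions
{
    // Writes the first 'count' bytes of 'arr': a length prefix, then the elements.
    public static void WritePooledBytes(this IRawWriter writer, byte[] arr, int count)
    {
        writer.WriteInt(count);
        for (int i = 0; i < count; i++)
            writer.WriteByte(arr[i]);
    }

    // Reads a length-prefixed byte sequence into a buffer rented from the shared pool.
    // The rented buffer may be longer than 'count'; only the first 'count' bytes are valid.
    public static byte[] ReadPooledBytes(this IRawReader reader, out int count)
    {
        count = reader.ReadInt();
        byte[] buffer = ArrayPool<byte>.Shared.Rent(count);
        for (int i = 0; i < count; i++)
            buffer[i] = reader.ReadByte();
        return buffer;
    }
}

// Minimal in-memory implementation, only for trying the extension methods out.
public sealed class InMemoryRawStream : IRawWriter, IRawReader
{
    private readonly Queue<byte> _bytes = new Queue<byte>();

    public void WriteByte(byte val) { _bytes.Enqueue(val); }
    public byte ReadByte() { return _bytes.Dequeue(); }

    public void WriteInt(int val)
    {
        foreach (byte b in BitConverter.GetBytes(val))
            WriteByte(b);
    }

    public int ReadInt()
    {
        var tmp = new byte[4];
        for (int i = 0; i < tmp.Length; i++)
            tmp[i] = ReadByte();
        return BitConverter.ToInt32(tmp, 0);
    }
}
```

The caller owns the rented buffer and should hand it back with ArrayPool<byte>.Shared.Return(buffer) once the data has been consumed. Otherwise the pool has to allocate again and the benefit is lost.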



Thanks,

Pavel



On Tue, Aug 1, 2017 at 2:46 AM, Raymond Wilson <raymond_wilson@trimble.com>
wrote:

Hi,



I’ve been looking at IBinarizable and IBinarySerializer with regards to
controlling object serialization (using the Ignite.Net client).



A couple of questions:



1.       Some of the APIs in IBinarizable allow for factory methods to
control construction of collection and dictionary elements, but not for
array elements (which could allow for performance optimization through
array pooling).

2.       GUID and DateTime elements are nullable (and there is no
non-nullable variant for these types). Apart from being inconsistent with
all the other types supported in the API, nullability in .Net carries a
performance penalty. Curious as to why these types are defined like this?

3.       I see it is possible to read arrays of elements. But I see no way
to read a particular element within an array without deserialising the
entire array. Is it possible to do something like byte ReadByte(string
fieldname, uint index); ?



Thanks,

Raymond.
