ignite-dev mailing list archives

From Вадим Опольский <vaopols...@gmail.com>
Subject Re: IGNITE-13
Date Fri, 03 Mar 2017 08:45:07 GMT
Valentin,

What do you think about the duplicated loop in strToBinaryOutputStream?

How can we calculate the string length for outBinaryHeap without this loop?

public class BinaryUtilsNew extends BinaryUtils {

    public static int getStrLen(String val) {
        int strLen = val.length();
        int utfLen = 0;
        int c;

        // Determine length of resulting byte array.
        for (int cnt = 0; cnt < strLen; cnt++) {
            c = val.charAt(cnt);

            if (c >= 0x0001 && c <= 0x007F)
                utfLen++;
            else if (c > 0x07FF)
                utfLen += 3;
            else
                utfLen += 2;
        }

        return utfLen;
    }

    public static void strToUtf8BytesDirect(BinaryOutputStream outBinaryHeap, String val) {
        int strLen = val.length();
        int c, cnt;

        int position = 0;

        outBinaryHeap.unsafeEnsure(1 + 4);

        // Type tag, then the encoded length (this is the first pass over the string).
        outBinaryHeap.unsafeWriteByte(GridBinaryMarshaller.STRING);
        outBinaryHeap.unsafeWriteInt(getStrLen(val));

        // Second pass: same loop and same conditions as in getStrLen.
        for (cnt = 0; cnt < strLen; cnt++) {
            c = val.charAt(cnt);

            if (c >= 0x0001 && c <= 0x007F)
                outBinaryHeap.writeByte((byte) c);
            else if (c > 0x07FF) {
                outBinaryHeap.writeByte((byte)(0xE0 | ((c >> 12) & 0x0F)));
                outBinaryHeap.writeByte((byte)(0x80 | ((c >> 6) & 0x3F)));
                outBinaryHeap.writeByte((byte)(0x80 | (c & 0x3F)));
            }
            else {
                outBinaryHeap.writeByte((byte)(0xC0 | ((c >> 6) & 0x1F)));
                outBinaryHeap.writeByte((byte)(0x80 | (c & 0x3F)));
            }
        }
    }
}
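
One possible way to drop the separate getStrLen pass, sketched here with a plain java.nio.ByteBuffer rather than Ignite's BinaryOutputStream: reserve four bytes for the length, encode the characters, then go back and fill in the number of bytes actually written. The class name SinglePassUtf8, the STRING_TAG value and the worst-case buffer sizing are assumptions made for illustration only. Whether BinaryOutputStream can write at an earlier position cheaply would still need to be checked.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class SinglePassUtf8 {
    /** Hypothetical type tag, standing in for GridBinaryMarshaller.STRING. */
    static final byte STRING_TAG = 9;

    /** Writes tag, length and encoded bytes with a single loop over the string. */
    static byte[] writeString(String val) {
        int strLen = val.length();

        // Worst case: 1 tag byte + 4 length bytes + 3 bytes per char.
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + 3 * strLen);

        buf.put(STRING_TAG);

        int lenPos = buf.position();
        buf.putInt(0); // Placeholder, patched after encoding.

        int start = buf.position();

        // Same branch conditions as in strToUtf8BytesDirect.
        for (int cnt = 0; cnt < strLen; cnt++) {
            int c = val.charAt(cnt);

            if (c >= 0x0001 && c <= 0x007F)
                buf.put((byte) c);
            else if (c > 0x07FF) {
                buf.put((byte)(0xE0 | ((c >> 12) & 0x0F)));
                buf.put((byte)(0x80 | ((c >> 6) & 0x3F)));
                buf.put((byte)(0x80 | (c & 0x3F)));
            }
            else {
                buf.put((byte)(0xC0 | ((c >> 6) & 0x1F)));
                buf.put((byte)(0x80 | (c & 0x3F)));
            }
        }

        // Absolute put: fills in the real byte count without moving the position.
        buf.putInt(lenPos, buf.position() - start);

        return Arrays.copyOf(buf.array(), buf.position());
    }
}
```

The trade-off is that the length is only known after the loop, so the stream must support writing at an absolute offset.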


Vadim


2017-03-03 2:00 GMT+03:00 Valentin Kulichenko <valentin.kulichenko@gmail.com>:

> Vadim,
>
> Looks better now. Can you also try to modify the benchmark so that
> marshaller and writer are created outside of the measured method? I.e. the
> benchmark methods should be as simple as this:
>
>     @Benchmark
>     public void binaryHeapOutputStreamDirect() throws Exception {
>         writer.doWriteStringDirect(message);
>     }
>
>     @Benchmark
>     public void binaryHeapOutputStreamInDirect() throws Exception {
>         writer.doWriteString(message);
>     }
>
> In any case, do I understand correctly that it didn't actually make any
> performance difference? If so, I think we can close the ticket.
>
> Vova, can you also take a look and provide your thoughts?
>
> -Val
>
> On Thu, Mar 2, 2017 at 1:27 PM, Вадим Опольский <vaopolskij@gmail.com>
> wrote:
>
>> Hi Valentin!
>>
>> I've created:
>>
>> new method strToUtf8BytesDirect in BinaryUtilsNew
>> https://github.com/javaller/MyBenchmark/blob/master/src/main/java/org/sample/BinaryUtilsNew.java
>>
>> new method doWriteStringDirect in BinaryWriterExImplNew
>> https://github.com/javaller/MyBenchmark/blob/master/src/main/java/org/sample/BinaryWriterExImplNew.java
>>
>> benchmarks for BinaryWriterExImpl doWriteString and BinaryWriterExImplNew
>> doWriteStringDirect
>> https://github.com/javaller/MyBenchmark/blob/master/src/main/java/org/sample/ExampleTest.java
>>
>> This is a result of comparing:
>>
>> Benchmark                                   Mode  Cnt        Score        Error  Units
>> ExampleTest.binaryHeapOutputStreamDirect    avgt   50  1128448,743 ± 13536,689  ns/op
>> ExampleTest.binaryHeapOutputStreamInDirect  avgt   50  1127270,695 ± 17309,256  ns/op
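
A quick way to read these two rows is to treat each Score ± Error as an interval and check whether the intervals overlap. The sketch below does that arithmetic for the numbers above, with decimal commas rewritten as points. The class and method names are made up for illustration, and an overlap only suggests that this benchmark does not separate the two variants; it is not a formal significance test.

```java
final class ScoreIntervals {
    /** True if [a - aErr, a + aErr] and [b - bErr, b + bErr] share at least one point. */
    static boolean overlap(double a, double aErr, double b, double bErr) {
        return a - aErr <= b + bErr && b - bErr <= a + aErr;
    }

    public static void main(String[] args) {
        // ExampleTest rows from the table above, in ns/op.
        // [1114912.054, 1141985.432] vs [1109961.439, 1144579.951]
        System.out.println(overlap(1128448.743, 13536.689, 1127270.695, 17309.256)); // prints "true"
    }
}
```

The two scores differ by about 1178 ns/op, which is well inside either error margin.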
>>
>> Vadim
>>
>> 2017-03-02 1:02 GMT+03:00 Valentin Kulichenko <
>> valentin.kulichenko@gmail.com>:
>>
>>> Hi Vadim,
>>>
>>> We're getting closer :) I would actually like to see the test for actual
>>> implementation of BinaryWriterExImpl#doWriteString method. Logic in
>>> binaryHeapOutputInDirect() confuses me a bit and I'm not sure comparison is
>>> valid.
>>>
>>> Can you please do the following:
>>>
>>> 1. Create new BinaryUtils#strToUtf8BytesDirect method, copy-paste the
>>> code from existing BinaryUtils#strToUtf8Bytes and modify it so that it
>>> takes BinaryOutputStream as an argument and writes to it directly. Do not
>>> create stream inside this method, as it's the same as creating new array.
>>> 2. Create new BinaryWriterExImpl#doWriteStringDirect, copy-paste the
>>> code from existing BinaryWriterExImpl#doWriteString and modify it so
>>> that it uses BinaryUtils#strToUtf8BytesDirect and doesn't
>>> call out.writeByteArray.
>>> 3. Create benchmark for BinaryWriterExImpl#doWriteString method. I.e.,
>>> create an instance of BinaryWriterExImpl and call doWriteString() in
>>> benchmark method.
>>> 4. Similarly, create benchmark for
>>> BinaryWriterExImpl#doWriteStringDirect.
>>> 5. Compare results.
>>>
>>> This will give us clear picture of how these two approaches perform.
>>> Your current results are actually promising, but I would like to confirm
>>> them.
>>>
>>> -Val
>>>
>>> On Wed, Mar 1, 2017 at 6:17 AM, Вадим Опольский <vaopolskij@gmail.com>
>>> wrote:
>>>
>>>> Hi Valentin!
>>>>
>>>> Thank you for comments.
>>>>
>>>> There is a new method which writes directly to BinaryOutputStream
>>>> instead of intermediate array.
>>>> https://github.com/javaller/MyBenchmark/blob/master/src/main/java/org/sample/BinaryUtilsNew.java
>>>>
>>>> There is benchmark.
>>>> https://github.com/javaller/MyBenchmark/blob/master/src/main/java/org/sample/MyBenchmark.java
>>>>
>>>> Unit test
>>>> https://github.com/javaller/MyBenchmark/blob/master/src/main/java/org/sample/BinaryOutputStreamTest.java
>>>>
>>>> Statistics
>>>> https://github.com/javaller/MyBenchmark/blob/master/out_01_03_17.txt
>>>>
>>>> Benchmark                                 Mode  Cnt    Score   Error  Units
>>>> MyBenchmark.binaryHeapOutputInDirect      avgt   50  111,337 ± 0,742  ns/op
>>>> MyBenchmark.binaryHeapOutputStreamDirect  avgt   50   23,847 ± 0,303  ns/op
>>>>
>>>>
>>>> Vadim
>>>>
>>>>
>>>> 2017-02-28 4:29 GMT+03:00 Valentin Kulichenko <
>>>> valentin.kulichenko@gmail.com>:
>>>>
>>>>> Hi Vadim,
>>>>>
>>>>> Looks like you accidentally removed dev list from the thread, adding
>>>>> it back.
>>>>>
>>>>> I think there is still misunderstanding. What I propose is to modify
>>>>> the BinaryUtils#strToUtf8Bytes so that it writes directly to BinaryOutputStream
>>>>> instead of intermediate array. This should decrease memory consumption and
>>>>> can also increase performance as we will avoid 'writeByteArray' step
>>>>> at the end.
>>>>>
>>>>> Does it make sense to you?
>>>>>
>>>>> -Val
>>>>>
>>>>> On Mon, Feb 27, 2017 at 6:55 AM, Вадим Опольский <vaopolskij@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi, Valentin!
>>>>>>
>>>>>> What do you think about using the methods of BinaryOutputStream:
>>>>>>
>>>>>> 1) writeByteArray(byte[] val)
>>>>>> 2) writeCharArray(char[] val)
>>>>>> 3) write (byte[] arr, int off, int len)
>>>>>>
>>>>>> String val = "Test";
>>>>>> out.writeByteArray(val.getBytes(StandardCharsets.UTF_8));
>>>>>>
>>>>>> String val = "Test";
>>>>>> out.writeCharArray(val.toCharArray());
>>>>>>
>>>>>> String val = "Test";
>>>>>> InputStream stream = new ByteArrayInputStream(
>>>>>>     val.getBytes(StandardCharsets.UTF_8));
>>>>>> byte[] buffer = new byte[1024];
>>>>>> int len;
>>>>>> // read(byte[]) returns the number of bytes read, or -1 at end of stream.
>>>>>> while ((len = stream.read(buffer)) != -1) {
>>>>>>     out.write(buffer, 0, len);
>>>>>> }
>>>>>>
>>>>>> What else can we use ?
>>>>>>
>>>>>> Vadim
>>>>>>
>>>>>>
>>>>>> 2017-02-25 2:21 GMT+03:00 Valentin Kulichenko <
>>>>>> valentin.kulichenko@gmail.com>:
>>>>>>
>>>>>>> Hi Vadim,
>>>>>>>
>>>>>>> Which method implements the approach described in the ticket? From
>>>>>>> what I see, all writeToStringX versions are still encoding into an
>>>>>>> intermediate array and then call out.writeByteArray. What we need to test
>>>>>>> is the approach where bytes are written directly into the stream during
>>>>>>> encoding. Encoding algorithm itself should stay the same for now, otherwise
>>>>>>> we will not know how to interpret the result.
>>>>>>>
>>>>>>> It looks like there is some misunderstanding here, so please let me
>>>>>>> know if anything is still unclear. I will be happy to answer your questions.
>>>>>>>
>>>>>>> -Val
>>>>>>>
>>>>>>> On Wed, Feb 22, 2017 at 7:22 PM, Valentin Kulichenko <
>>>>>>> valentin.kulichenko@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Vadim,
>>>>>>>>
>>>>>>>> Thanks, I will review this week.
>>>>>>>>
>>>>>>>> -Val
>>>>>>>>
>>>>>>>> On Wed, Feb 22, 2017 at 2:28 AM, Вадим Опольский <
>>>>>>>> vaopolskij@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Valentin!
>>>>>>>>>
>>>>>>>>> https://issues.apache.org/jira/browse/IGNITE-13
>>>>>>>>>
>>>>>>>>> I created BinaryWriterExImplNew (extends BinaryWriterExImpl) and
>>>>>>>>> added new methods with changes described in the ticket
>>>>>>>>>
>>>>>>>>> https://github.com/javaller/MyBenchmark/blob/master/src/main/java/org/sample/BinaryWriterExImplNew.java
>>>>>>>>>
>>>>>>>>> I created a benchmark for BinaryWriterExImplNew
>>>>>>>>>
>>>>>>>>> https://github.com/javaller/MyBenchmark/blob/master/src/main/java/org/sample/ExampleTest.java
>>>>>>>>>
>>>>>>>>> I ran the benchmark and compared the results
>>>>>>>>>
>>>>>>>>> https://github.com/javaller/MyBenchmark/blob/master/totalstat.txt
>>>>>>>>>
>>>>>>>>> # Run complete. Total time: 00:10:24
>>>>>>>>> Benchmark                                    Mode  Cnt        Score        Error  Units
>>>>>>>>> ExampleTest.binaryHeapOutputStream1          avgt   50  1114999,207 ± 16756,776  ns/op
>>>>>>>>> ExampleTest.binaryHeapOutputStream2          avgt   50  1118149,320 ± 17515,961  ns/op
>>>>>>>>> ExampleTest.binaryHeapOutputStream3          avgt   50  1113678,657 ± 17652,314  ns/op
>>>>>>>>> ExampleTest.binaryHeapOutputStream4          avgt   50  1112415,051 ± 18273,874  ns/op
>>>>>>>>> ExampleTest.binaryHeapOutputStream5          avgt   50  1111366,583 ± 18282,829  ns/op
>>>>>>>>> ExampleTest.binaryHeapOutputStreamACSII      avgt   50  1112079,667 ± 16659,532  ns/op
>>>>>>>>> ExampleTest.binaryHeapOutputStreamUTFCustom  avgt   50  1114949,759 ± 16809,669  ns/op
>>>>>>>>> ExampleTest.binaryHeapOutputStreamUTFNIO     avgt   50  1121462,325 ± 19836,466  ns/op
>>>>>>>>>
>>>>>>>>> Is it OK? What's the next step? Do I have to move this
>>>>>>>>> JMH benchmark to the Ignite project?
>>>>>>>>>
>>>>>>>>> Vadim Opolski
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2017-02-21 1:06 GMT+03:00 Valentin Kulichenko <
>>>>>>>>> valentin.kulichenko@gmail.com>:
>>>>>>>>>
>>>>>>>>>> Hi Vadim,
>>>>>>>>>>
>>>>>>>>>> I'm not sure I understand your benchmarks and how they verify the
>>>>>>>>>> optimization discussed here. Basically, here is what needs to be done:
>>>>>>>>>>
>>>>>>>>>> 1. Create a benchmark for BinaryWriterExImpl#doWriteString
>>>>>>>>>> method.
>>>>>>>>>> 2. Run the benchmark with current implementation.
>>>>>>>>>> 3. Make the change described in the ticket.
>>>>>>>>>> 4. Run the benchmark with these changes.
>>>>>>>>>> 5. Compare results.
>>>>>>>>>>
>>>>>>>>>> Makes sense? Let me know if anything is unclear.
>>>>>>>>>>
>>>>>>>>>> -Val
>>>>>>>>>>
>>>>>>>>>> On Mon, Feb 20, 2017 at 8:51 AM, Вадим Опольский <
>>>>>>>>>> vaopolskij@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello everybody!
>>>>>>>>>>>
>>>>>>>>>>> https://issues.apache.org/jira/browse/IGNITE-13
>>>>>>>>>>>
>>>>>>>>>>> Valentin, I have just finished a benchmark (with JMH) -
>>>>>>>>>>> https://github.com/javaller/MyBenchmark.git
>>>>>>>>>>>
>>>>>>>>>>> It collects data on serialization time.
>>>>>>>>>>>
>>>>>>>>>>> For instance - https://github.com/javaller/MyBenchmark/blob/master/out200217.txt
>>>>>>>>>>>
>>>>>>>>>>> To start it, do the following:
>>>>>>>>>>>
>>>>>>>>>>> 1) clone it - git clone https://github.com/javaller/MyBenchmark.git
>>>>>>>>>>>
>>>>>>>>>>> 2) install it - mvn install
>>>>>>>>>>>
>>>>>>>>>>> 3) run benchmarks - java -Xms1024m -Xmx4096m -jar target\benchmarks.jar
>>>>>>>>>>>
>>>>>>>>>>> Vadim Opolski
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2017-02-15 0:52 GMT+03:00 Valentin Kulichenko <
>>>>>>>>>>> valentin.kulichenko@gmail.com>:
>>>>>>>>>>>
>>>>>>>>>>>> Vladimir,
>>>>>>>>>>>>
>>>>>>>>>>>> I think we misunderstood each other. My understanding of this
>>>>>>>>>>>> optimization is the following.
>>>>>>>>>>>>
>>>>>>>>>>>> Currently string serialization is done in two steps (see
>>>>>>>>>>>> BinaryWriterExImpl#doWriteString):
>>>>>>>>>>>>
>>>>>>>>>>>> strArr = BinaryUtils.strToUtf8Bytes(val); // Encode string into byte array.
>>>>>>>>>>>> out.writeByteArray(strArr);               // Write byte array into stream.
>>>>>>>>>>>>
>>>>>>>>>>>> What this ticket suggests is to write directly into stream
>>>>>>>>>>>> while string is encoded, without intermediate array. This both reduces
>>>>>>>>>>>> memory consumption and eliminates array copy step.
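
To make the difference concrete, here is a minimal sketch of both variants using java.io.ByteArrayOutputStream as a stand-in for BinaryOutputStream. The class TwoStepVsDirect and its methods are hypothetical and are not the Ignite code. The encoding branches follow the loop from strToUtf8BytesDirect, and the worst-case sizing of the temporary array is an assumption of this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

final class TwoStepVsDirect {
    /** Two steps: encode into a temporary array, then copy that array into the stream. */
    static void writeTwoStep(ByteArrayOutputStream out, String val) {
        byte[] tmp = new byte[val.length() * 3]; // Worst case: 3 bytes per char.
        int pos = 0;

        for (int i = 0; i < val.length(); i++) {
            int c = val.charAt(i);

            if (c >= 0x0001 && c <= 0x007F)
                tmp[pos++] = (byte) c;
            else if (c > 0x07FF) {
                tmp[pos++] = (byte)(0xE0 | ((c >> 12) & 0x0F));
                tmp[pos++] = (byte)(0x80 | ((c >> 6) & 0x3F));
                tmp[pos++] = (byte)(0x80 | (c & 0x3F));
            }
            else {
                tmp[pos++] = (byte)(0xC0 | ((c >> 6) & 0x1F));
                tmp[pos++] = (byte)(0x80 | (c & 0x3F));
            }
        }

        byte[] strArr = Arrays.copyOf(tmp, pos); // The intermediate array.

        out.write(strArr, 0, strArr.length);     // The extra copy step.
    }

    /** One step: every encoded byte goes straight into the stream. */
    static void writeDirect(ByteArrayOutputStream out, String val) {
        for (int i = 0; i < val.length(); i++) {
            int c = val.charAt(i);

            if (c >= 0x0001 && c <= 0x007F)
                out.write(c);
            else if (c > 0x07FF) {
                out.write(0xE0 | ((c >> 12) & 0x0F));
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
            else {
                out.write(0xC0 | ((c >> 6) & 0x1F));
                out.write(0x80 | (c & 0x3F));
            }
        }
    }
}
```

Both methods should leave identical bytes in the stream, which is the property a unit test for the direct variant can check.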
>>>>>>>>>>>>
>>>>>>>>>>>> I updated the ticket and added this explanation there.
>>>>>>>>>>>>
>>>>>>>>>>>> Vadim, can you create a micro benchmark and check if it gives
>>>>>>>>>>>> any improvement?
>>>>>>>>>>>>
>>>>>>>>>>>> -Val
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Feb 12, 2017 at 10:38 PM, Vladimir Ozerov <
>>>>>>>>>>>> vozerov@gridgain.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> It is hard to say whether it makes sense or not. No doubt, it
>>>>>>>>>>>>> could speed up marshalling process at the cost of 2x memory required for
>>>>>>>>>>>>> strings. From my previous experience with marshalling micro-optimizations,
>>>>>>>>>>>>> we will hardly ever notice speedup in distributed environment.
>>>>>>>>>>>>>
>>>>>>>>>>>>> But, there is another side - it could speed up our queries,
>>>>>>>>>>>>> because we will not have to unmarshal string on every field access. So I
>>>>>>>>>>>>> would try to make this optimization optional and then measure query
>>>>>>>>>>>>> performance with classes having lots of strings. It could give us
>>>>>>>>>>>>> interesting results.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Feb 13, 2017 at 5:37 AM, Valentin Kulichenko <
>>>>>>>>>>>>> valentin.kulichenko@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Vladimir,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can you please take a look and provide your thoughts? Can
>>>>>>>>>>>>>> this be applied to binary marshaller? From what I recall, it serializes
>>>>>>>>>>>>>> string a bit differently from optimized marshaller, so I'm not sure.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Val
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Feb 10, 2017 at 5:16 PM, Dmitriy Setrakyan <
>>>>>>>>>>>>>> dsetrakyan@apache.org> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Feb 9, 2017 at 11:26 PM, Valentin Kulichenko <
>>>>>>>>>>>>>>> valentin.kulichenko@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > Hi Vadim,
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > I don't think it makes much sense to invest into OptimizedMarshaller.
>>>>>>>>>>>>>>> > However, I would check if this optimization is applicable to
>>>>>>>>>>>>>>> > BinaryMarshaller, and if yes, implement it.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Val, in this case can you please update the ticket?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > -Val
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > On Thu, Feb 9, 2017 at 11:05 PM, Вадим Опольский <vaopolskij@gmail.com>
>>>>>>>>>>>>>>> > wrote:
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > > Dear sirs!
>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>> > > I want to resolve issue IGNITE-13 -
>>>>>>>>>>>>>>> > > https://issues.apache.org/jira/browse/IGNITE-13
>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>> > > Is it still relevant?
>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>> > > Vadim Opolski
>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>> >
