ignite-dev mailing list archives

From Vyacheslav Daradur <daradu...@gmail.com>
Subject Re: Data compression in Ignite 2.0
Date Tue, 06 Jun 2017 14:29:43 GMT
In short,

During marshalling, a field is represented by a BinaryFieldAccessor, which
manages its marshalling. It checks whether the field is annotated with
@BinaryCompression; if so, the binary representation of the field (a byte
array) is compressed and marked as compressed by a type constant
(GridBinaryMarshaller.COMPRESSED). The compressed byte array is then
included in the binary representation of the whole object. Note that the
header of the marshalled object is not compressed. Compression affects
only the representation of the object's fields.
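
To make the write path concrete, here is a minimal sketch of the idea, not
the code from the PR. Everything in it is an assumption made for
illustration: the annotation is a local stand-in for @BinaryCompression,
the marker values 9 and 99 are invented, the Article class is hypothetical,
and Deflate from the JDK stands in for whichever algorithm is finally
chosen. An annotated field is written as [COMPRESSED][deflated field
bytes]; any other field is written as usual.

```java
import java.io.ByteArrayOutputStream;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

/** Hypothetical stand-in for the proposed @BinaryCompression annotation. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface BinaryCompression {
}

/** Hypothetical value class: only the large text field is annotated. */
class Article {
    String title;

    @BinaryCompression
    String body;
}

class FieldWriterSketch {
    // Invented marker values; the thread does not give the real constants.
    static final byte STRING = 9;
    static final byte COMPRESSED = 99;

    /** Deflate from the JDK stands in for the algorithm that is finally chosen. */
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished())
            out.write(buf, 0, deflater.deflate(buf));
        deflater.end();
        return out.toByteArray();
    }

    /** Prepends a one-byte type marker to a payload. */
    static byte[] withMarker(byte marker, byte[] payload) {
        byte[] res = new byte[payload.length + 1];
        res[0] = marker;
        System.arraycopy(payload, 0, res, 1, payload.length);
        return res;
    }

    /** Marshals one String field; an annotated field becomes [COMPRESSED][deflated field]. */
    static byte[] writeField(Object obj, String name) {
        try {
            Field f = obj.getClass().getDeclaredField(name);
            f.setAccessible(true);
            byte[] utf8 = ((String) f.get(obj)).getBytes(StandardCharsets.UTF_8);
            byte[] plain = withMarker(STRING, utf8);
            return f.isAnnotationPresent(BinaryCompression.class)
                ? withMarker(COMPRESSED, deflate(plain))
                : plain;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch, fields without the annotation take exactly the same path as
before, which matches the point that unannotated classes are not affected.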

Objects in IgniteCache are represented as BinaryObject, which is a wrapper
over the byte array of the marshalled object.
BinaryObject provides some useful methods that are used by Ignite
subsystems.
For example, queries use the BinaryObject#field method, which deserializes
only a single field of the object without deserializing the whole object.
If BinaryObject#field meets the compressed type constant during
deserialization, it decompresses the byte array and then continues
unmarshalling as usual.
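
The read path can be sketched in the same spirit. This is an illustration
under assumptions, not the BinaryObject#field implementation: the marker
values are invented, the field layout [type marker][payload] is simplified,
and Inflater/Deflater from the JDK stand in for the real algorithm. When the
reader meets the COMPRESSED marker, it inflates the payload and then reads
the result as an ordinary field.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class FieldReaderSketch {
    // Invented marker values; the thread does not give the real constants.
    static final byte STRING = 9;
    static final byte COMPRESSED = 99;

    /** Inflates a Deflate stream; fails fast on truncated or corrupt input. */
    static byte[] inflate(byte[] data) {
        Inflater inf = new Inflater();
        inf.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inf.finished()) {
                int n = inf.inflate(buf);
                if (n == 0 && !inf.finished() && (inf.needsInput() || inf.needsDictionary()))
                    throw new IllegalStateException("Truncated or invalid compressed field");
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inf.end();
        }
        return out.toByteArray();
    }

    /** Reads one field laid out as [type marker][payload]. */
    static String readField(byte[] field) {
        byte type = field[0];
        byte[] payload = Arrays.copyOfRange(field, 1, field.length);
        if (type == COMPRESSED) {
            // Assumption: the inflated bytes are an ordinary field, so reading continues as usual.
            return readField(inflate(payload));
        }
        if (type == STRING)
            return new String(payload, StandardCharsets.UTF_8);
        throw new IllegalArgumentException("Unknown type marker: " + type);
    }

    /** Helper for trying the reader: wraps an already marshalled field as compressed. */
    static byte[] compressField(byte[] field) {
        Deflater def = new Deflater();
        def.setInput(field);
        def.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(COMPRESSED);
        byte[] buf = new byte[4096];
        while (!def.finished())
            out.write(buf, 0, def.deflate(buf));
        def.end();
        return out.toByteArray();
    }
}
```

Only the requested field is inflated here; the rest of the object is never
touched, which is what keeps single-field reads by queries cheap.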

For now, I have introduced the Compressor interface in IgniteConfiguration;
it allows users to plug in their own compressor implementation, which is a
requirement of the task [1].
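
For discussion, a pluggable interface of this kind could look roughly like
the sketch below. The method names and the DeflateCompressor class are my
assumptions for illustration, not the interface from the PR, and Deflate
from the JDK is used only because it needs no extra dependency.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical shape of the pluggable interface; method names are assumptions. */
interface Compressor {
    /** Returns the compressed form of the given bytes. */
    byte[] compress(byte[] bytes);

    /** Returns the original bytes for data produced by compress. */
    byte[] decompress(byte[] bytes);
}

/** Example user implementation backed by the JDK Deflate codec. */
class DeflateCompressor implements Compressor {
    @Override
    public byte[] compress(byte[] bytes) {
        Deflater def = new Deflater();
        def.setInput(bytes);
        def.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!def.finished())
            out.write(buf, 0, def.deflate(buf));
        def.end();
        return out.toByteArray();
    }

    @Override
    public byte[] decompress(byte[] bytes) {
        Inflater inf = new Inflater();
        inf.setInput(bytes);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inf.finished()) {
                int n = inf.inflate(buf);
                if (n == 0 && !inf.finished() && (inf.needsInput() || inf.needsDictionary()))
                    throw new IllegalStateException("Truncated or invalid compressed data");
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inf.end();
        }
        return out.toByteArray();
    }
}
```

If the interface is later moved into the internals, the same two methods
could simply be backed by the one default algorithm.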

As far as I know, Vladimir Ozerov doesn't like the idea of giving users
this option.
In that case we can choose a compression algorithm to provide by default
and move the interface into the internals of the binary infrastructure.
For this case I've prepared the benchmarks, which I sent earlier.

I vote for the ZSTD algorithm [2]: it provides a good compression ratio and
good throughput. It has implementations in Java, .NET and C++ and an
ASF-friendly license, so we can use it on all Ignite platforms.
You can look at an assessment of this algorithm in my benchmarks.

[1] https://issues.apache.org/jira/browse/IGNITE-3592
[2] https://github.com/facebook/zstd


2017-06-06 16:02 GMT+03:00 Антон Чураев <churaev.an@gmail.com>:

> Looks good to me.
>
> Could you describe the implementation design in a couple of sentences,
> so that we can estimate the completeness and complexity of the proposal?
>
> 2017-06-06 15:26 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
>
> > Anton,
> >
> > Of course, the solution does not affect the existing implementation. I
> > mean, nothing changes if the user does not use the @BinaryCompression
> > annotation (no performance changes).
> > Only if the user decides to use compression on a specific field or
> > fields of a class will compression be applied during marshalling, and
> > only to the annotated fields.
> >
> > 2017-06-06 15:10 GMT+03:00 Антон Чураев <churaev.an@gmail.com>:
> >
> > > Vyacheslav,
> > >
> > > Is it possible to propose an implementation that can be switched on
> > > on demand?
> > > In that case it should not affect the performance of the current
> > > solution.
> > >
> > > I mean that users should decide what is more important for them:
> > > throughput or memory/network usage.
> > > Maybe they will choose to compress not all objects, or only some
> > > attributes of objects.
> > >
> > > 2017-06-06 14:48 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
> > >
> > > > Conclusion:
> > > > The provided solution reduces the size of an object in IgniteCache
> > > > at the cost of a throughput reduction (small in some cases); the
> > > > cost depends on which part of the object is compressed and on the
> > > > compression algorithm.
> > > > I mean, we can make more effective use of memory, and in some cases
> > > > it can reduce the load on the interconnect (replication,
> > > > rebalancing).
> > > >
> > > > It will be particularly useful for object fields that hold large
> > > > text (>~ 250 bytes) and can be compressed effectively.
> > > >
> > > > 2017-06-06 12:00 GMT+03:00 Антон Чураев <churaev.an@gmail.com>:
> > > >
> > > > > Vyacheslav, thank you! But could you please provide conclusions or
> > > > > proposals based on these benchmarks?
> > > > >
> > > > > 2017-06-06 11:28 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
> > > > >
> > > > > > Dmitry,
> > > > > >
> > > > > > Excel pages:
> > > > > >
> > > > > > 1) "Compression ratio (2)" - shows object size with and without
> > > > > > compression. (Conditions: literal text)
> > > > > > The 1st graph shows the compression ratios of different
> > > > > > compression algorithms depending on the size of the compressed
> > > > > > field.
> > > > > > The 2nd graph shows the size of objects depending on size and
> > > > > > compression algorithm.
> > > > > >
> > > > > > 2) "Compression ratio (1)" - shows object size with and without
> > > > > > compression. (Conditions: badly compressible character sequence)
> > > > > > The 1st graph shows the compression ratios of different
> > > > > > compression algorithms depending on the size of the compressed
> > > > > > field.
> > > > > > The 2nd graph shows the size of objects depending on size and
> > > > > > compression algorithm.
> > > > > >
> > > > > > 3) "put-avg" - shows the average time of the "put" operation
> > > > > > depending on size and compression algorithm.
> > > > > >
> > > > > > 4) "put-thrpt" - shows the throughput of the "put" operation
> > > > > > depending on size and compression algorithm.
> > > > > >
> > > > > > 5) "get-avg" - shows the average time of the "get" operation
> > > > > > depending on size and compression algorithm.
> > > > > >
> > > > > > 6) "get-thrpt" - shows the throughput of the "get" operation
> > > > > > depending on size and compression algorithm.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > 2017-06-06 10:59 GMT+03:00 Dmitriy Setrakyan <dsetrakyan@apache.org>:
> > > > > >
> > > > > > > Vladimir, I am not sure how to interpret the graphs. What are we
> > > > > > > looking at?
> > > > > > >
> > > > > > > On Tue, Jun 6, 2017 at 12:33 AM, Vyacheslav Daradur <daradurvs@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi, Igniters.
> > > > > > > >
> > > > > > > > I've prepared some benchmarks. Results: [1].
> > > > > > > >
> > > > > > > > I've also prepared an evaluation in the form of diagrams [2].
> > > > > > > >
> > > > > > > > I hope this helps to interest the community and speeds up the
> > > > > > > > reaction to this improvement :)
> > > > > > > >
> > > > > > > > [1] https://github.com/daradurvs/ignite-compression/tree/master/src/main/resources/result
> > > > > > > > [2] https://drive.google.com/file/d/0B2CeUAOgrHkoMklyZ25YTEdKcEk/view
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > 2017-05-24 9:49 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
> > > > > > > >
> > > > > > > > > Guys, any thoughts?
> > > > > > > > >
> > > > > > > > > 2017-05-16 13:40 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
> > > > > > > > >
> > > > > > > > >> Hi guys,
> > > > > > > > >>
> > > > > > > > >> I've prepared a PR to show my idea.
> > > > > > > > >> https://github.com/apache/ignite/pull/1951/files
> > > > > > > > >>
> > > > > > > > >> About querying - I've just copied the existing tests and
> > > > > > > > >> annotated the test data.
> > > > > > > > >> https://github.com/apache/ignite/pull/1951/files#diff-c19a9df4058141d059bb577e75244764
> > > > > > > > >>
> > > > > > > > >> It means that fields marked with @BinaryCompression will be
> > > > > > > > >> compressed during marshalling via BinaryMarshaller.
> > > > > > > > >>
> > > > > > > > >> This solution has no effect on existing data or project
> > > > > > > > >> architecture.
> > > > > > > > >>
> > > > > > > > >> I'll be glad to hear your thoughts.
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> 2017-05-15 19:18 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
> > > > > > > > >>
> > > > > > > > >>> Dmitriy,
> > > > > > > > >>>
> > > > > > > > > >>> I have a ready prototype. I want to show it.
> > > > > > > > > >>> It is always easier to discuss with an example.
> > > > > > > > > >>>
> > > > > > > > > >>> 2017-05-15 19:02 GMT+03:00 Dmitriy Setrakyan <dsetrakyan@apache.org>:
> > > > > > > > >>>
> > > > > > > > >>>> Vyacheslav,
> > > > > > > > >>>>
> > > > > > > > > >>>> I think it is a bit premature to provide a PR without getting
> > > > > > > > > >>>> community consensus on the dev list. Please allow some time for
> > > > > > > > > >>>> the community to respond.
> > > > > > > > > >>>>
> > > > > > > > > >>>> D.
> > > > > > > > > >>>>
> > > > > > > > > >>>> On Mon, May 15, 2017 at 6:36 AM, Vyacheslav Daradur <daradurvs@gmail.com>
> > > > > > > > > >>>> wrote:
> > > > > > > > >>>>
> > > > > > > > > >>>> > I created the ticket: https://issues.apache.org/jira/browse/IGNITE-5226
> > > > > > > > > >>>> >
> > > > > > > > > >>>> > I'll prepare a PR with the described solution in a couple of days.
> > > > > > > > > >>>> >
> > > > > > > > > >>>> > 2017-05-15 15:05 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
> > > > > > > > >>>> >
> > > > > > > > >>>> > > Hi, Igniters!
> > > > > > > > >>>> > >
> > > > > > > > > >>>> > > Apache Ignite 2.0 is released.
> > > > > > > > > >>>> > >
> > > > > > > > > >>>> > > Let's continue the discussion about a compression design.
> > > > > > > > > >>>> > >
> > > > > > > > > >>>> > > At the moment, I have found only one solution that is compatible
> > > > > > > > > >>>> > > with querying and indexing: per-object-field compression.
> > > > > > > > > >>>> > > Per-field compression means that the metadata (header) of an
> > > > > > > > > >>>> > > object won't be compressed; only the serialized values of the
> > > > > > > > > >>>> > > object's fields (in byte array form) will be compressed.
> > > > > > > > > >>>> > >
> > > > > > > > > >>>> > > This solution has some contentious issues:
> > > > > > > > > >>>> > > - small values, like primitives and short arrays: there is no
> > > > > > > > > >>>> > > sense in compressing them;
> > > > > > > > > >>>> > > - it is not possible to use compression with Java predefined
> > > > > > > > > >>>> > > types;
> > > > > > > > > >>>> > >
> > > > > > > > > >>>> > > We can provide an annotation, @IgniteCompression for example,
> > > > > > > > > >>>> > > which users can use to mark fields for compression.
> > > > > > > > > >>>> > >
> > > > > > > > > >>>> > > Any thoughts?
> > > > > > > > > >>>> > >
> > > > > > > > > >>>> > > Maybe someone already has a ready design?
> > > > > > > > > >>>> > >
> > > > > > > > > >>>> > > 2017-04-10 11:06 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
> > > > > > > > >>>> > >
> > > > > > > > >>>> > >> Alexey,
> > > > > > > > >>>> > >>
> > > > > > > > >>>> > >> Yes, I've read it.
> > > > > > > > >>>> > >>
> > > > > > > > > >>>> > >> OK, let's discuss the public API design.
> > > > > > > > > >>>> > >>
> > > > > > > > > >>>> > >> I think we need to add a configuration entity to
> > > > > > > > > >>>> > >> CacheConfiguration that will contain the Compressor interface
> > > > > > > > > >>>> > >> implementation and some useful parameters.
> > > > > > > > > >>>> > >> Or maybe provide a BinaryMarshaller decorator that will
> > > > > > > > > >>>> > >> compress data after marshalling.
> > > > > > > > > >>>> > >>
> > > > > > > > > >>>> > >>
> > > > > > > > > >>>> > >> 2017-04-10 10:40 GMT+03:00 Alexey Kuznetsov <akuznetsov@apache.org>:
> > > > > > > > >>>> > >>
> > > > > > > > >>>> > >>> Vyacheslav,
> > > > > > > > >>>> > >>>
> > > > > > > > > >>>> > >>> Did you read the initial discussion [1] about compression?
> > > > > > > > > >>>> > >>> As far as I remember, we agreed to add only some "top-level"
> > > > > > > > > >>>> > >>> API in order to provide a way for Ignite users to inject some
> > > > > > > > > >>>> > >>> sort of custom compression.
> > > > > > > > > >>>> > >>>
> > > > > > > > > >>>> > >>>
> > > > > > > > > >>>> > >>> [1]
> > > > > > > > > >>>> > >>> http://apache-ignite-developers.2346864.n4.nabble.com/Data-compression-in-Ignite-2-0-td10099.html
> > > > > > > > > >>>> > >>>
> > > > > > > > > >>>> > >>> On Mon, Apr 10, 2017 at 2:19 PM, daradurvs <daradurvs@gmail.com> wrote:
> > > > > > > > >>>> > >>>
> > > > > > > > >>>> > >>> > Hi Igniters!
> > > > > > > > >>>> > >>> >
> > > > > > > > > >>>> > >>> > I am interested in this task:
> > > > > > > > > >>>> > >>> > Provide some kind of pluggable compression SPI support
> > > > > > > > > >>>> > >>> > <https://issues.apache.org/jira/browse/IGNITE-3592>
> > > > > > > > > >>>> > >>> >
> > > > > > > > > >>>> > >>> > I developed a solution at the BinaryMarshaller level, but the
> > > > > > > > > >>>> > >>> > reviewer rejected it.
> > > > > > > > > >>>> > >>> >
> > > > > > > > > >>>> > >>> > Let's continue the discussion of the task goals and solution
> > > > > > > > > >>>> > >>> > design.
> > > > > > > > > >>>> > >>> > As I understand it, the main goal of this task is to store
> > > > > > > > > >>>> > >>> > data in compressed form.
> > > > > > > > > >>>> > >>> > This is what I need from Ignite as its user. Compression
> > > > > > > > > >>>> > >>> > provides savings on servers.
> > > > > > > > > >>>> > >>> > We can store more data on the same servers at the cost of
> > > > > > > > > >>>> > >>> > increased CPU utilization.
> > > > > > > > > >>>> > >>> >
> > > > > > > > > >>>> > >>> > I'm researching the possibility of implementing compression
> > > > > > > > > >>>> > >>> > at the cache level.
> > > > > > > > > >>>> > >>> >
> > > > > > > > > >>>> > >>> > Any thoughts?
> > > > > > > > > >>>> > >>> >
> > > > > > > > > >>>> > >>> > --
> > > > > > > > > >>>> > >>> > Best regards,
> > > > > > > > > >>>> > >>> > Vyacheslav
> > > > > > > > > >>>> > >>> >
> > > > > > > > > >>>> > >>> >
> > > > > > > > > >>>> > >>> >
> > > > > > > > > >>>> > >>> >
> > > > > > > > > >>>> > >>> > --
> > > > > > > > > >>>> > >>> > View this message in context:
> > > > > > > > > >>>> > >>> > http://apache-ignite-developers.2346864.n4.nabble.com/Data-compression-in-Ignite-2-0-tp10099p16317.html
> > > > > > > > > >>>> > >>> > Sent from the Apache Ignite Developers mailing list archive
> > > > > > > > > >>>> > >>> > at Nabble.com.
> > > > > > > > >>>> > >>> >
> > > > > > > > >>>> > >>>
> > > > > > > > >>>> > >>>
> > > > > > > > >>>> > >>>
> > > > > > > > >>>> > >>> --
> > > > > > > > >>>> > >>> Alexey Kuznetsov
> > > > > > > > >>>> > >>>
> > > > > > > > >>>> > >>
> > > > > > > > >>>> > >>
> > > > > > > > >>>> > >>
> > > > > > > > >>>> > >> --
> > > > > > > > >>>> > >> Best Regards, Vyacheslav
> > > > > > > > >>>> > >>
> > > > > > > > >>>> > >
> > > > > > > > >>>> > >
> > > > > > > > >>>> > >
> > > > > > > > >>>> > > --
> > > > > > > > >>>> > > Best Regards, Vyacheslav
> > > > > > > > >>>> > >
> > > > > > > > >>>> >
> > > > > > > > >>>> >
> > > > > > > > >>>> >
> > > > > > > > >>>> > --
> > > > > > > > >>>> > Best Regards, Vyacheslav
> > > > > > > > >>>> >
> > > > > > > > >>>>
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>> --
> > > > > > > > >>> Best Regards, Vyacheslav
> > > > > > > > >>>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> --
> > > > > > > > >> Best Regards, Vyacheslav
> > > > > > > > >>
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Best Regards, Vyacheslav
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best Regards, Vyacheslav
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best Regards, Vyacheslav
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Best Regards, Anton Churaev
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best Regards, Vyacheslav
> > > >
> > >
> > >
> > >
> > > --
> > >
> > > Best Regards, Anton Churaev
> > >
> >
> >
> >
> > --
> > Best Regards, Vyacheslav
> >
>
>
>
> --
>
> Best Regards, Anton Churaev
>



-- 
Best Regards, Vyacheslav
