ignite-dev mailing list archives

From Антон Чураев <churaev...@gmail.com>
Subject Re: Data compression in Ignite 2.0
Date Wed, 07 Jun 2017 08:06:00 GMT
Vyacheslav, correct me if something is wrong.

We could let users choose between CPU usage and memory/network usage
by compressing selected attributes of stored objects.
You have studied the design, and it is possible to localize the changes in
marshalling without affecting performance or current functionality.

I think it's useful for our project and users.
Community, what do you think about this proposal?


2017-06-06 17:29 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:

> In short,
>
> During marshalling, each field is represented by a BinaryFieldAccessor, which
> manages its marshalling. It checks whether the field is marked with the
> @BinaryCompression annotation; if so, the binary representation of the field
> (a byte array) is compressed. It is marked as compressed by a type
> constant (GridBinaryMarshaller.COMPRESSED), and then the compressed byte
> array is included in the binary representation of the whole object. Note that
> the header of the marshalled object is not compressed. Compression affects
> only the representation of the object's fields.
>
> Objects in IgniteCache are represented as BinaryObject, which is a wrapper over
> the byte array of the marshalled object.
> BinaryObject provides some useful methods, which are used by Ignite
> systems.
> For example, queries use the BinaryObject#field method, which deserializes
> only one field of the object, without deserializing the whole object.
> If the BinaryObject#field method meets the compressed-type constant during
> deserialization, it decompresses the byte array and then continues
> unmarshalling as usual.
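
To make the mechanism described above easier to discuss, here is a minimal
sketch of it outside Ignite. Everything in it is an assumption for
illustration: the @CompressedField annotation stands in for the proposed
@BinaryCompression, the PLAIN/COMPRESSED marker bytes stand in for the type
constant, and DEFLATE from java.util.zip is used instead of ZSTD only because
it ships with the JDK. It is not the code from the PR.

```java
import java.io.ByteArrayOutputStream;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class FieldCompressionSketch {
    /** Hypothetical stand-in for the proposed @BinaryCompression annotation. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface CompressedField { }

    // Hypothetical type markers written in front of each field's bytes.
    static final byte PLAIN = 0;
    static final byte COMPRESSED = 1;

    /** Example value class: only the large text field is annotated. */
    static class Article {
        String title;
        @CompressedField String body;
    }

    /** Marshals one String field as [marker][payload]; only annotated fields are compressed. */
    static byte[] writeField(Object obj, String name) {
        try {
            Field f = obj.getClass().getDeclaredField(name);
            f.setAccessible(true);
            byte[] raw = ((String) f.get(obj)).getBytes(StandardCharsets.UTF_8);
            boolean compress = f.isAnnotationPresent(CompressedField.class);
            byte[] payload = compress ? deflate(raw) : raw;
            byte[] out = new byte[payload.length + 1];
            out[0] = compress ? COMPRESSED : PLAIN;
            System.arraycopy(payload, 0, out, 1, payload.length);
            return out;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads a single field back; decompresses only if the marker says so. */
    static String readField(byte[] bytes) {
        byte[] payload = Arrays.copyOfRange(bytes, 1, bytes.length);
        if (bytes[0] == COMPRESSED)
            payload = inflate(payload);
        return new String(payload, StandardCharsets.UTF_8);
    }

    static byte[] deflate(byte[] in) {
        Deflater def = new Deflater();
        try {
            def.setInput(in);
            def.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!def.finished())
                out.write(buf, 0, def.deflate(buf));
            return out.toByteArray();
        } finally {
            def.end();
        }
    }

    static byte[] inflate(byte[] in) {
        Inflater inf = new Inflater();
        try {
            inf.setInput(in);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!inf.finished()) {
                int n = inf.inflate(buf);
                if (n == 0 && (inf.needsInput() || inf.needsDictionary()))
                    throw new IllegalStateException("Truncated or unsupported input");
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inf.end();
        }
    }

    public static void main(String[] args) {
        Article a = new Article();
        a.title = "Short title";
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++)
            sb.append("large, repetitive text ");
        a.body = sb.toString();

        byte[] title = writeField(a, "title");
        byte[] body = writeField(a, "body");
        System.out.println("title: " + title.length + " bytes, marker " + title[0]);
        System.out.println("body: " + body.length + " bytes (raw " + a.body.length() + "), marker " + body[0]);
        System.out.println("round trip ok: " + a.body.equals(readField(body)));
    }
}
```

The point of the marker byte is that the reader needs no knowledge of the
annotation: a field written without compression is read exactly as before,
which is why un-annotated classes should see no change.
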
>
> I have now introduced the Compressor interface in IgniteConfiguration; it
> allows users to plug in their own compressor implementation, which is a
> requirement of the task [1].
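
For discussion, a pluggable compressor could look roughly like the sketch
below. The interface shape, the SketchConfiguration holder and its setter are
all hypothetical names of mine, not the ones from the PR or from Ignite, and
the default implementation uses the JDK's DEFLATE only so that the sketch runs
without extra dependencies.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Objects;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class CompressorSketch {
    /** Hypothetical shape of a pluggable compressor; the interface in the PR may differ. */
    interface Compressor {
        byte[] compress(byte[] bytes);
        byte[] decompress(byte[] bytes);
    }

    /** JDK-only implementation; a ZSTD binding could implement the same interface. */
    static final class DeflaterCompressor implements Compressor {
        private final int level;

        DeflaterCompressor(int level) {
            this.level = level;
        }

        @Override public byte[] compress(byte[] bytes) {
            Deflater def = new Deflater(level);
            try {
                def.setInput(bytes);
                def.finish();
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[1024];
                while (!def.finished())
                    out.write(buf, 0, def.deflate(buf));
                return out.toByteArray();
            } finally {
                def.end();
            }
        }

        @Override public byte[] decompress(byte[] bytes) {
            Inflater inf = new Inflater();
            try {
                inf.setInput(bytes);
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[1024];
                while (!inf.finished()) {
                    int n = inf.inflate(buf);
                    if (n == 0 && (inf.needsInput() || inf.needsDictionary()))
                        throw new IllegalStateException("Truncated or unsupported input");
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            } catch (DataFormatException e) {
                throw new IllegalStateException(e);
            } finally {
                inf.end();
            }
        }
    }

    /** Hypothetical configuration holder, standing in for a setter on the real configuration. */
    static final class SketchConfiguration {
        private Compressor compressor = new DeflaterCompressor(Deflater.DEFAULT_COMPRESSION);

        SketchConfiguration setCompressor(Compressor compressor) {
            this.compressor = Objects.requireNonNull(compressor, "compressor");
            return this;
        }

        Compressor getCompressor() {
            return compressor;
        }
    }

    public static void main(String[] args) {
        // A user trading compression ratio for speed by supplying their own instance.
        SketchConfiguration cfg = new SketchConfiguration()
            .setCompressor(new DeflaterCompressor(Deflater.BEST_SPEED));

        byte[] raw = "some field value, some field value, some field value"
            .getBytes(StandardCharsets.UTF_8);
        byte[] packed = cfg.getCompressor().compress(raw);
        byte[] unpacked = cfg.getCompressor().decompress(packed);

        System.out.println("raw " + raw.length + " bytes, compressed " + packed.length + " bytes");
        System.out.println("round trip ok: " + Arrays.equals(raw, unpacked));
    }
}
```

If the interface is later moved into the internals, as discussed below, the
same two-method contract could stay and only the setter would disappear.
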
>
> As far as I know, Vladimir Ozerov doesn't like the idea of giving users
> this option.
> In that case we can choose a compression algorithm to provide by
> default and move the interface into the internals of the binary infrastructure.
> For this case I've prepared the benchmarks that I sent earlier.
>
> I vote for the ZSTD algorithm [2]: it provides a good compression ratio and
> good throughput. It has implementations in Java, .NET and C++ and an
> ASF-friendly license, so we can use it on all Ignite platforms.
> You can look at an assessment of this algorithm in my benchmarks.
>
> [1] https://issues.apache.org/jira/browse/IGNITE-3592
> [2] https://github.com/facebook/zstd
>
>
> 2017-06-06 16:02 GMT+03:00 Антон Чураев <churaev.an@gmail.com>:
>
> > Looks good to me.
> >
> > Could you describe the implementation design in a couple of sentences,
> > so that we can estimate the completeness and complexity of the proposal?
> >
> > 2017-06-06 15:26 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
> >
> > > Anton,
> > >
> > > Of course, the solution does not affect the existing implementation. I mean,
> > > nothing changes (including performance) if the user does not use the
> > > @BinaryCompression annotation.
> > > Only if the user decides to compress a specific field or fields of a
> > > class is compression applied, at marshalling, to the annotated fields.
> > >
> > > 2017-06-06 15:10 GMT+03:00 Антон Чураев <churaev.an@gmail.com>:
> > >
> > > > Vyacheslav,
> > > >
> > > > Is it possible to propose an implementation that can be switched on
> > > > on demand?
> > > > In that case it should not affect the performance of the current solution.
> > > >
> > > > I mean that users should decide what is more important for them:
> > > > throughput or memory/network usage.
> > > > Maybe they will choose to compress not all objects, or only some
> > > > attributes of objects.
> > > >
> > > > 2017-06-06 14:48 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
> > > >
> > > > > Conclusion:
> > > > > The provided solution reduces the size of an object in IgniteCache at
> > > > > the cost of reduced throughput (a small reduction in some cases); the
> > > > > cost depends on which part of the object is compressed and on the
> > > > > compression algorithm.
> > > > > I mean, we can use memory more effectively, and in some cases it can
> > > > > reduce the load on the interconnect (replication, rebalancing).
> > > > >
> > > > > It will be particularly useful for object fields that contain large
> > > > > text (roughly 250 bytes or more) and compress well.
> > > > >
> > > > > 2017-06-06 12:00 GMT+03:00 Антон Чураев <churaev.an@gmail.com>:
> > > > >
> > > > > > Vyacheslav, thank you! But could you please provide conclusions or
> > > > > > proposals based on these benchmarks?
> > > > > >
> > > > > > 2017-06-06 11:28 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
> > > > > >
> > > > > > > Dmitry,
> > > > > > >
> > > > > > > Excel pages:
> > > > > > >
> > > > > > > 1) "Compression ratio (2)" - shows object size with and without
> > > > > > > compression. (Conditions: literal text)
> > > > > > > The 1st graph shows the compression ratios of the different
> > > > > > > compression algorithms depending on the size of the compressed field.
> > > > > > > The 2nd graph shows the size of objects depending on size and
> > > > > > > compression algorithm.
> > > > > > >
> > > > > > > 2) "Compression ratio (1)" - shows object size with and without
> > > > > > > compression. (Conditions: poorly compressible character sequence)
> > > > > > > The 1st graph shows the compression ratios of the different
> > > > > > > compression algorithms depending on the size of the compressed field.
> > > > > > > The 2nd graph shows the size of objects depending on size and
> > > > > > > compression algorithm.
> > > > > > >
> > > > > > > 3) "put-avg" - shows the average time of the "put" operation
> > > > > > > depending on size and compression algorithm.
> > > > > > >
> > > > > > > 4) "put-thrpt" - shows the throughput of the "put" operation
> > > > > > > depending on size and compression algorithm.
> > > > > > >
> > > > > > > 5) "get-avg" - shows the average time of the "get" operation
> > > > > > > depending on size and compression algorithm.
> > > > > > >
> > > > > > > 6) "get-thrpt" - shows the throughput of the "get" operation
> > > > > > > depending on size and compression algorithm.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > 2017-06-06 10:59 GMT+03:00 Dmitriy Setrakyan <dsetrakyan@apache.org>:
> > > > > > >
> > > > > > > > Vladimir, I am not sure how to interpret the graphs. What are we
> > > > > > > > looking at?
> > > > > > > >
> > > > > > > > On Tue, Jun 6, 2017 at 12:33 AM, Vyacheslav Daradur <daradurvs@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi, Igniters.
> > > > > > > > >
> > > > > > > > > I've prepared some benchmarks. Results: [1].
> > > > > > > > >
> > > > > > > > > I've also prepared an evaluation in the form of diagrams [2].
> > > > > > > > >
> > > > > > > > > I hope this helps to interest the community and speeds up the
> > > > > > > > > reaction to this improvement :)
> > > > > > > > >
> > > > > > > > > [1]
> > > > > > > > > https://github.com/daradurvs/ignite-compression/tree/master/src/main/resources/result
> > > > > > > > > [2] https://drive.google.com/file/d/0B2CeUAOgrHkoMklyZ25YTEdKcEk/view
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > 2017-05-24 9:49 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
> > > > > > > > >
> > > > > > > > > > Guys, any thoughts?
> > > > > > > > > >
> > > > > > > > > > 2017-05-16 13:40 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
> > > > > > > > > >
> > > > > > > > > >> Hi guys,
> > > > > > > > > >>
> > > > > > > > > > >> I've prepared a PR to show my idea.
> > > > > > > > > > >> https://github.com/apache/ignite/pull/1951/files
> > > > > > > > > > >>
> > > > > > > > > > >> About querying - I've just copied the existing tests and
> > > > > > > > > > >> annotated the test data.
> > > > > > > > > > >> https://github.com/apache/ignite/pull/1951/files#diff-c19a9df4058141d059bb577e75244764
> > > > > > > > > > >>
> > > > > > > > > > >> It means that fields marked with @BinaryCompression will be
> > > > > > > > > > >> compressed at marshalling via BinaryMarshaller.
> > > > > > > > > > >>
> > > > > > > > > > >> This solution has no effect on existing data or project
> > > > > > > > > > >> architecture.
> > > > > > > > > > >>
> > > > > > > > > > >> I'll be glad to hear your thoughts.
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > > >> 2017-05-15 19:18 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
> > > > > > > > > > >>
> > > > > > > > > > >>> Dmitriy,
> > > > > > > > > > >>>
> > > > > > > > > > >>> I have a prototype ready and want to show it.
> > > > > > > > > > >>> It is always easier to discuss a concrete example.
> > > > > > > > > >>>
> > > > > > > > > > >>> 2017-05-15 19:02 GMT+03:00 Dmitriy Setrakyan <dsetrakyan@apache.org>:
> > > > > > > > > > >>>
> > > > > > > > > > >>>> Vyacheslav,
> > > > > > > > > > >>>>
> > > > > > > > > > >>>> I think it is a bit premature to provide a PR without getting
> > > > > > > > > > >>>> community consensus on the dev list. Please allow some time for
> > > > > > > > > > >>>> the community to respond.
> > > > > > > > > > >>>>
> > > > > > > > > > >>>> D.
> > > > > > > > > > >>>>
> > > > > > > > > > >>>> On Mon, May 15, 2017 at 6:36 AM, Vyacheslav Daradur <daradurvs@gmail.com>
> > > > > > > > > > >>>> wrote:
> > > > > > > > > >>>>
> > > > > > > > > > >>>> > I created the ticket: https://issues.apache.org/jira/browse/IGNITE-5226
> > > > > > > > > > >>>> >
> > > > > > > > > > >>>> > I'll prepare a PR with the described solution in a couple of days.
> > > > > > > > > > >>>> >
> > > > > > > > > > >>>> > 2017-05-15 15:05 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
> > > > > > > > > >>>> >
> > > > > > > > > >>>> > > Hi, Igniters!
> > > > > > > > > >>>> > >
> > > > > > > > > > >>>> > > Apache Ignite 2.0 has been released.
> > > > > > > > > > >>>> > >
> > > > > > > > > > >>>> > > Let's continue the discussion about the compression design.
> > > > > > > > > > >>>> > >
> > > > > > > > > > >>>> > > At the moment, I have found only one solution that is compatible
> > > > > > > > > > >>>> > > with querying and indexing: per-object-field compression.
> > > > > > > > > > >>>> > > Per-field compression means that the metadata (header) of an
> > > > > > > > > > >>>> > > object won't be compressed; only the serialized values of the
> > > > > > > > > > >>>> > > object's fields (in byte array form) will be compressed.
> > > > > > > > > > >>>> > >
> > > > > > > > > > >>>> > > This solution has some contentious issues:
> > > > > > > > > > >>>> > > - small values, like primitives and short arrays: there is no
> > > > > > > > > > >>>> > > sense in compressing them;
> > > > > > > > > > >>>> > > - it is not possible to use compression with Java predefined
> > > > > > > > > > >>>> > > types.
> > > > > > > > > > >>>> > >
> > > > > > > > > > >>>> > > We can provide an annotation, for example @IgniteCompression,
> > > > > > > > > > >>>> > > which users can use to mark fields for compression.
> > > > > > > > > > >>>> > >
> > > > > > > > > > >>>> > > Any thoughts?
> > > > > > > > > > >>>> > >
> > > > > > > > > > >>>> > > Maybe someone already has a design ready?
> > > > > > > > > > >>>> > >
> > > > > > > > > > >>>> > > 2017-04-10 11:06 GMT+03:00 Vyacheslav Daradur <daradurvs@gmail.com>:
> > > > > > > > > >>>> > >
> > > > > > > > > >>>> > >> Alexey,
> > > > > > > > > >>>> > >>
> > > > > > > > > > >>>> > >> Yes, I've read it.
> > > > > > > > > > >>>> > >>
> > > > > > > > > > >>>> > >> OK, let's discuss the public API design.
> > > > > > > > > > >>>> > >>
> > > > > > > > > > >>>> > >> I think we need to add a configuration entity to
> > > > > > > > > > >>>> > >> CacheConfiguration that will contain the Compressor interface
> > > > > > > > > > >>>> > >> implementation and some useful parameters.
> > > > > > > > > > >>>> > >> Or maybe provide a BinaryMarshaller decorator that will
> > > > > > > > > > >>>> > >> compress data after marshalling.
> > > > > > > > > > >>>> > >>
> > > > > > > > > > >>>> > >>
> > > > > > > > > > >>>> > >> 2017-04-10 10:40 GMT+03:00 Alexey Kuznetsov <akuznetsov@apache.org>:
> > > > > > > > > >>>> > >>
> > > > > > > > > >>>> > >>> Vyacheslav,
> > > > > > > > > >>>> > >>>
> > > > > > > > > > >>>> > >>> Did you read the initial discussion [1] about compression?
> > > > > > > > > > >>>> > >>> As far as I remember, we agreed to add only some "top-level"
> > > > > > > > > > >>>> > >>> API in order to provide a way for Ignite users to inject some
> > > > > > > > > > >>>> > >>> sort of custom compression.
> > > > > > > > > > >>>> > >>>
> > > > > > > > > > >>>> > >>>
> > > > > > > > > > >>>> > >>> [1]
> > > > > > > > > > >>>> > >>> http://apache-ignite-developers.2346864.n4.nabble.com/Data-compression-in-Ignite-2-0-td10099.html
> > > > > > > > > > >>>> > >>>
> > > > > > > > > > >>>> > >>> On Mon, Apr 10, 2017 at 2:19 PM, daradurvs <daradurvs@gmail.com> wrote:
> > > > > > > > > >>>> > >>>
> > > > > > > > > >>>> > >>> > Hi Igniters!
> > > > > > > > > >>>> > >>> >
> > > > > > > > > > >>>> > >>> > I am interested in this task:
> > > > > > > > > > >>>> > >>> > "Provide some kind of pluggable compression SPI support"
> > > > > > > > > > >>>> > >>> > <https://issues.apache.org/jira/browse/IGNITE-3592>
> > > > > > > > > > >>>> > >>> >
> > > > > > > > > > >>>> > >>> > I developed a solution at the BinaryMarshaller level, but the
> > > > > > > > > > >>>> > >>> > reviewer rejected it.
> > > > > > > > > > >>>> > >>> >
> > > > > > > > > > >>>> > >>> > Let's continue the discussion of the task's goals and solution
> > > > > > > > > > >>>> > >>> > design.
> > > > > > > > > > >>>> > >>> > As I understand it, the main goal of this task is to store data
> > > > > > > > > > >>>> > >>> > in compressed form.
> > > > > > > > > > >>>> > >>> > This is what I need from Ignite as its user. Compression saves
> > > > > > > > > > >>>> > >>> > on servers: we can store more data on the same servers at the
> > > > > > > > > > >>>> > >>> > cost of increased CPU utilization.
> > > > > > > > > > >>>> > >>> >
> > > > > > > > > > >>>> > >>> > I'm researching the possibility of implementing compression at
> > > > > > > > > > >>>> > >>> > the cache level.
> > > > > > > > > > >>>> > >>> >
> > > > > > > > > > >>>> > >>> > Any thoughts?
> > > > > > > > > > >>>> > >>> >
> > > > > > > > > > >>>> > >>> > --
> > > > > > > > > > >>>> > >>> > Best regards,
> > > > > > > > > > >>>> > >>> > Vyacheslav
> > > > > > > > > >>>> > >>> >
> > > > > > > > > >>>> > >>> >
> > > > > > > > > >>>> > >>> >
> > > > > > > > > >>>> > >>> >
> > > > > > > > > >>>> > >>> >
> > > > > > > > > >>>> > >>>
> > > > > > > > > >>>> > >>>
> > > > > > > > > >>>> > >>>
> > > > > > > > > >>>> > >>> --
> > > > > > > > > >>>> > >>> Alexey Kuznetsov
> > > > > > > > > >>>> > >>>
> > > > > > > > > >>>> > >>
> > > > > > > > > >>>> > >>
> > > > > > > > > >>>> > >>
> > > > > > > > > >>>> > >> --
> > > > > > > > > >>>> > >> Best Regards,
Vyacheslav
> > > > > > > > > >>>> > >>
> > > > > > > > > >>>> > >
> > > > > > > > > >>>> > >
> > > > > > > > > >>>> > >
> > > > > > > > > >>>> > > --
> > > > > > > > > >>>> > > Best Regards, Vyacheslav
> > > > > > > > > >>>> > >
> > > > > > > > > >>>> >
> > > > > > > > > >>>> >
> > > > > > > > > >>>> >
> > > > > > > > > >>>> > --
> > > > > > > > > >>>> > Best Regards, Vyacheslav
> > > > > > > > > >>>> >
> > > > > > > > > >>>>
> > > > > > > > > >>>
> > > > > > > > > >>>
> > > > > > > > > >>>
> > > > > > > > > >>> --
> > > > > > > > > >>> Best Regards, Vyacheslav
> > > > > > > > > >>>
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >> --
> > > > > > > > > >> Best Regards, Vyacheslav
> > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Best Regards, Vyacheslav
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Best Regards, Vyacheslav
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Best Regards, Vyacheslav
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Best Regards, Anton Churaev
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best Regards, Vyacheslav
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Best Regards, Anton Churaev
> > > >
> > >
> > >
> > >
> > > --
> > > Best Regards, Vyacheslav
> > >
> >
> >
> >
> > --
> >
> > Best Regards, Anton Churaev
> >
>
>
>
> --
> Best Regards, Vyacheslav
>



-- 

Best Regards, Anton Churaev
