ignite-dev mailing list archives

From Vladimir Ozerov <voze...@gridgain.com>
Subject Re: Ignite as distributed file storage
Date Fri, 06 Jul 2018 07:34:38 GMT
Pavel,

I do not think it is a good idea to delay discussions and decisions,
because it puts your efforts at risk of not being accepted by the community
in the end. Our ultimate goal is not to have as many features as possible,
but to have a consistent product which is easy to understand and use. Having
both IGFS and another "not-IGFS", which is in fact the same IGFS under a
different name, is not a good idea, because it would cause more harm than
good.

Approaches which seem reasonable to me:
1) Integrate your ideas into IGFS, which is really flexible in how to
process data and where to store it. PROXY mode is not a "crutch" as you
said, but a normal mode which was used in real deployments.
2) Replace IGFS with your solution, but with a clear explanation of how it
is better than IGFS and why we need to replace thousands of lines of
battle-tested code with something new that does virtually the same thing.
3) Just drop IGFS from the product, and do not implement any replacement at
all - personally, I am all for this decision.

If you want, I can guide you through the IGFS architecture so that we better
understand what should be done to integrate your ideas into it.

Last, but not least - we need objective facts showing why the proposed
solution is better in terms of performance: concrete use cases and
performance numbers (or at least estimates).
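
As a starting point for such numbers, here is a minimal sketch (not taken
from IGFS or from the proposed storage) that copies a file once through an
on-heap byte[] buffer and once through FileChannel.transferTo, which lets the
OS move the bytes without an on-heap buffer where the platform supports it.
The class name CopyPaths, the method names and the 8 MB payload are
assumptions made for illustration only; a realistic measurement would use a
SocketChannel as the source, production-sized files and a proper benchmark
harness.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration; not part of Ignite or IGFS.
class CopyPaths {
    /** Copies src to dst through an on-heap byte[] buffer of blockSize bytes. */
    public static long copyViaHeap(Path src, Path dst, int blockSize) throws IOException {
        byte[] buf = new byte[blockSize];
        long total = 0;
        try (InputStream in = Files.newInputStream(src);
             OutputStream out = Files.newOutputStream(dst)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
        }
        return total;
    }

    /** Copies src to dst with FileChannel.transferTo (no on-heap buffer in user code). */
    public static long copyViaTransferTo(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.CREATE,
                 StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long pos = 0;
            // transferTo may move fewer bytes than requested, so loop until done.
            while (pos < size) {
                pos += in.transferTo(pos, size - pos, out);
            }
            return pos;
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("blob-src", ".bin");
        Path heapDst = Files.createTempFile("blob-heap", ".bin");
        Path directDst = Files.createTempFile("blob-direct", ".bin");
        try {
            Files.write(src, new byte[8 * 1024 * 1024]); // assumed 8 MB test payload

            long t0 = System.nanoTime();
            copyViaHeap(src, heapDst, 64 * 1024); // 64Kb blocks, as in the IGFS discussion
            long t1 = System.nanoTime();
            copyViaTransferTo(src, directDst);
            long t2 = System.nanoTime();

            System.out.printf("heap copy:  %d us%n", (t1 - t0) / 1_000);
            System.out.printf("transferTo: %d us%n", (t2 - t1) / 1_000);
        } finally {
            Files.deleteIfExists(src);
            Files.deleteIfExists(heapDst);
            Files.deleteIfExists(directDst);
        }
    }
}
```

A single run like this is only a rough indication: the page cache, JIT
warm-up and file size all affect the result, so any numbers brought to the
list should come from repeated runs on a representative workload.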

On Fri, Jul 6, 2018 at 1:45 AM Pavel Kovalenko <jokserfn@gmail.com> wrote:

> Vladimir,
>
> I just want to add that we can implement BLOB storage and then, if the
> community really wants it, we can adapt this storage for use as the
> underlying file system in IGFS. But IGFS shouldn't be the entry point for
> BLOB storage. I think this conclusion can satisfy both of us.
>
> 2018-07-06 0:47 GMT+03:00 Pavel Kovalenko <jokserfn@gmail.com>:
>
> > Vladimir,
> >
> > I didn't say that it stores data on-heap. I said that it performs a lot
> > of operations with byte[] arrays on-heap, as I see in , which will lead
> > to frequent GCs and unnecessary data copying.
> > "But the whole idea around mmap sounds like premature optimisation to me"
> > - this is not premature optimisation, this is one of the key performance
> > features. E.g. Apache Kafka wouldn't be so fast and performant without
> > zero-copy.
> > If we can do better, why not just do it? Especially if it costs nothing
> > for us (this is OS level).
> > As I said in my first message, our end target is handling video and
> > streaming; copying every chunk of it to on-heap userspace, then to
> > offheap and then to disk is unacceptable.
> > You ask me to implement almost everything using just the IGFS interface.
> > Why do we need to do that? Proxy mode looks like a crutch: to support
> > replication and the possibility to keep some data in memory, I would need
> > to write a lot of stuff again.
> > Let's finally leave IGFS alone and wait for the IEP.
> >
> >
> > 2018-07-06 0:01 GMT+03:00 Vladimir Ozerov <vozerov@gridgain.com>:
> >
> >> Pavel,
> >>
> >> IGFS doesn't force you to have blocks in heap. What you suggest can be
> >> achieved with IGFS as follows:
> >> 1) Disable caching, so data cache is not used ("PROXY" mode)
> >> 2) Implement IgniteFileSystem interface which operates on abstract
> >> streams
> >>
> >> But the whole idea around mmap sounds like premature optimisation to me.
> >> I conducted a number of experiments with IGFS on large Hadoop workloads.
> >> Even with the old AI 1.x architecture, where everything was stored
> >> onheap, I never had an issue with GC. The key point is that IGFS
> >> operates on large (64Kb) data blocks, so even with 100Gb full of these
> >> blocks you will have a relatively small number of objects and normal GC
> >> pauses. Additional memory copying is not an issue either in most
> >> workloads in distributed systems, because most of the time is spent on
> >> IO and internal synchronization anyway.
> >>
> >> Do you have a specific scenario where you observed long GC pauses or
> >> serious performance degradation with IGFS?
> >>
> >> Even if we agree that mmap usage is a critical piece, all we need is to
> >> implement a single IGFS interface.
> >>
> >> On Thu, Jul 5, 2018 at 10:44 PM Pavel Kovalenko <jokserfn@gmail.com>
> >> wrote:
> >>
> >> > Vladimir,
> >> >
> >> > The key difference between BLOB storage and IGFS is that BLOB storage
> >> > will have a persistence-based architecture with the possibility to
> >> > cache blocks in offheap (using mmap, which is simpler because we
> >> > delegate it to the OS level), while IGFS has an in-memory based
> >> > architecture with the possibility to persist blocks.
> >> > BLOB storage will be able to work with a small amount of RAM without
> >> > a significant performance drop (using zero-copy from socket to disk),
> >> > and in the opposite case it can keep all available blocks in offheap
> >> > if possible (using mmap again).
> >> > IGFS performs a lot of operations with blocks on-heap, which leads to
> >> > unnecessary data copies, long GC pauses and a performance drop. The
> >> > whole IGFS architecture is tightly bound to in-memory features, so
> >> > it's too hard to rewrite IGFS in a persistence-based manner. But cool
> >> > IGFS features such as intelligent affinity routing and chunk
> >> > colocation will be reused in BLOB storage.
> >> > Does it make sense?
> >> >
> >> >
> >> >
> >> > 2018-07-05 19:01 GMT+03:00 Vladimir Ozerov <vozerov@gridgain.com>:
> >> >
> >> > > Pavel,
> >> > > The design you described is almost precisely what IGFS does. It has
> >> > > a cache for metadata and splits binary data into chunks with
> >> > > intelligent affinity routing. In addition, we have a map-reduce
> >> > > feature on top of it and integration with an underlying file system
> >> > > with optional caching. Data can be accessed in blocks or streams.
> >> > > IGFS is not in active development, but it is not outdated either.
> >> > > Can you briefly explain why you think that we need to drop IGFS and
> >> > > re-implement almost the same thing from scratch?
> >> > >
> >> > > Dima, Sergey,
> >> > > Yes, we need the BLOB support you described. Unfortunately, it is
> >> > > not that easy to implement from the SQL perspective. To support it
> >> > > we would need either MVCC (with its own drawbacks) or read-locks for
> >> > > SELECT.
> >> > >
> >> > > Vladimir.
> >> > >
> >> > > On Tue, Jul 3, 2018 at 10:40 AM Sergey Kozlov <skozlov@gridgain.com>
> >> > > wrote:
> >> > >
> >> > > > Dmitriy
> >> > > >
> >> > > > You're right that storing large objects should be optimized.
> >> > > >
> >> > > > Let's assume that a large object means a regular object having
> >> > > > large fields, and that such fields won't be used for comparison.
> >> > > > Then we can avoid restoring the BLOB fields in offheap page
> >> > > > memory, e.g. for SQL queries, if the select doesn't include them
> >> > > > explicitly. It can reduce page eviction, speed up performance and
> >> > > > reduce the chance of getting OOM.
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Tue, Jul 3, 2018 at 1:06 AM, Dmitriy Setrakyan
> >> > > > <dsetrakyan@apache.org> wrote:
> >> > > >
> >> > > > > To be honest, I am not sure if we need to kick off another file
> >> > > > > system storage discussion in Ignite. It sounds like a huge
> >> > > > > effort and likely will not be productive.
> >> > > > >
> >> > > > > However, I think an ability to store large objects will make
> >> > > > > sense. For example, how do I store a 10GB blob in Ignite cache?
> >> > > > > Most likely we have to have a separate memory or disk space,
> >> > > > > allocated for blobs only. We also need to be able to efficiently
> >> > > > > transfer a 10GB Blob object over the network and store it
> >> > > > > off-heap right away, without bringing it into main heap memory
> >> > > > > (otherwise we would run out of memory).
> >> > > > >
> >> > > > > I suggest that we create an IEP about this use case alone and
> >> > > > > leave the file system for the future discussions.
> >> > > > >
> >> > > > > D.
> >> > > > >
> >> > > > > On Mon, Jul 2, 2018 at 6:50 AM, Vladimir Ozerov
> >> > > > > <vozerov@gridgain.com> wrote:
> >> > > > >
> >> > > > > > Pavel,
> >> > > > > >
> >> > > > > > Thank you. I'll wait for feature comparison and concrete use
> >> > > > > > cases, because for me this feature still sounds too abstract
> >> > > > > > to judge whether product would benefit from it.
> >> > > > > >
> >> > > > > > On Mon, Jul 2, 2018 at 3:15 PM Pavel Kovalenko
> >> > > > > > <jokserfn@gmail.com> wrote:
> >> > > > > >
> >> > > > > > > Dmitriy,
> >> > > > > > >
> >> > > > > > > I think we have a little miscommunication here. Of course, I
> >> > > > > > > meant supporting large entries / chunks of binary data.
> >> > > > > > > Internally it will be BLOB storage, which can be accessed
> >> > > > > > > through various interfaces.
> >> > > > > > > "File" is just an abstraction for an end user for
> >> > > > > > > convenience, a wrapper layer to have a user-friendly API to
> >> > > > > > > directly store BLOBs. We shouldn't provide full file
> >> > > > > > > protocol support with file system capabilities. It can be
> >> > > > > > > added later, but now it's absolutely unnecessary and
> >> > > > > > > introduces extra complexity.
> >> > > > > > >
> >> > > > > > > We can implement our BLOB storage step by step. The first
> >> > > > > > > thing is core functionality and support to save large parts
> >> > > > > > > of binary objects to it. "File" layer, Web layer, etc. can
> >> > > > > > > be added later.
> >> > > > > > >
> >> > > > > > > The initial IGFS design doesn't have good capabilities to
> >> > > > > > > have a persistence layer. I think we shouldn't do any
> >> > > > > > > changes to it; this project, as for me, is almost outdated.
> >> > > > > > > We will drop IGFS after implementing a File System layer
> >> > > > > > > over our BLOB storage.
> >> > > > > > >
> >> > > > > > > Vladimir,
> >> > > > > > >
> >> > > > > > > I will prepare a comparison with other existing distributed
> >> > > > > > > file storages and file systems in a few days.
> >> > > > > > >
> >> > > > > > > About usage of the data grid: I never said that we need
> >> > > > > > > transactions, sync backups, etc. We need just a few core
> >> > > > > > > things - Atomic cache with persistence, Discovery, Baseline,
> >> > > > > > > Affinity, and Communication.
> >> > > > > > > Other things we can implement by ourselves. So this feature
> >> > > > > > > can develop independently of other non-core features.
> >> > > > > > > For me, the Ignite way is providing our users with a fast
> >> > > > > > > and convenient way to solve their problems with good
> >> > > > > > > performance and durability. We have a problem with storing
> >> > > > > > > large data, and we should solve it.
> >> > > > > > > About other things see my message to Dmitriy above.
> >> > > > > > >
> >> > > > > > > On Sun, Jul 1, 2018 at 9:48, Dmitriy Setrakyan
> >> > > > > > > <dsetrakyan@apache.org> wrote:
> >> > > > > > >
> >> > > > > > > > Pavel,
> >> > > > > > > >
> >> > > > > > > > I have actually misunderstood the use case. To be honest,
> >> > > > > > > > I thought that you were talking about the support of
> >> > > > > > > > large values in Ignite caches, e.g. objects that are
> >> > > > > > > > several megabytes in cache.
> >> > > > > > > >
> >> > > > > > > > If we are tackling the distributed file system, then in
> >> > > > > > > > my view, we should be talking about IGFS and adding
> >> > > > > > > > persistence support to IGFS (which is based on HDFS API).
> >> > > > > > > > It is not clear to me that you are talking about IGFS.
> >> > > > > > > > Can you confirm?
> >> > > > > > > >
> >> > > > > > > > D.
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > On Sat, Jun 30, 2018 at 10:59 AM, Pavel Kovalenko
> >> > > > > > > > <jokserfn@gmail.com> wrote:
> >> > > > > > > >
> >> > > > > > > > > Dmitriy,
> >> > > > > > > > >
> >> > > > > > > > > Yes, I have an approximate design in mind. The main idea
> >> > > > > > > > > is that we already have a distributed cache for file
> >> > > > > > > > > metadata (our Atomic cache), and the data flow and
> >> > > > > > > > > distribution will be controlled by our AffinityFunction
> >> > > > > > > > > and Baseline. We already have discovery and
> >> > > > > > > > > communication to keep such local file storages in sync.
> >> > > > > > > > > The file data will be split into large blocks
> >> > > > > > > > > (64-128Mb) (which looks very similar to our WAL). Each
> >> > > > > > > > > block can contain one or more file chunks. The
> >> > > > > > > > > tablespace (segment ids, offsets, etc.) will be stored
> >> > > > > > > > > in our regular page memory. These are the key ideas for
> >> > > > > > > > > implementing the first version of such storage. We
> >> > > > > > > > > already have similar components in our persistence, so
> >> > > > > > > > > this experience can be reused to develop such storage.
> >> > > > > > > > >
> >> > > > > > > > > Denis,
> >> > > > > > > > >
> >> > > > > > > > > Nothing significant should be changed at our memory
> >> > > > > > > > > level. It will be a separate, pluggable component over
> >> > > > > > > > > the cache. Most of the functions which give a
> >> > > > > > > > > performance boost can be delegated to the OS level
> >> > > > > > > > > (memory-mapped files, DMA, direct write from socket to
> >> > > > > > > > > disk and vice versa). Ignite and File Storage can
> >> > > > > > > > > develop independently of each other.
> >> > > > > > > > >
> >> > > > > > > > > Alexey Stelmak, who has great experience with developing
> >> > > > > > > > > such systems, can provide more low-level information
> >> > > > > > > > > about how it should look.
> >> > > > > > > > >
> >> > > > > > > > > On Sat, Jun 30, 2018 at 19:40, Dmitriy Setrakyan
> >> > > > > > > > > <dsetrakyan@apache.org> wrote:
> >> > > > > > > > >
> >> > > > > > > > > > Pavel, it definitely makes sense. Do you have a design
> >> > > > > > > > > > in mind?
> >> > > > > > > > > >
> >> > > > > > > > > > D.
> >> > > > > > > > > >
> >> > > > > > > > > > On Sat, Jun 30, 2018, 07:24 Pavel Kovalenko
> >> > > > > > > > > > <jokserfn@gmail.com> wrote:
> >> > > > > > > > > >
> >> > > > > > > > > > > Igniters,
> >> > > > > > > > > > >
> >> > > > > > > > > > > I would like to start a discussion about designing a
> >> > > > > > > > > > > new feature because I think it's time to start making
> >> > > > > > > > > > > steps towards it.
> >> > > > > > > > > > > I noticed that some of our users have tried to store
> >> > > > > > > > > > > large homogenous entries (> 1, 10, 100 Mb/Gb/Tb) in
> >> > > > > > > > > > > our caches, but without much success.
> >> > > > > > > > > > >
> >> > > > > > > > > > > The IGFS project makes it possible, but as for me it
> >> > > > > > > > > > > has one big disadvantage - it's in-memory only, so
> >> > > > > > > > > > > users have a strict limit on the size of their data
> >> > > > > > > > > > > and a data loss problem.
> >> > > > > > > > > > >
> >> > > > > > > > > > > Our durable memory can persist data that doesn't fit
> >> > > > > > > > > > > in RAM to disk, but its page structure is not
> >> > > > > > > > > > > designed to store large pieces of data.
> >> > > > > > > > > > >
> >> > > > > > > > > > > There are a lot of distributed file system projects
> >> > > > > > > > > > > like HDFS, GlusterFS, etc. But all of them
> >> > > > > > > > > > > concentrate on implementing a high-grade file
> >> > > > > > > > > > > protocol rather than a user-friendly API, which leads
> >> > > > > > > > > > > to a high entry threshold for implementing something
> >> > > > > > > > > > > over them.
> >> > > > > > > > > > > We shouldn't go this way. Our main goal should be
> >> > > > > > > > > > > providing the user with an easy and fast way to use
> >> > > > > > > > > > > file storage and processing here and now.
> >> > > > > > > > > > >
> >> > > > > > > > > > > If we take HDFS as the closest project by
> >> > > > > > > > > > > functionality, we have one big advantage over it. We
> >> > > > > > > > > > > can use our caches as file metadata storage and scale
> >> > > > > > > > > > > it without limit, while HDFS is bounded by Namenode
> >> > > > > > > > > > > capacity and has big problems with keeping a large
> >> > > > > > > > > > > number of files in the system.
> >> > > > > > > > > > >
> >> > > > > > > > > > > We gained very good experience with persistence when
> >> > > > > > > > > > > we developed our durable memory, and we can combine
> >> > > > > > > > > > > it with our experience with services, binary protocol
> >> > > > > > > > > > > and I/O and start to design a new IEP.
> >> > > > > > > > > > >
> >> > > > > > > > > > > Use cases and features of the project:
> >> > > > > > > > > > > 1) Storing XML, JSON, BLOB, CLOB, images, videos,
> >> > > > > > > > > > > text, etc. without overhead and without the
> >> > > > > > > > > > > possibility of data loss.
> >> > > > > > > > > > > 2) Easy, pluggable, fast and distributed file
> >> > > > > > > > > > > processing, transformation and analysis (e.g. an
> >> > > > > > > > > > > ImageMagick processor for image transformation,
> >> > > > > > > > > > > LuceneIndex for texts, whatever; it's bounded only by
> >> > > > > > > > > > > your imagination).
> >> > > > > > > > > > > 3) Scalability out of the box.
> >> > > > > > > > > > > 4) User-friendly API and minimal steps to start using
> >> > > > > > > > > > > this storage in production.
> >> > > > > > > > > > >
> >> > > > > > > > > > > I repeat again, this project is not supposed to be a
> >> > > > > > > > > > > high-grade distributed file system with full file
> >> > > > > > > > > > > protocol support.
> >> > > > > > > > > > > This project should primarily focus on target users
> >> > > > > > > > > > > who would like to use it without complex preparation.
> >> > > > > > > > > > >
> >> > > > > > > > > > > For example, a user can deploy Ignite with such
> >> > > > > > > > > > > storage and a web server with a REST API as an Ignite
> >> > > > > > > > > > > service and get a scalable, performant image server
> >> > > > > > > > > > > out of the box which can be accessed using any
> >> > > > > > > > > > > programming language.
> >> > > > > > > > > > >
> >> > > > > > > > > > > As a long-term goal, we should focus on storing and
> >> > > > > > > > > > > processing very large amounts of data like movies and
> >> > > > > > > > > > > streaming, which is the big trend today.
> >> > > > > > > > > > >
> >> > > > > > > > > > > I would like to say special thanks to our community
> >> > > > > > > > > > > members Alexey Stelmak and Dmitriy Govorukhin, who
> >> > > > > > > > > > > significantly helped me to put together all pieces of
> >> > > > > > > > > > > that puzzle.
> >> > > > > > > > > > >
> >> > > > > > > > > > > So, I want to hear your opinions about this proposal.
> >> > > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > > --
> >> > > > Sergey Kozlov
> >> > > > GridGain Systems
> >> > > > www.gridgain.com
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>
