From: Vladimir Ozerov
Date: Thu, 5 Jul 2018 19:01:54 +0300
Subject: Re: Ignite as distributed file storage
To: dev
Reply-To: dev@ignite.apache.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="0000000000001f4314057042abe4"

--0000000000001f4314057042abe4
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit

Pavel,

The design you described is almost precisely what IGFS does. It has a
cache for metadata and splits binary data into chunks with intelligent
affinity routing. In addition, we have a map-reduce feature on top of it
and integration with an underlying file system with optional caching.
Data can be accessed in blocks or streams. IGFS is not in active
development, but it is not outdated either.

Can you briefly explain why you think we need to drop IGFS and
re-implement almost the same thing from scratch?

Dima, Sergey,

Yes, we need the BLOB support you described. Unfortunately, it is not
that easy to implement from the SQL perspective. To support it we would
need either MVCC (with its own drawbacks) or read locks for SELECT.

Vladimir.

On Tue, Jul 3, 2018 at 10:40 AM Sergey Kozlov wrote:

> Dmitriy
>
> You're right that storing large objects should be optimized.
>
> Let's assume a large object means a regular object with large fields,
> and that such fields won't be used for comparison. Then we can avoid
> restoring the BLOB fields into off-heap page memory, e.g. for SQL
> queries whose SELECT doesn't include them explicitly. That can reduce
> page eviction, improve performance, and lower the chance of an OOM.
>
> On Tue, Jul 3, 2018 at 1:06 AM, Dmitriy Setrakyan wrote:
>
> > To be honest, I am not sure if we need to kick off another file
> > system storage discussion in Ignite. It sounds like a huge effort and
> > likely will not be productive.
> >
> > However, I think an ability to store large objects will make sense.
> > For example, how do I store a 10GB blob in an Ignite cache? Most
> > likely we have to have a separate memory or disk space, allocated for
> > blobs only. We also need to be able to efficiently transfer a 10GB
> > blob over the network and store it off-heap right away, without
> > bringing it into main heap memory (otherwise we would run out of
> > memory).
> >
> > I suggest that we create an IEP about this use case alone and leave
> > the file system for future discussions.
> >
> > D.
> >
> > On Mon, Jul 2, 2018 at 6:50 AM, Vladimir Ozerov wrote:
> >
> > > Pavel,
> > >
> > > Thank you. I'll wait for the feature comparison and concrete use
> > > cases, because for me this feature still sounds too abstract to
> > > judge whether the product would benefit from it.
> > >
> > > On Mon, Jul 2, 2018 at 3:15 PM Pavel Kovalenko wrote:
> > >
> > > > Dmitriy,
> > > >
> > > > I think we have a little miscommunication here. Of course, I meant
> > > > supporting large entries / chunks of binary data. Internally it
> > > > will be BLOB storage, which can be accessed through various
> > > > interfaces. "File" is just an abstraction for the end user's
> > > > convenience, a wrapper layer to have a user-friendly API to
> > > > directly store BLOBs. We shouldn't provide full file protocol
> > > > support with file system capabilities. It can be added later, but
> > > > now it's absolutely unnecessary and introduces extra complexity.
> > > >
> > > > We can implement our BLOB storage step by step. The first thing is
> > > > core functionality and support for saving large parts of binary
> > > > objects to it. The "File" layer, Web layer, etc. can be added
> > > > later.
> > > >
> > > > The initial IGFS design doesn't have good capabilities for a
> > > > persistence layer. I think we shouldn't make any changes to it;
> > > > this project, as for me, is almost outdated. We will drop IGFS
> > > > after implementing a File System layer over our BLOB storage.
> > > >
> > > > Vladimir,
> > > >
> > > > I will prepare a comparison with other existing distributed file
> > > > storages and file systems in a few days.
> > > >
> > > > About usage of the data grid: I never said that we need
> > > > transactions, sync backups, etc.
> > > > We need just a few core things: an Atomic cache with
> > > > persistence, Discovery, Baseline, Affinity, and Communication.
> > > > Other things we can implement ourselves, so this feature can
> > > > develop independently of other non-core features.
> > > > For me, the Ignite way is providing our users a fast and
> > > > convenient way to solve their problems with good performance and
> > > > durability. We have a problem with storing large data; we should
> > > > solve it.
> > > > About other things, see my message to Dmitriy above.
> > > >
> > > > On Sun, Jul 1, 2018 at 9:48, Dmitriy Setrakyan wrote:
> > > >
> > > > > Pavel,
> > > > >
> > > > > I have actually misunderstood the use case. To be honest, I
> > > > > thought that you were talking about the support of large values
> > > > > in Ignite caches, e.g. cached objects that are several
> > > > > megabytes in size.
> > > > >
> > > > > If we are tackling the distributed file system, then in my view
> > > > > we should be talking about IGFS and adding persistence support
> > > > > to IGFS (which is based on the HDFS API). It is not clear to me
> > > > > that you are talking about IGFS. Can you confirm?
> > > > >
> > > > > D.
> > > > >
> > > > > On Sat, Jun 30, 2018 at 10:59 AM, Pavel Kovalenko
> > > > > <jokserfn@gmail.com> wrote:
> > > > >
> > > > > > Dmitriy,
> > > > > >
> > > > > > Yes, I have an approximate design in mind. The main idea is
> > > > > > that we already have a distributed cache for file metadata
> > > > > > (our Atomic cache); the data flow and distribution will be
> > > > > > controlled by our AffinityFunction and Baseline. We already
> > > > > > have discovery and communication to keep such local file
> > > > > > storages in sync. The file data will be split into large
> > > > > > blocks (64-128 MB) (which looks very similar to our WAL).
> > > > > > Each block can contain one or more file chunks. The
> > > > > > tablespace (segment IDs, offsets, etc.) will be stored in our
> > > > > > regular page memory. These are the key ideas for implementing
> > > > > > the first version of such storage. We already have similar
> > > > > > components in our persistence, so this experience can be
> > > > > > reused to develop such storage.
> > > > > >
> > > > > > Denis,
> > > > > >
> > > > > > Nothing significant should be changed at our memory level. It
> > > > > > will be a separate, pluggable component over the cache. Most
> > > > > > of the functions that give a performance boost can be
> > > > > > delegated to the OS level (memory-mapped files, DMA, direct
> > > > > > writes from socket to disk and vice versa). Ignite and File
> > > > > > Storage can develop independently of each other.
> > > > > >
> > > > > > Alexey Stelmak, who has great experience developing such
> > > > > > systems, can provide more low-level information about how it
> > > > > > should look.
> > > > > >
> > > > > > On Sat, Jun 30, 2018 at 19:40, Dmitriy Setrakyan
> > > > > > <dsetrakyan@apache.org> wrote:
> > > > > >
> > > > > > > Pavel, it definitely makes sense. Do you have a design in
> > > > > > > mind?
> > > > > > >
> > > > > > > D.
> > > > > > >
> > > > > > > On Sat, Jun 30, 2018, 07:24 Pavel Kovalenko
> > > > > > > <jokserfn@gmail.com> wrote:
> > > > > > >
> > > > > > > > Igniters,
> > > > > > > >
> > > > > > > > I would like to start a discussion about designing a new
> > > > > > > > feature because I think it's time to start making steps
> > > > > > > > towards it. I noticed that some of our users have tried to
> > > > > > > > store large homogeneous entries (> 1, 10, 100 Mb/Gb/Tb) in
> > > > > > > > our caches, but without much success.
> > > > > > > >
> > > > > > > > The IGFS project makes it possible, but as for me it has
> > > > > > > > one big disadvantage: it's in-memory only, so users have a
> > > > > > > > strict limit on the size of their data and a data loss
> > > > > > > > problem.
> > > > > > > >
> > > > > > > > Our durable memory can persist data that doesn't fit in
> > > > > > > > RAM to disk, but its page structure is not meant to store
> > > > > > > > large pieces of data.
> > > > > > > >
> > > > > > > > There are a lot of distributed file system projects like
> > > > > > > > HDFS, GlusterFS, etc. But all of them concentrate on
> > > > > > > > implementing a high-grade file protocol rather than a
> > > > > > > > user-friendly API, which leads to a high entry threshold
> > > > > > > > for implementing something on top of them.
> > > > > > > > We shouldn't go this way. Our main goal should be
> > > > > > > > providing users an easy and fast way to use file storage
> > > > > > > > and processing here and now.
> > > > > > > >
> > > > > > > > If we take HDFS as the closest project by functionality, we
> > > > > > > > have one big advantage over it. We can use our caches as
> > > > > > > > file metadata storage and have unlimited ability to scale
> > > > > > > > it, while HDFS is bounded by NameNode capacity and has big
> > > > > > > > problems with keeping a large number of files in the
> > > > > > > > system.
> > > > > > > >
> > > > > > > > We gained very good experience with persistence when we
> > > > > > > > developed our durable memory, and we can combine it with
> > > > > > > > our experience with services, the binary protocol, and I/O
> > > > > > > > and start to design a new IEP.
> > > > > > > >
> > > > > > > > Use cases and features of the project:
> > > > > > > > 1) Storing XML, JSON, BLOB, CLOB, images, videos, text,
> > > > > > > > etc. without overhead or the possibility of data loss.
> > > > > > > > 2) Easy, pluggable, fast and distributed file processing,
> > > > > > > > transformation and analysis. (E.g. an ImageMagick processor
> > > > > > > > for image transformation, LuceneIndex for texts, whatever;
> > > > > > > > it's bounded only by your imagination.)
> > > > > > > > 3) Scalability out of the box.
> > > > > > > > 4) User-friendly API and minimal steps to start using this
> > > > > > > > storage in production.
> > > > > > > >
> > > > > > > > I repeat, this project is not supposed to be a high-grade
> > > > > > > > distributed file system with full file protocol support.
> > > > > > > > This project should primarily focus on target users who
> > > > > > > > would like to use it without complex preparation.
> > > > > > > >
> > > > > > > > For example, a user can deploy Ignite with such storage and
> > > > > > > > a web server with a REST API as an Ignite service and get a
> > > > > > > > scalable, performant image server out of the box that can
> > > > > > > > be accessed from any programming language.
> > > > > > > >
> > > > > > > > As a long-term goal, we should focus on storing and
> > > > > > > > processing very large amounts of data like movies and
> > > > > > > > streaming, which is the big trend today.
> > > > > > > >
> > > > > > > > I would like to say special thanks to our community members
> > > > > > > > Alexey Stelmak and Dmitriy Govorukhin, who significantly
> > > > > > > > helped me put together all the pieces of that puzzle.
> > > > > > > >
> > > > > > > > So, I want to hear your opinions about this proposal.
>
> --
> Sergey Kozlov
> GridGain Systems
> www.gridgain.com

--0000000000001f4314057042abe4--