From: <justin@lisol.co.uk>
To: <user@couchdb.apache.org>
Subject: RE: Allow user-defined views
Date: Fri, 28 Nov 2014 06:50:10 -0000

Hi Peter,

An interesting concept ...

This may sound simplistic, but is it viable for your application to
initially have a process where newly written queries are vetted by a human
before they are run? The advantage of this is two-fold:
	i) you'll be able to move on and prove your concept quickly
	ii) while doing this, you may learn enough (and things may change
enough) for you to automate the vetting process
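
As a rough illustration of what an automated vetting step might later look
like, here is a minimal sketch that runs a submitted map function against a
few sample documents under a time limit and an emit cap. It assumes Node.js
and its built-in `vm` module; the function name `vetMapFunction`, the option
names and the default limits are all hypothetical. Note that Node's `vm`
module is not a security boundary, so this only catches runaway loops and
excessive emits, and it does not replace CouchDB's own sandboxing.

```javascript
// Hypothetical vetting harness (Node.js); names and limits are illustrative.
const vm = require('vm');

// Runs a user-supplied map function (given as source text) against sample
// docs in a fresh context, with a wall-clock timeout and an emit cap.
function vetMapFunction(source, sampleDocs, opts = {}) {
  const timeoutMs = opts.timeoutMs || 50; // assumed time budget per document
  const maxEmits = opts.maxEmits || 100;  // assumed emit cap per document

  for (const doc of sampleDocs) {
    let emitted = 0;
    const sandbox = {
      // Copy the doc so the submitted code cannot modify the caller's data.
      doc: JSON.parse(JSON.stringify(doc)),
      emit: () => {
        emitted += 1;
        if (emitted > maxEmits) throw new Error('emit limit exceeded');
      },
    };
    try {
      // The timeout option interrupts synchronous endless loops.
      vm.runInNewContext('(' + source + ')(doc)', sandbox, { timeout: timeoutMs });
    } catch (err) {
      return { ok: false, reason: String(err.message) };
    }
    // Covers submitted code that swallowed the error thrown by emit().
    if (emitted > maxEmits) return { ok: false, reason: 'emit limit exceeded' };
  }
  return { ok: true };
}
```

A view that fails this check could be held back for the human review
described above rather than rejected outright.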

Thanks,
Justin

-----Original Message-----
From: Peter Grman [mailto:peter.grman@gmail.com]
Sent: 28 November 2014 02:12
To: user@couchdb.apache.org
Subject: Re: Allow user-defined views

No, I don't. The program should be for analysing logs (collected by
fluentd) - it should be open source and on GitHub, however there isn't much
done yet: https://github.com/logTank/

The index rebuilding shouldn't be a problem, as CouchDB will only be used
for general stats and the user won't actually see up-to-date data, but
always with a delay - another advantage of CouchDB: I can read the query
results without bothering the system, and once the data is outdated, I
can update the index. At least that's the theory so far; I'll need to run
some performance tests to see if that actually works, once I have an MVP.
The other option is to use MongoDB for ad-hoc queries, but I was thinking
that CouchDB will be more efficient as storage is so cheap.

As I learn something new every time I look up info about CouchDB, and
things become clearer, I'm also glad for feedback on the idea in general
and on how I want to use CouchDB.

However, I'd also be very happy if I could somehow solve the problem with
the possible DoS attacks :). Maybe there is something in CouchDB or
evalcx which I can configure - a maximum runtime for a map/reduce function
(it shouldn't be more than 1ms)? Or is there some data logged by CouchDB
about the resources required by views (CPU time + HDD space)?
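
For the disk-space half of that question, one possible approach is to poll
the design document's info endpoint (GET /{db}/_design/{ddoc}/_info) and
compare the reported index size against a budget. The sketch below only
covers the comparison step; the helper name `viewIndexOverBudget` is
hypothetical, and the field names (`view_index.disk_size` in older releases,
`view_index.sizes.file` in newer ones) are assumptions to verify against the
CouchDB version in use.

```javascript
// Hypothetical helper: decides whether a design document's index is over budget.
// `info` is the parsed JSON body of GET /{db}/_design/{ddoc}/_info.
function viewIndexOverBudget(info, maxBytes) {
  const idx = (info && info.view_index) || {};
  // Assumption: older releases report `disk_size`, newer ones `sizes.file`.
  const onDisk = typeof idx.disk_size === 'number'
    ? idx.disk_size
    : (idx.sizes && idx.sizes.file);
  // Size not reported in a recognised field, so no decision can be made.
  if (typeof onDisk !== 'number') return null;
  return onDisk > maxBytes;
}
```

A design document that returns `true` here could be flagged for manual
review instead of being deleted automatically.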

Cheers
Peter

On Fri Nov 28 2014 at 2:54:51 AM Alexander Gabriel <alex@barbalex.ch> wrote:

> sorry for being off-topic
> Alex
>
>
> 2014-11-28 2:52 GMT+01:00 Alexander Gabriel <alex@barbalex.ch>:
>
> > sounds like a very interesting application
> >
> > seems like you don't care if the user has to wait for an index to be
> > built when the user creates a query
> >
> > Alex
> >
> >
> > 2014-11-28 2:23 GMT+01:00 Peter Grman <peter.grman@gmail.com>:
> >
> >> Hi Alex,
> >>
> >> Yes, the users would be able to import different sets of data, which
> >> isn't relational, and use the platform to analyse it. The analysed data
> >> would be in 99% of the cases append only (+ removing old data) and the
> >> data can be defined by the user, as well as be hierarchical.
> >>
> >> When I thought about the system in the beginning, CouchDB seemed like
> >> an awesome choice: as there would be only a couple of well-defined
> >> queries and storage is generally cheap, I thought that CouchDB views
> >> and their caching were what I was looking for.
> >>
> >> The problem is again only with people who want to trick the system.
> >> I would also be happy with a solution which would detect bad views once
> >> they have been deployed (using too much space, taking too long to
> >> compute) and deactivate and mark them for me to check. This way I could
> >> check those few people who try a DoS attack and ban them from the
> >> service.
> >>
> >> The additional main problem was whether it is really impossible to get
> >> data from a different database inside the view and whether the user
> >> won't be able to access the underlying system, ..., or whether it is
> >> just very difficult => possible, if someone wants to do it they'll find
> >> a way. But after reading more and understanding more about how the views
> >> are executed using evalcx, I think the other problems aren't a big
> >> concern for me anymore. Is that correct?
> >>
> >> Although I've found in the code "if possible, use evalcx (not always
> >> available)" - how can I check that evalcx is available on my system? Or
> >> is it just a note for older distributions, nothing to be concerned
> >> about anymore?
> >>
> >> Thank you
> >>
> >> Cheers
> >> Peter
> >>
> >> On Fri Nov 28 2014 at 1:37:57 AM Alexander Gabriel <alex@barbalex.ch> wrote:
> >>
> >> > Hi Peter
> >> >
> >> > Will the users create their own data structures too?
> >> > If not, this sounds like SQL on relational tables might be a better
> >> > tool for the problem.
> >> > It seems to me you're hitting exactly the weak point of most NoSQL
> >> > solutions.
> >> >
> >> > Alex
> >> >
> >> >
> >> > 2014-11-28 0:49 GMT+01:00 Peter Grman <peter.grman@gmail.com>:
> >> >
> >> > > Hi,
> >> > >
> >> > > this might sound like a terrible idea to someone who knows CouchDB,
> >> > > and if that's the case, please just take a minute or two to explain
> >> > > why. Otherwise, if the idea isn't so crazy after all, I hope I'll
> >> > > get some solutions to my problem:
> >> > >
> >> > > I'm thinking of creating a platform based on CouchDB, where each set
> >> > > of users (group, customer, ...) would get their own CouchDB database
> >> > > to store and query data. I heard in a podcast, roughly a year ago,
> >> > > that this is how CouchDB was meant to be used - many smaller
> >> > > databases.
> >> > >
> >> > > To query the data, I want to allow them to define their own custom
> >> > > queries. Now I could (and want to) create a form which allows them
> >> > > to build a query and translates it to a JS view, but I was thinking
> >> > > about additionally allowing them to define their custom views
> >> > > directly in JS. They would basically be allowed to define their own
> >> > > Map/Reduce functions.
> >> > >
> >> > > There is a lot which can go wrong with this. The worst ones I came
> >> > > up with:
> >> > > - DoS attack with endless loops inside the function
> >> > > - DoS attack by emitting too much data (potentially in a loop again)
> >> > >
> >> > > As far as I've understood, it's not possible to access other
> >> > > databases from within the view. Is this understanding of mine correct?
> >> > >
> >> > > Is it possible to access the filesystem or network services in any
> >> > > way from the CouchDB view, or is the JavaScript engine which is
> >> > > running the code limiting enough?
> >> > >
> >> > > Are there any other things which could go wrong? - or did somebody
> >> > > actually already use CouchDB like this, and it's perfectly normal?
> >> > >
> >> > > Is there any way I could prevent the problem with endless loops and
> >> > > data emitting from happening? - I can run JSLint, which maybe will
> >> > > detect an endless loop, but that won't help against a loop with a
> >> > > million iterations, which will be called for every item inside
> >> > > CouchDB - still quite endless.
> >> > >
> >> > > Thank you for your help!
> >> > >
> >> > > Cheers,
> >> > > Peter
> >> > >
> >> >
> >>
> >
> >
>