Mailing-List: contact user-help@couchdb.apache.org; run by ezmlm
Reply-To: user@couchdb.apache.org
Subject: RE: Allow user-defined views
Date: Fri, 28 Nov 2014 06:50:10 -0000
Message-ID: <004501d00ad7$8db85a60$a9290f20$@co.uk>

Hi Peter,

An interesting concept ...
This may sound simplistic, but is it viable for your application to initially have a process where new queries are vetted by a human before they are run?

The advantage of this is two-fold:

i) you'll be able to move on and prove your concept quickly
ii) while doing this, you may learn enough (and things may change enough) for you to automate the vetting process

Thanks,
Justin

-----Original Message-----
From: Peter Grman [mailto:peter.grman@gmail.com]
Sent: 28 November 2014 02:12
To: user@couchdb.apache.org
Subject: Re: Allow user-defined views

No, I don't. The program should be for analysing logs (collected by fluentd). It should be open source and on GitHub, however there isn't much done yet: https://github.com/logTank/

The index rebuilding shouldn't be a problem, as CouchDB will only be used for general stats and the user won't actually see the up-to-date data, but always with a delay. That is another advantage of CouchDB: I can read the queries without bothering the system, and once the data is outdated, I can update the index. At least that's the theory so far; I'll need to run some performance tests to see if that actually works, once I have an MVP. The other option is to use MongoDB for ad-hoc queries, but I was thinking that CouchDB will be more efficient as storage is so cheap.

As I learn something new every time I look up info about CouchDB, and something becomes clearer, I'm also glad of feedback on the idea in general and on how I want to use CouchDB.

However, I'd also be very happy if I could somehow solve the problem with the possible DoS attacks :). Maybe there is something in CouchDB or evalcx which I can configure, such as a maximum runtime for a map/reduce function (it shouldn't be more than 1 ms)? Or is there some data logged by CouchDB about the resources required by views (CPU time + HDD space)?

Cheers
Peter

On Fri Nov 28 2014 at 2:54:51 AM Alexander Gabriel wrote:

> sorry for being off-topic
> Alex
>
> 2014-11-28 2:52 GMT+01:00 Alexander Gabriel:
>
> > sounds like a very interesting application
> >
> > seems like you don't care if the user has to wait for an index to be
> > built when the user creates a query
> >
> > Alex
> >
> > 2014-11-28 2:23 GMT+01:00 Peter Grman:
> >
> >> Hi Alex,
> >>
> >> Yes, the users would be able to import different sets of data, which
> >> isn't relational, and use the platform to analyse it. The analysed data
> >> would be in 99% of the cases append only (+ removing old data) and the
> >> data can be defined by the user, as well as be hierarchical.
> >>
> >> When I thought about the system in the beginning, CouchDB seemed like
> >> an awesome choice. As there would be only a couple of well defined
> >> queries and storage is generally cheap, I thought that CouchDB views
> >> and their caching are what I'm looking for.
> >>
> >> The problem is again only with people who want to trick the system. I
> >> would also be happy with a solution which would detect bad views once
> >> they have been deployed (uses too much space, takes too long to
> >> compute) and deactivates and marks them for me to check. This way I
> >> could check those few people who try a DoS attack and ban them from the
> >> service.
> >>
> >> The additional main problem was, if it is really impossible to get data
> >> from a different database inside the view and if the user won't be able
> >> to access the underlying system, ..., or if it is just very difficult
> >> => possible, if someone wants to do it they'll find a way. But after
> >> reading more and understanding more how the views are executed using
> >> evalcx, I think the other problems aren't a big concern for me anymore,
> >> is that correct?
> >>
> >> Although I've found in the code "if possible, use evalcx (not always
> >> available)" - how can I check that evalcx is available on my system? Or
> >> is it just a note for older distributions, nothing to be concerned
> >> about anymore?
> >>
> >> Thank you
> >>
> >> Cheers
> >> Peter
> >>
> >> On Fri Nov 28 2014 at 1:37:57 AM Alexander Gabriel wrote:
> >>
> >> > Hi Peter
> >> >
> >> > Will the users create their own data structures too?
> >> > If not, this sounds like SQL on relational tables might be a better
> >> > tool for the problem.
> >> > It seems to me you're hitting exactly the weak point of most NoSQL
> >> > solutions.
> >> >
> >> > Alex
> >> >
> >> > 2014-11-28 0:49 GMT+01:00 Peter Grman:
> >> >
> >> > > Hi,
> >> > >
> >> > > This might sound like a terrible idea to someone who knows CouchDB,
> >> > > and if that's the case, please just take a minute or two to explain
> >> > > why. Otherwise, if the idea isn't so crazy after all, I hope I'll
> >> > > get some solutions to my problem:
> >> > >
> >> > > I'm thinking of creating a platform based on CouchDB, where each
> >> > > set of users (group, customer, ...) would get their own CouchDB
> >> > > database, to store and query data. I've heard in a podcast, roughly
> >> > > a year ago, that this is how CouchDB was meant to be - many smaller
> >> > > databases.
> >> > >
> >> > > To query the data, I want to allow them to define their own custom
> >> > > queries. Now I could (and want to) create a form which allows them
> >> > > to build a query and translates it to a JS view, but I was thinking
> >> > > about additionally, on top of that, allowing them to define their
> >> > > custom views directly in JS. They would basically be allowed to
> >> > > define their custom Map/Reduce functions.
> >> > >
> >> > > There is a lot which can go wrong with this; the worst ones I came
> >> > > up with:
> >> > > - DoS attack with endless loops inside the function
> >> > > - DoS attack by emitting too much data (potentially in a loop
> >> > >   again)
> >> > >
> >> > > As far as I've understood, it's not possible to access other
> >> > > databases from within the view, is this understanding of mine
> >> > > correct?
> >> > >
> >> > > Is it possible to access the filesystem or network services in any
> >> > > way from the CouchDB view, or is the JavaScript engine which is
> >> > > running the code limiting enough?
> >> > >
> >> > > Are there any other things which could go wrong? - or did somebody
> >> > > actually already use CouchDB like this, and it's perfectly normal?
> >> > >
> >> > > Is there any way I could prevent the problem with endless loops and
> >> > > data emitting from happening? - I can run JSLint, which maybe will
> >> > > detect an endless loop, but that won't help against a loop with a
> >> > > million iterations, which will be called for every item inside
> >> > > CouchDB - still quite endless.
> >> > >
> >> > > Thank you for your help!
> >> > >
> >> > > Cheers,
> >> > > Peter
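
The form-to-view translation described in the original message could be sketched roughly as below. This is only an illustration under assumptions: the form is taken to have three inputs (a field to filter on, the value it must equal, and a field to emit as the key), and the names `build_map_function`, `build_design_doc`, `level` and `host` are hypothetical. The only CouchDB features relied on are the `emit` call inside a map function and the built-in `_count` reduce.

```python
import json


def build_map_function(filter_field, filter_value, key_field):
    """Return the source of a JavaScript map function as a string.

    All three arguments are hypothetical form inputs. They are embedded
    with json.dumps, which produces escaped string/number literals, so
    user input ends up as data inside the function, not as code.
    """
    f = json.dumps(filter_field)
    v = json.dumps(filter_value)
    k = json.dumps(key_field)
    return (
        "function (doc) {\n"
        f"  if (doc[{f}] === {v}) {{\n"
        f"    emit(doc[{k}], 1);\n"
        "  }\n"
        "}"
    )


def build_design_doc(ddoc_name, view_name, map_source):
    """Wrap a map function in a design document with a _count reduce."""
    return {
        "_id": "_design/" + ddoc_name,
        "language": "javascript",
        "views": {view_name: {"map": map_source, "reduce": "_count"}},
    }


if __name__ == "__main__":
    source = build_map_function("level", "error", "host")
    print(json.dumps(build_design_doc("stats", "errors_by_host", source), indent=2))
```

Map functions generated this way contain no loops, so the endless-loop concern does not arise for form-built queries; views written directly in JS would still need separate limits.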