incubator-couchdb-user mailing list archives

From Jan Lehnardt <...@apache.org>
Subject Re: Erlang vs JavaScript
Date Thu, 15 Aug 2013 09:38:19 GMT

On Aug 15, 2013, at 10:09 , Robert Newson <rnewson@apache.org> wrote:

> A big +1 to Jason's clarification of "erlang" vs "native". CouchDB
> could have shipped an erlang view server that worked in a separate
> process and had the stdio overhead, to combine the slowness of the
> protocol with the obtuseness of erlang. ;)
> 
> Evaluating Javascript within the erlang VM process intrigues me, Jens,
> how is that done in your case? I've not previously found the assertion
> that V8 would be faster than SpiderMonkey for a view server compelling
> since the bottleneck is almost never in the code evaluation, but I do
> support CouchDB switching to it for the synergy effects of a closer
> binding with node.js, but if it's running in the same process, that
> would change (though I don't immediately see why the same couldn't be
> done for SpiderMonkey). Off the top of my head, I don't know a safe
> way to evaluate JS in the VM. A NIF-based approach would either be
> quite elaborate or would trip all the scheduling problems that
> long-running NIFs are now notorious for.
> 
> At a step removed, the view server protocol itself seems like the
> thing to improve on, it feels like that's the principal bottleneck.

The code is here: https://github.com/couchbase/couchdb/tree/master/src/mapreduce

I’d love for someone to pick this up and give CouchDB, say, a ./configure --enable-native-v8
option or a plugin that allows people to opt into the speed improvements made there. :)

The choice for V8 was made because of its easier integration API and its more reliable
releases as a standalone project, which I think was a smart move.

IIRC it relies on a change to CouchDB-y internals that has not made it back from Couchbase
to CouchDB (Filipe will know, but I doubt he’s reading this thread), but we should look
into that and get us “native JS views”, at least as an option or plugin.

CCing dev@.

Jan
--





> 
> B.
> 
> 
> On 15 August 2013 08:22, Jason Smith <jhs@apache.org> wrote:
>> Yes, to a first approximation, with a native view, CouchDB is basically
>> running eval() on your code. In my example, I took advantage of this to
>> build a nonstandard response to satisfy an application. (Instead of a 404,
>> we sent a designated fallback document body.)
>> 
>> But, if you accumulate the list in a native view, a JavaScript view, or a
>> hypothetical Erlang view (i.e. a subprocess), from the operating system's
>> perspective, the memory for that list will be allocated somewhere. Either
>> the CouchDB process asks for X KB more memory, or its subprocess will ask
>> for it. So I think the total system impact is probably low in practice.
>> 
>> So I guess my point is not that native views are wrong, just they have a
>> cost so you should weigh the cost/benefit for your own project. In the case
>> of manage_couchdb, I wrote a JavaScript implementation; but since sometimes
>> I have an emergency and I must find conflicts ASAP, I made an Erlang
>> version because it is worth it.
>> 
>> 
>> On Thu, Aug 15, 2013 at 2:05 PM, Stanley Iriele <siriele2x3@gmail.com> wrote:
>> 
>>> Whoa...OK...that I had no idea about...thanks for taking the time to go to
>>> that granularity, by the way.
>>> 
>>> So does this mean that the process memory is shared? As opposed to living
>>> in its own space? So if someone accumulates a large JSON object in a list
>>> function, it's chewing up CouchDB's memory? I guess I'm a little confused
>>> about what's in the same process and what isn't now.
>>> On Aug 14, 2013 11:57 PM, "Jason Smith" <jhs@apache.org> wrote:
>>> 
>>>> To me, an Erlang view is a view server which supports map, reduce, show,
>>>> update, list, etc. functions in the Erlang language. (Basically it is
>>>> implemented in Erlang.)
>>>> 
>>>> A view server is a subprocess that runs beneath CouchDB which communicates
>>>> with it over standard i/o. It is a different process in the operating
>>>> system and only interfaces with the main server using the view server
>>>> protocol (basically a bunch of JSON messages going back and forth).
>>>> 
>>>> I do not know of an Erlang view server which works well and is currently
>>>> maintained.
>>>> 
>>>> A native view (shipped by CouchDB but disabled by default) is some
>>>> corner-cutting. Code is evaluated directly by the primary CouchDB server.
>>>> Since CouchDB is Erlang, the native query server is necessarily Erlang. The
>>>> key difference is, your code is right there in the eye of the storm. You
>>>> can call couch_server:open("some_db") and completely circumvent security
>>>> and other invariants which CouchDB enforces. You can leak memory until the
>>>> kernel OOM killer terminates CouchDB. It's not about the language, it's
>>>> that it is running inside the CouchDB process.
>>>> 
>>>> 
>>>> 
>>>> On Thu, Aug 15, 2013 at 1:36 PM, Stanley Iriele <siriele2x3@gmail.com> wrote:
>>>> 
>>>>> Wait....I'm a tad confused here..Jason what is the difference between
>>>>> native views and Erlang views?...
>>>>> On Aug 14, 2013 11:16 PM, "Jason Smith" <jhs@apache.org> wrote:
>>>>> 
>>>>>> Oh, also:
>>>>>> 
>>>>>> They are **not** Erlang views. They are **native** views. We should
>>>>>> emphasize the latter to remind ourselves about the security and reliability
>>>>>> risks which Bob identifies.
>>>>>> 
>>>>>> They are very powerful, but it is a trade-off. Once I had a customer who
>>>>>> had a basic "class" document describing common values. All other documents
>>>>>> were for modifications to the "base class" so to speak. He needed to query
>>>>>> by document ID, but if no such document existed, return the "base class"
>>>>>> document instead. The product was already in the field and so the code
>>>>>> could not change. We had to change it in CouchDB.
>>>>>> 
>>>>>> The fix was very simple: a _rewrite rule to a native _show function. In the
>>>>>> show function, if the Doc was null, then we used the internal CouchDB API
>>>>>> to fetch the default document. Voila.
>>>>>> 
>>>>>> 
>>>>>> On Thu, Aug 15, 2013 at 1:08 PM, Jason Smith <jhs@apache.org> wrote:
>>>>>> 
>>>>>>> On Thursday, August 15, 2013, Andrey Kuprianov wrote:
>>>>>>> 
>>>>>>>> Doesn't server performance degrade while views are being rebuilt? So the
>>>>>>>> faster they are rebuilt, the better for you.
>>>>>>> 
>>>>>>> 
>>>>>>> If my view build would degrade total performance to cross an unacceptable
>>>>>>> threshold, then I am really riding the line! What about an unplanned
>>>>>>> compaction? What if one day the clients have a bug and load increases? What
>>>>>>> if an unplanned disaster happens and a backup must be performed urgently?
>>>>>>> 
>>>>>>> I would evaluate view performance in the larger context of the entire
>>>>>>> application life cycle.
>>>>>>> 
>>>>>>> Men seem to want to date beautiful women. It is a very high priority at
>>>>>>> the pub or whatever. But long-married men do not even think about their
>>>>>>> wife's attractiveness because that is a small, superficial part of a much
>>>>>>> larger story.
>>>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>>> Besides, looks like it's possible to do the same 3 steps with design doc
>>>>>>>> views created in Erlang? Or is it just about using require() in Node.js?
>>>>>>>> 
>>>>>>> 
>>>>>>> Actually, yes that is a fine point. I myself prefer Node.js but anyone can
>>>>>>> choose the best fit for them.
>>>>>>> 
>>>>>>> And speaking more broadly, CouchDB is a very flexible platform so it is
>>>>>>> quite likely that my own policies do not apply to every use case. In fact,
>>>>>>> if I'm honest, I use native views myself, usually for unplanned
>>>>>>> troubleshooting: when I want to find conflicts, I use manage_couchdb:
>>>>>>> http://github.com/iriscouch/manage_couchdb
>>>>>>> 
>>>>>>> My main point is, any time somebody says "performance" ask yourself if
>>>>>>> it is really a "performance siren." Earlier in this thread, Jens raises
>>>>>>> some examples of plausible true performance requirements, not just siren
>>>>>>> songs.
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
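
For readers unfamiliar with the view server protocol discussed in this thread ("a bunch of JSON messages going back and forth" over standard i/o), here is a rough sketch of its shape. It is a toy, not CouchDB's actual query server: the class and function names are made up for illustration, only three commands are handled, the error reply format is hypothetical, and map functions are assumed to be Python lambda source rather than JavaScript. Each request is one JSON array per line and each reply is one JSON value per line.

```python
import json


class MiniViewServer:
    """Toy query server: one JSON message in, one JSON reply out."""

    def __init__(self):
        self.map_funs = []

    def handle(self, line):
        command, *args = json.loads(line)
        if command == "reset":
            # Forget all previously registered map functions.
            self.map_funs = []
            return json.dumps(True)
        if command == "add_fun":
            # Assumption: the source is a Python lambda, not JavaScript.
            # Emptying __builtins__ is NOT a real sandbox; evaluated code
            # still runs inside this process.
            self.map_funs.append(eval(args[0], {"__builtins__": {}}))
            return json.dumps(True)
        if command == "map_doc":
            doc = args[0]
            # One list of [key, value] rows per registered map function.
            rows = [[list(row) for row in fun(doc)] for fun in self.map_funs]
            return json.dumps(rows)
        # Hypothetical error shape, for illustration only.
        return json.dumps({"error": "unknown_command", "reason": command})


def serve(stdin, stdout):
    """Read one request per line, write one reply per line."""
    server = MiniViewServer()
    for line in stdin:
        stdout.write(server.handle(line) + "\n")
        stdout.flush()
```

Calling serve(sys.stdin, sys.stdout) would run this as a subprocess-style server. Every document makes a serialize, write, read, parse round trip, which is the per-document overhead the thread refers to. A native view skips that round trip by evaluating the code inside the CouchDB process itself, with the risks Jason describes.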

