couchdb-dev mailing list archives

From Michael McDaniel <couc...@autosys.us>
Subject Re: Erlang API
Date Mon, 16 Feb 2009 04:30:40 GMT

 thanks, re: github


   erlview, an 'Erlang View Server for CouchDB' is now located at


      http://github.com/mmcdanie/erlview/tree/master



 Not for the particularly faint-hearted, though I did return
 152,445 documents (state == "MS") out of approx. 10 million
 documents in about 82 minutes (approx. 2,000 documents searched
 per second).

 Futon said the db was 9.4GB and 9947941 documents.

 It made a 226,070,103 byte erl.view file, which grew at approximately
 3 megabytes/minute.
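
 As a rough cross-check of the figures above, here is a minimal sketch of
 the arithmetic. The module and function names are made up for
 illustration, and a megabyte is assumed to be 10^6 bytes.

```erlang
-module(view_rate).
-export([docs_per_second/2, mb_per_minute/2]).

%% Documents scanned per second, given a document count and the
%% elapsed time in minutes.
docs_per_second(Docs, Minutes) ->
    Docs / (Minutes * 60).

%% Index file growth in megabytes (assumed 10^6 bytes) per minute.
mb_per_minute(Bytes, Minutes) ->
    Bytes / 1.0e6 / Minutes.
```

 With the numbers from this run, view_rate:docs_per_second(9947941, 82)
 comes out near 2,022 and view_rate:mb_per_minute(226070103, 82) near
 2.76, which agrees with the rounded values quoted above.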


 Here's the map fun:

    fun(Doc) ->
      erlview:find_all_content(
                Doc,
                { [{<<"state">>, <<"MS">>}],
                  [ <<"category">>,
                    <<"city">>,
                    <<"country">>,
                    <<"createTime">>,
                    <<"creatorsName">>,
                    <<"name">>,
                    <<"postalCode">>,
                    <<"state">>,
                    <<"street1">>,
                    <<"telephoneNumber1">> ]
                 } )
    end.
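
 To illustrate the shape of that call, here is a hypothetical sketch of a
 helper with the same argument layout: a list of {Field, Value} criteria
 that must all match, plus a list of output fields. It is not the actual
 erlview code. The module name, the return values, and the assumption
 that a document is a plain property list of binaries are all made up
 for illustration.

```erlang
-module(find_sketch).
-export([find_all_content/2]).

%% ASSUMPTION: Doc is a property list of {Field, Value} pairs with
%% binary keys; the real erlview document representation may differ.
%%
%% Returns {match, Selected} when every {Field, Value} pair in Criteria
%% is present in Doc, where Selected holds the requested output fields
%% that exist in Doc. Returns nomatch otherwise.
find_all_content(Doc, {Criteria, OutFields}) ->
    AllMatch = lists:all(
                 fun({Field, Value}) ->
                         proplists:get_value(Field, Doc) =:= Value
                 end,
                 Criteria),
    case AllMatch of
        true ->
            {match, [{F, proplists:get_value(F, Doc)}
                     || F <- OutFields, proplists:is_defined(F, Doc)]};
        false ->
            nomatch
    end.
```

 Under those assumptions, a document whose <<"state">> is <<"MS">> yields
 {match, Fields} with only the requested fields it actually contains,
 and any other document yields nomatch.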


 Here's machine info:

$ dmesg | egrep "Pentium|Memory"
[   30.747511] Memory: 1538056k/1563840k available (2257k kernel code, 24564k reserved, 1034k data, 384k init, 646336k highmem)
[   31.219589] CPU0: Intel(R) Pentium(R) 4 CPU 3.00GHz stepping 01
[   31.377687] CPU1: Intel(R) Pentium(R) 4 CPU 3.00GHz stepping 01



 This is a single-spindle machine; otherwise I think the view would
 have been faster.  Average disk i/o according to atop was around 3ms.

 erlview doesn't seem to do so well with multiple concurrent views.
 I have some comments in the source about that, and a possible fix.
 I still need to test multiple concurrent views with the new code.


 Nothing special about the documents at all; this should work on any
 valid CouchDB document db.


~Michael 



On Thu, Feb 12, 2009 at 04:56:42AM -0500, Robert Dionne wrote:
>
>
>
> On Feb 12, 2009, at 2:59 AM, Michael McDaniel wrote:
>
>>
>> On Wed, Feb 11, 2009 at 02:57:19PM -0800, Chris Anderson wrote:
>>> On Wed, Feb 11, 2009 at 2:36 PM, Michael McDaniel  
>>> <couchdb@autosys.us> wrote:
>>>>
>>>>  I'll post some code when it does a bit more than gurgle bubbles.
>>>
>>> It sounds like you are on the right track, and you pointed out the
>>> bits 'erlview' etc that I saw reading it. The only other thing I see
>>> (which is probably me missing stuff), is how does this work if we want
>>> to run multiple distinct views concurrently?
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>
>>  -module(erlview).
>>  -behaviour(gen_server).
>>
>>
>>  Although the map fun helpers are not written as handle_call funs, so
>>  it is not entirely clear to me about the multiple concurrent views.
>>  That is something I started thinking about two or three days ago
>>  myself so some testing will be called for.  Though I do think it
>>  will work ok if I am understanding correctly that
>>  couch_query_servers:map_docs/2 always initiates the views (well,
>>  except after an add_fun when map_docs runs through some as-of-yet
>>  unknown-to-me method).  Because couch_query_servers:map_docs/2
>>  calls couch_os_process:prompt/2 which uses gen_server:call/2.
>>
>>  I debated about allowing any and all arbitrary Erlang code in the
>>  map funs versus having helper funs which the map fun calls
>>  (e.g.  all_content/2 returns all docs whose content matches a list
>>   of field/content tuples;
>>   all_fields/2 returns all docs which contain all fields in a list;
>>   etc. And a list of output fields can be specified.).
>>
>>  I wound up allowing any and all arbitrary Erlang code in the map
>>  funs, and also wrote some helper funs to hide some of the underlying
>>  data.
>>
>>  As you can surmise, it is not using plain emit(doc.name, [doc]) in
>>  the map funs.
>>
>>  Kind of a mess but, as you alluded to, a start.
>>
>> ~M
>>
>>
>>>
>>> I haven't seen your server code yet, but I for one would be excited to
>>> help port some parts of main.js over to it.
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>
>>  I look forward to the help.
>>
>> ~M
>>
>>>
>>>>
>>>>
>>>>  Please, if there is some activity to change CDB internals to
>>>>  simplify a native Erlang view server or create a more direct
>>>>  interface, let me know so I don't go too far down this road.
>>>
>>> We'll eventually want to sandbox the Erlang (not sure if there are
>>> libraries for that yet). If you keep up this work, likely you'll be
>>> leading the charge for Erlang view servers.
>>>
>>>>
>>>>  No sense replicating effort (only databases!).
>>>>
>>>
>>> There's an active community of people forking this CouchDB git repo.
>>> If you worked on your code in public there you might get some helpers
>>> showing up.
>>>
>>> http://github.com/halorgium/couchdb/
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
>> ^
>>
>>  Not exactly sure what "forking this CouchDB git repo" means.
>>
>>  My goal is to clean up existing code and get it posted by early
>>  next week.  I really need to write up a little bit of "howto"
>>  for local.ini additions, compile and install instructions, and
>>  couch_query_servers.erl changes/compile help.
>
> Git is definitely worth the investment in learning, the guides at GitHub 
> are very useful. A key thing is that branches are dirt cheap.  This page 
> in particular describes a common workflow I've found very good:
>
> http://github.com/guides/fork-a-project-and-submit-your-modifications
>
>
>
>
>>
>>
>>  Do you have some simple instructions on how to get the code
>>  posted, or should I go read the http://github.com/guides/home
>>  links?  I have only recently started reading about git.
>>
>> ~Michael
>>
>>>
>>> Chris
>>>
>>> -- 
>>> Chris Anderson
>>> http://jchris.mfdz.com
>>
>> -- 
>> Michael McDaniel
>> Portland, Oregon, USA
>> http://trip.autosys.us
>> http://autosys.us
>>

