couchdb-erlang mailing list archives

From Kai Griffin <k...@resourceandrevenue.com>
Subject Re: Using the Erlang view server to Educate in CouchDB
Date Mon, 05 Nov 2012 01:43:41 GMT
Thanks for this entertaining & informative post, David.  Erlang is a 
totally foreign language to me, but I joined this new list hoping to 
rectify that situation over time, and this is exactly the kind of stuff 
I'd like to archive for future reference when I get some time to wade 
into it properly.  I think experimenting with the native query server is 
potentially a great way to learn (and perhaps destroy a few databases 
along the way...).
Cheers,
Kai

On 04/11/2012 16:44, David Martin wrote:
> Learning about the Erlang internals of CouchDB is daunting for many.
>
> One of the reasons for this is the difficulty of learning the intricacies of
> the build system and finding your way around the code.
>
> Jan Lehnardt has made a very good start with his recent, well-written post
> detailing a walk through the code that is used during a POST/PUT operation.
>
> Thank you Jan!
>
> I offer another insight into the Erlang world of CouchDB that is very easy
> to implement, needs no delving into the codebase or build systems,
> yet gives an insight into many of the important concepts used in CouchDB
> and Erlang.
>
> The example posted below uses the Erlang (a.k.a. native) query server
> to illustrate an Erlang solution to the problem of knowing what data is in
> a free-form database (determining "schemas" in free-form data).
>
> This can also be done in JavaScript, but hey, this is about Erlang.
>
> As all clients of a CouchDB database are (in general terms) free to create
> JSON structures of an arbitrary form, the problem becomes one of finding
> views that are of interest to potential users of the data.
>
> As an illustration, take a police criminal investigation.
> All the thousands of contributors to the accumulation of data on the case
> use different nomenclature, forms, languages and computer systems to
> urgently collate the data in the office of Inspector Jacques Clouseau.
>
> Jacques has a schema in mind for his huge police SQL database, but the urgency
> of the case, the press pressure and the masses of international data pouring
> in force his hand. He says,
> "As long as you send in the data in JSON format, I will do the rest! Just
> send it in to this new CouchDB using Futon, in any form you wish."
>
> The database grows at an alarming rate.
> Jacques goes to his local.ini file and types the magic words,
>
> [native_query_servers]
> erlang = {couch_native_process, start_link, []}
>
> and restarts his CouchDB server.
>
> Then he opens the rapidly growing database and, in Futon, types the map
> function shown below between the #### lines.
>
> ############################################################################### 
>
> fun({Doc}) -> % The view function, using the Y-combinator
>
>   %% david m. w. martin 2012 U.K. david.martin@lymegreen.co.uk
>   %% Use the Y-combinator for anonymous recursion, thanks Haskell Curry!
>
>   Y = fun(M) ->
>         (fun(X) -> X(X) end)(fun(F) -> M(fun(A) -> (F(F))(A) end) end)
>       end,
>   %-------------------------------------------
>   Logging_on = false, %% logging is to couch.log
>
>   Recurser = fun(F) ->
>
>     % Start of inner functions.
>     % They can only take one argument, so we pass a tuple.
>
>     fun
>       %----------------------------------------------------------------
>       ({logger, Legend_string, Parms}) ->
>         case Logging_on of
>           true  -> Log(io_lib:format(Legend_string, Parms));
>           false -> ok
>         end;
>       %----------------------------------------------------------------
>       % The body of a JSON object (a proplist): visit every key.
>       ({Term, Path, Q0}) when is_list(Term) ->
>         %F({logger,"state of Q = ~p",[Q0]}),
>         lists:foreach(fun(Key) ->
>           case proplists:get_value(Key, Term, null) of
>
>             Value when is_tuple(Value) -> % nested object
>               {Stripped_Value} = Value,
>               F({Stripped_Value, [Key|Path], Q0}); % recursive call to F
>
>             Value when is_list(Value) -> % array: pair each element with its index
>               lists:foreach(fun({Element, Index}) ->
>                 Path1 = [[Index] | [Key|Path]],
>                 F({Element, Path1, Index}) % recursive call to F
>               end, lists:zip(Value, lists:seq(0, length(Value)-1)));
>
>             _Value -> % scalar: emit the path that leads to it
>               Path1 = [Key|Path],
>               Emit(lists:reverse(Path1), 1)
>
>           end
>         end, proplists:get_keys(Term));
>       %----------------------------------------------------------------
>       % An object found inside an array: unwrap it and carry on.
>       ({Element, Path, Queue}) when is_tuple(Element) ->
>         {Stripped_Value} = Element,
>         F({Stripped_Value, Path, Queue}); % recursive call to F
>       %----------------------------------------------------------------
>       % A scalar found inside an array: emit the path that leads to it.
>       ({_Element, Path, _Queue}) ->
>         Emit(lists:reverse(Path), 1)
>
>     end % end of inner functions
>     %------------------------------------------------------------------
>
>   end, % end of Recurser
>   %% This line calls Y(Recurser) with the initial parameter {Doc,[<<"Doc">>],0}
>   Log(io_lib:format("## End of Document ####~n~p~n",
>                     [(Y(Recurser))({Doc, [<<"Doc">>], 0})]))
>
> end. % end of View fun
>
> ################################################################################ 
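>
> A minimal sketch of what the Y-combinator does, kept separate from the
> view itself: an anonymous fun has no name with which to call itself, so Y
> hands it a reference to itself as the argument F. The factorial example and
> the names Fact_step and Fact are purely illustrative and are not part of
> the view; the expressions are meant for an erl shell.
>
> ```erlang
> %% Same definition of Y as in the view function.
> Y = fun(M) -> (fun(X) -> X(X) end)(fun(F) -> M(fun(A) -> (F(F))(A) end) end) end,
> %% Fact_step receives "itself" as F and returns the one-argument worker fun.
> Fact_step = fun(F) ->
>     fun(0) -> 1;
>        (N) when N > 0 -> N * F(N - 1)
>     end
> end,
> Fact = Y(Fact_step),
> 120 = Fact(5).
> ```
>
> On Erlang/OTP 17 or later, a named fun (fun Walk(X) -> ... Walk(Y) ... end)
> is an alternative way to recurse, assuming the CouchDB node runs on such
> a release.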
>
>
> and the reduce function
>
> ################################################################################ 
>
> fun(_Keys, Values, ReReduce) ->
>   case ReReduce of
>     true  -> lists:sum(Values); % rereduce: add up the partial counts
>     false -> length(Values)     % reduce: count the emitted rows
>   end
> end.
> ################################################################################ 
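>
> A small sketch of why the two branches differ: a first-pass reduce sees the
> emitted values (all 1 here) and counts them, while a rereduce sees earlier
> counts and must add them up. The batches and the variable names below are
> made up for illustration, and null merely stands in for the Keys argument;
> the expressions are meant for an erl shell.
>
> ```erlang
> Reduce = fun(_Keys, Values, ReReduce) ->
>     case ReReduce of
>         true  -> lists:sum(Values);
>         false -> length(Values)
>     end
> end,
> %% Two batches of emitted rows, reduced separately.
> Partial1 = Reduce(null, [1, 1, 1], false),
> Partial2 = Reduce(null, [1, 1], false),
> %% The rereduce step combines the partial counts into the total of 5 rows.
> 5 = Reduce(null, [Partial1, Partial2], true).
> ```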
>
>
> The Inspector thanks Ben Hollis at http://benhollis.net/
> for the inspiration to crack JSON using some of his test data below, modified
> to show objects within arrays and arrays within objects.
>
> Test data in JSON form; your _id and _rev will differ.
>
> {
>   "_id": "61c3f496b9e4c8dc29b95270d9000370",
>   "_rev": "5-672ef483f9f4fb1386e38bb691442183",
>   "test": {
>     "hey": "guy",
>     "a_number": 243,
>     "an_object": {
>       "whoa": "nuts",
>       "an_array": [
>         1.0000000000001,
>         2,
>         "thr<h1>ee",
>         {
>           "whoa": "nuts",
>           "an_array": [
>             1,
>             3.9999999999,
>             "thr<h1>ee"
>           ]
>         }
>       ]
>     },
>     "awesome": true,
>     "bogus": false,
>     "meaning": null,
>     "japanese": "明日がある。",
>     "link": "http://jsonview.com",
>     "notLink": "http://jsonview.com is great"
>   }
> }
>
> This is the form in which CouchDB stores the documents on disk,
> and the form in which Erlang works on JSON documents internally:
>
> {[{<<"_id">>,<<"61c3f496b9e4c8dc29b95270d9000370">>},
> {<<"_rev">>,<<"5-672ef483f9f4fb1386e38bb691442183">>},
> {<<"test">>,
> {[{<<"hey">>,<<"guy">>},
> {<<"a_number">>,243},
> {<<"an_object">>,
> {[{<<"whoa">>,<<"nuts">>},
> {<<"an_array">>,
> [1.0000000000001,2,<<"thr<h1>ee">>,
> {[{<<"whoa">>,<<"nuts">>},
> {<<"an_array">>,[1,3.9999999999,<<"thr<h1>ee">>]}]}]}]}},
> {<<"awesome">>,true},
> {<<"bogus">>,false},
> {<<"meaning">>,null},
> {<<"japanese">>,
> <<230,152,142,230,151,165,227,129,140,227,129,130,227,130,139,227,128,130>>},
> {<<"link">>,<<"http://jsonview.com">>},
> {<<"notLink">>,<<"http://jsonview.com is great">>}]}}]}
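>
> To make the walk over this structure easier to follow, here is a minimal
> sketch written as an ordinary module with named recursion instead of a view
> function. The module name ejson_paths and the functions paths/1 and walk/2
> are assumptions made for illustration and are not part of CouchDB. It returns
> the key paths in document order rather than emitting them, and it walks
> arrays nested directly inside arrays element by element, a case the view
> above was not written for.
>
> ```erlang
> -module(ejson_paths).
> -export([paths/1]).
>
> %% paths/1 takes a decoded document, {Proplist}, and returns one key path
> %% per scalar value, each starting with <<"Doc">> as in the view.
> paths({Props}) when is_list(Props) ->
>     walk({Props}, [<<"Doc">>]).
>
> %% Object: descend into every member, extending the path with its key.
> walk({Props}, Path) when is_list(Props) ->
>     lists:append([walk(Value, [Key | Path]) || {Key, Value} <- Props]);
> %% Array: descend into every element, extending the path with [Index].
> walk(List, Path) when is_list(List) ->
>     Indexed = lists:zip(lists:seq(0, length(List) - 1), List),
>     lists:append([walk(Elem, [[Index] | Path]) || {Index, Elem} <- Indexed]);
> %% Scalar (binary, number, true, false, null): the path is complete.
> walk(_Scalar, Path) ->
>     [lists:reverse(Path)].
> ```
>
> For the small document {[{<<"a">>,1},{<<"b">>,[2,{[{<<"c">>,null}]}]}]},
> paths/1 returns
> [[<<"Doc">>,<<"a">>],[<<"Doc">>,<<"b">>,[0]],[<<"Doc">>,<<"b">>,[1],<<"c">>]].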
>
> The Inspector tested the map function and reduce function with differing levels
> of grouping on the test data. Then the Inspector found all the relevant data
> in the now huge database of clues and was able, using the output, to construct
> JavaScript views that could seek out and collate every last relevant piece of
> data in the database. The case was solved in record time!
>
> The map function performs deep recursion on each document in the
> database. To protect the concurrent running and updating of the database
> (clues are still coming in), the gen_server call that handles the mapping
> has a timeout, and a document that takes too long results in a message like
>
> "Error: timeout
> {gen_server,call,
> [<0.4612.32>,
> {prompt,[<<"map_doc">>,......
>
> This happens on slow machines with very complex document structures.
>
> Increase os_process_timeout in default.ini from its default of 5000
> (milliseconds) to a larger value.
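>
> A sketch of the corresponding setting, assuming a CouchDB 1.x style
> configuration in which os_process_timeout lives in the [couchdb] section.
> The value 60000 (one minute) is only an illustrative choice.
>
> ```ini
> [couchdb]
> ; Timeout in milliseconds for view and external servers (default 5000).
> os_process_timeout = 60000
> ```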
>
> The Erlang view server is very powerful; in fact, it can access all the
> functionality of CouchDB and all the functions of Erlang.
> This is with the proviso that it is called on every document in a database,
> and can only recurse using the Y-combinator.
>
> If the database holds only one dummy document, the function is called
> only once, so code such as
>
> lists:foreach(fun(X) ->
>   Log(io_lib:format("~p: ~p", [X, ets:info(X)]))
> end, ets:all()),
>
> inserted after "end, % end of Recurser"
>
> will log a list of elements like
>
> inet_hosts_byaddr: [{compressed,false},
> {memory,286},
> {owner,<0.16.0>},
> {heir,none},
> {name,inet_hosts_byaddr},
> {size,0},
> {node,'rcouch@127.0.0.1'},
> {named_table,true},
> {type,bag},
> {keypos,3},
> {protection,protected}]
>
> ac_tab: [{compressed,false},
> {memory,8265},
> {owner,<0.7.0>},
> {heir,none},
> {name,ac_tab},
> {size,127},
> {node,'rcouch@127.0.0.1'},
> {named_table,true},
> {type,set},
> {keypos,1},
> {protection,public}] ....
>
> that is, the result of ets:info/1 for every table returned by ets:all/0,
> i.e. all ETS tables in the CouchDB node.
>
> The Inspector says, "It would be nice for Erlang Views to have their
> own ETS table in this list, instead of grabbing {protection,public}
> ones for transient use!"
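>
> The tables the Inspector is talking about can be picked out with a short
> sketch like the one below, which filters ets:all() on the protection item
> reported by ets:info/2. The variable name Public is illustrative; the
> expressions are meant for an erl shell.
>
> ```erlang
> %% Names (or table ids) of every ETS table that any process may write to.
> Public = [T || T <- ets:all(), ets:info(T, protection) =:= public],
> io:format("~p~n", [Public]).
> ```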
>
> I conclude by saying what Robert Newson would wish me to say,
> "This is for your education in Erlang and CouchDB and should not be used in any
> mission-critical applications. The Erlang View Server is not sandboxed,
> and carefully crafted functions can damage your whole life as you know it."
> "As long as you trust everyone that can update design documents, that is fine."
>
> I hope this is of some use to ESAKs (Earnest Seekers After Knowledge)
> into the mysteries of Erlang and its use in CouchDB, wherever they may be.
>
> Any improvements, questions or comments will be gratefully received,
>
> David M. W. Martin (davidoccam)
>
>
>
>
>

