couchdb-erlang mailing list archives

From David Martin <david.mar...@lymegreen.co.uk>
Subject Using the Erlang view server to Educate in CouchDB
Date Sun, 04 Nov 2012 15:44:04 GMT
Learning about the Erlang internals of CouchDB is daunting for many.

One of the reasons for this is the difficulty of learning the intricacies of
the build system and finding your way around the code.

Jan Lehnardt has made a very good start with his recent, well-written post
detailing a walk through the code that is used during a POST/PUT operation.

Thank you Jan!

I offer another insight into the Erlang world of CouchDB that is very easy
to implement and needs no delving into the codebase or build systems,
yet gives insight into many of the important concepts used in CouchDB
and Erlang.

The example posted below uses the Erlang (a.k.a. native) query server
to illustrate an Erlang solution to the problem of knowing what data is in
a free-form database (determining "schemas" in free-form data).

This can also be done in JavaScript, but hey! This is about Erlang.

As all clients of a CouchDB database are (in general terms) free to create
JSON structures of an arbitrary form, the problem becomes one of finding
Views that are of interest to potential users of the data.

As an illustration, take a police criminal investigation.
All the thousands of contributors to the accumulation of data on the case
use different nomenclature, forms, languages and computer systems to
urgently collate the data in the office of Inspector Jacques Clouseau.

Jacques has a schema in mind for his huge police SQL database, but the urgency
of the case, the Press pressure and the masses of international data pouring
in force his hand. He says,
"As long as you send in the data in JSON format, I will do the rest! Just
send it in to this new CouchDB using Futon in any form you wish."

The database grows at an alarming rate.
Jacques goes to his local.ini file and types the magic words,

[native_query_servers]
erlang = {couch_native_process, start_link, []}

and restarts his CouchDB server.

Then he opens the rapidly growing database and in Futon types the Map Function
below, within the #### lines

###############################################################################
fun({Doc}) -> % The view function, using the Y-combinator

    %% david m. w. martin 2012 U.K. david.martin@lymegreen.co.uk
    %% Use the Y-combinator for anonymous recursion; thanks, Haskell Curry!

    Y = fun(M) ->
            (fun(X) -> X(X) end)(fun(F) -> M(fun(A) -> (F(F))(A) end) end)
        end,
    %-------------------------------------------
    Logging_on = false, %% logging goes to couch.log

    Recurser = fun(F) ->

        % Start of the inner functions.
        % They can only take one argument, so we pass a tuple.

        fun
            %---------------------------------------------------------------
            ({logger, Legend_string, Parms}) ->
                case Logging_on of
                    true  -> Log(io_lib:format(Legend_string, Parms));
                    false -> ok
                end;
            %---------------------------------------------------------------
            % A JSON object, already stripped to its property list
            ({Term, Path, Q0}) when is_list(Term) ->
                %F({logger,"state of Q = ~p",[Q0]}),
                lists:foreach(fun(Key) ->
                    case proplists:get_value(Key, Term, null) of

                        Value when is_tuple(Value) -> % nested object
                            Path1 = [Key | Path],
                            {Stripped_Value} = Value,
                            F({Stripped_Value, Path1, Q0}); % recursive call to F

                        Value when is_list(Value) -> % array
                            lists:foreach(fun({Element, Index}) ->
                                Path1 = [[Index] | [Key | Path]],
                                F({Element, Path1, Index}) % recursive call to F
                            end, lists:zip(Value, lists:seq(0, length(Value) - 1)));

                        _Value -> % scalar: string, number, true, false or null
                            Path1 = [Key | Path],
                            Emit(lists:reverse(Path1), 1)

                    end
                end, proplists:get_keys(Term));
            %---------------------------------------------------------------
            % An array element that is itself an object: strip it and recurse
            ({Element, Path, Queue}) when is_tuple(Element) ->
                {Stripped_Value} = Element,
                F({Stripped_Value, Path, Queue}); % recursive call to F
            %---------------------------------------------------------------
            % A scalar array element
            ({_Element, Path, _Queue}) ->
                Emit(lists:reverse(Path), 1)

        end % end of inner functions
    %-----------------------------------------------------------------------
    end, % end of Recurser
    %% This line calls Y(Recurser) with the initial parameter {Doc,[<<"Doc">>],0}
    Log(io_lib:format("~n## End of Document ####~n~p~n",
                      [(Y(Recurser))({Doc, [<<"Doc">>], 0})]))

end. % end of View fun

################################################################################
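
For anyone meeting the Y-combinator for the first time, here is a minimal
standalone sketch of the same trick outside CouchDB. The module name y_demo
and the function factorial/1 are made up for illustration; only the body of
y/1 is taken from the map function above.

```erlang
-module(y_demo).
-export([factorial/1]).

%% The same combinator as in the map function, written as a named function.
%% M receives "a fun that recurses" and must return the fun to run.
y(M) ->
    (fun(X) -> X(X) end)(fun(F) -> M(fun(A) -> (F(F))(A) end) end).

%% An anonymous fun cannot name itself, so recursion goes through Self.
factorial(N) when is_integer(N), N >= 0 ->
    Step = fun(Self) ->
               fun(0) -> 1;
                  (K) -> K * Self(K - 1)
               end
           end,
    (y(Step))(N).
```

Compile it with c(y_demo). in an Erlang shell and call y_demo:factorial(5).
Step plays the role of Recurser, and Self plays the role of F.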

and the reduce function

################################################################################
fun(_Keys, Values, ReReduce) ->
    case ReReduce of
        true  -> lists:sum(Values); % it is a ReReduce of Values
        false -> length(Values)     % it is just a reduce of Values
    end
end.
################################################################################
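
To see why the reduce function counts on the first pass and sums on the
rereduce pass, here is a rough sketch that imitates, outside CouchDB, values
being reduced in chunks and the partial results being reduced again. The
module reduce_demo and the helpers count_in_chunks/2 and chunks/2 are my own
assumptions for illustration; CouchDB chooses its own chunking internally.

```erlang
-module(reduce_demo).
-export([reduce/3, count_in_chunks/2]).

%% Same logic as the reduce function above, as a named function.
reduce(_Keys, Values, true)  -> lists:sum(Values);   % rereduce: add partial counts
reduce(_Keys, Values, false) -> length(Values).      % reduce: count emitted rows

%% Hypothetical helper: reduce each chunk, then rereduce the partial counts.
count_in_chunks(Values, ChunkSize) when is_integer(ChunkSize), ChunkSize > 0 ->
    Partials = [reduce(null, Chunk, false) || Chunk <- chunks(Values, ChunkSize)],
    reduce(null, Partials, true).

%% Split a list into sublists of at most N elements.
chunks([], _N) -> [];
chunks(List, N) when length(List) =< N -> [List];
chunks(List, N) ->
    {Head, Tail} = lists:split(N, List),
    [Head | chunks(Tail, N)].
```

Whatever the chunk size, the final figure equals the number of emitted rows,
which is the property a reduce function must have.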

The Inspector thanks Ben Hollis at http://benhollis.net/
for the inspiration to crack JSON using some of his test data below, modified
to show objects within arrays and arrays within objects.

Test data in JSON form, your _id and _rev will differ.

{
  "_id": "61c3f496b9e4c8dc29b95270d9000370",
  "_rev": "5-672ef483f9f4fb1386e38bb691442183",
  "test": {
    "hey": "guy",
    "a_number": 243,
    "an_object": {
      "whoa": "nuts",
      "an_array": [
        1.0000000000001,
        2,
        "thr<h1>ee",
        {
          "whoa": "nuts",
          "an_array": [
            1,
            3.9999999999,
            "thr<h1>ee"
          ]
        }
      ]
    },
    "awesome": true,
    "bogus": false,
    "meaning": null,
    "japanese": "明日がある。",
    "link": "http://jsonview.com",
    "notLink": "http://jsonview.com is great"
  }
}

This is the form in which CouchDB stores the Documents on disk,
and this is the form in which Erlang works internally on JSON documents:
objects become {[{Key, Value}, ...]}, arrays become lists and strings
become binaries.

{[{<<"_id">>,<<"61c3f496b9e4c8dc29b95270d9000370">>},
{<<"_rev">>,<<"5-672ef483f9f4fb1386e38bb691442183">>},
{<<"test">>,
{[{<<"hey">>,<<"guy">>},
{<<"a_number">>,243},
{<<"an_object">>,
{[{<<"whoa">>,<<"nuts">>},
{<<"an_array">>,
[1.0000000000001,2,<<"thr<h1>ee">>,
{[{<<"whoa">>,<<"nuts">>},
{<<"an_array">>,[1,3.9999999999,<<"thr<h1>ee">>]}]}]}]}},
{<<"awesome">>,true},
{<<"bogus">>,false},
{<<"meaning">>,null},
{<<"japanese">>,
<<230,152,142,230,151,165,227,129,140,227,129,130,227,130,139,227,128,130>>},
{<<"link">>,<<"http://jsonview.com">>},
{<<"notLink">>,<<"http://jsonview.com is great">>}]}}]}
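
As a plain-Erlang illustration of walking this structure, here is a small
sketch that returns the key path of every scalar value in such a term. It is
not the map function itself: the module ejson_paths and the function paths/1
are hypothetical names, it uses ordinary named recursion instead of the
Y-combinator, and it records array indexes as bare integers.

```erlang
-module(ejson_paths).
-export([paths/1]).

%% Return the key path of every scalar in an EJSON document, in document order.
paths({Props}) when is_list(Props) ->
    lists:reverse(walk({Props}, [], [])).

%% Object: {[{Key, Value}, ...]}; descend into each value.
walk({Props}, Path, Acc) when is_list(Props) ->
    lists:foldl(fun({Key, Value}, A) -> walk(Value, [Key | Path], A) end,
                Acc, Props);
%% Array: a plain list; descend into each element with its index.
walk(Items, Path, Acc) when is_list(Items) ->
    Indexed = lists:zip(lists:seq(0, length(Items) - 1), Items),
    lists:foldl(fun({I, Item}, A) -> walk(Item, [I | Path], A) end,
                Acc, Indexed);
%% Scalar: binary string, number, true, false or null; record the path.
walk(_Scalar, Path, Acc) ->
    [lists:reverse(Path) | Acc].
```

Strings never get mistaken for arrays here because they arrive as binaries,
not as lists.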

The Inspector tested the map function and reduce function with differing levels
of grouping on the test data. Then the Inspector found all the relevant data
in the now huge database of clues and was able, using the output, to construct
JavaScript Views that could seek out and collate every last relevant piece of
data in the database. The case was solved in record time!

The map function performs deep recursion on each document in a
database. To protect the concurrent running and updating of the database
(clues are still coming in), CouchDB puts a time limit on the gen_server
handling the mapping; if it takes too long, the result is a message like

"Error: timeout
{gen_server,call,
[<0.4612.32>,
{prompt,[<<"map_doc">>,......

This happens on slow machines with very complex document structures.

Increase os_process_timeout (5000 by default, in milliseconds) in default.ini
to a larger value.
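
A sketch of that setting, assuming a CouchDB 1.x layout in which
os_process_timeout sits in the [couchdb] section. The value 20000 is only an
example, and putting the override in local.ini rather than default.ini is my
suggestion, so that an upgrade does not overwrite it.

```ini
[couchdb]
; milliseconds allowed for a view server call before it times out
os_process_timeout = 20000
```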

The Erlang view server is very powerful; in fact, it can access all the
functionality of CouchDB and all the functions of Erlang.
This is with the proviso that it is called on every document in a database,
and can only recurse using the Y-combinator.

If the database holds only one dummy document,
the function is called only once, so code such as

lists:foreach(fun(X)->
Log(io_lib:format("~p: ~p",[X,ets:info(X)]))
end,ets:all()),

inserted after the line "end, % end of Recurser",

will log a list of elements like

inet_hosts_byaddr: [{compressed,false},
{memory,286},
{owner,<0.16.0>},
{heir,none},
{name,inet_hosts_byaddr},
{size,0},
{node,'rcouch@127.0.0.1'},
{named_table,true},
{type,bag},
{keypos,3},
{protection,protected}]

ac_tab: [{compressed,false},
{memory,8265},
{owner,<0.7.0>},
{heir,none},
{name,ac_tab},
{size,127},
{node,'rcouch@127.0.0.1'},
{named_table,true},
{type,set},
{keypos,1},
{protection,public}] ....

that is, the result of ets:info/1 for each table returned by ets:all/0,
i.e. all ETS tables in CouchDB.

The Inspector says, "It would be nice for Erlang Views to have their
own ETS table in this list, instead of grabbing {protection,public}
ones for transient use!"
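
To find out which tables the Inspector is talking about, here is a minimal
sketch that lists only the tables whose protection is public. The module
ets_demo and the function public_tables/0 are names invented for this
example; ets:all/0 and ets:info/2 are the standard calls.

```erlang
-module(ets_demo).
-export([public_tables/0]).

%% Return the identifiers of all ETS tables that any process may write to.
%% ets:info/2 returns undefined for a table deleted in the meantime,
%% which simply fails the comparison and is skipped.
public_tables() ->
    [Tab || Tab <- ets:all(), ets:info(Tab, protection) =:= public].
```

Writing into a table found this way changes state that belongs to some
other process, so treat it as something to look at, not something to use.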

I conclude by saying what Robert Newson would wish me to say:
"This is for your education in Erlang and CouchDB and should not be used in any
mission-critical applications. The Erlang View Server is not sandboxed,
and carefully crafted functions can damage your whole life as you know it."
"As long as you trust everyone that can update design documents, that is fine."

I hope this is of some use to ESAKs (Earnest Seekers After Knowledge)
into the mysteries of Erlang and its use in CouchDB, wherever they may be.

Any improvements, questions or comments will be gratefully received,

David M. W. Martin (davidoccam)



