couchdb-dev mailing list archives

From "Paul Joseph Davis (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (COUCHDB-567) Erlang View with Reduce Fails on Large Number of documents
Date Thu, 12 Nov 2009 07:39:39 GMT

     [ https://issues.apache.org/jira/browse/COUCHDB-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Joseph Davis resolved COUCHDB-567.
---------------------------------------

       Resolution: Fixed
    Fix Version/s: 0.10.1
         Assignee: Paul Joseph Davis

Fixed as of 835281.

Sean, can you try this on for size? This is only in the 0.10.x branch, so you'll have to pull
that from SVN and build. The 0.10.1 release will be out in a day or two, so this is mostly just
a remote check that it's fixed.

Too tired to write anything of interest, so here's the commit message:

Fixes COUCHDB-567 error with ErlView reduces.

Apparently we never tested ErlView reductions on 0.10.x? As far as I can tell they never
should have worked. It was exactly as Sean Geoghegan described, in that the interleaved
calls to reduce were trouncing the mapper state.

Trunk uses the two-process update scheme, so it wouldn't be affected by this trouncing.
This patch is a stopgap to make ErlViews work. I've tested with the Futon test patch and
an updated query_server_spec.rb to revalidate that things are working.

Fixing this bug has made it quite apparent that the query server specs need to be drastically
rethought. I spent quite a bit of time tracking down that the specs were actually testing
that the subprocess died. ErlViews causing the host process to die would be very bad. In the
future I'd like to see script/response sets and file structures for function definitions.
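
One possible reading of "script/response sets" is a data file of command and
expected-response pairs that a spec runner replays against the query server.
The sketch below is only an illustration of that idea, not the format the
specs actually adopted. The command names reset, add_fun and map_doc come from
the query server protocol; FakeQueryServer, SCRIPT and replay are hypothetical
names, and the fake server ignores the function source and emits one
[doc._id, null] row per registered function.

```python
import json


class FakeQueryServer:
    """Hypothetical in-process stand-in; a real spec would drive a subprocess."""

    def __init__(self):
        self.funs = []

    def handle(self, command):
        name, *args = command
        if name == "reset":
            self.funs = []
            return True
        if name == "add_fun":
            # The source string is stored but never evaluated in this toy.
            self.funs.append(args[0])
            return True
        if name == "map_doc":
            # One result list per registered function, one row per list.
            return [[[args[0]["_id"], None]] for _ in self.funs]
        raise ValueError("unknown command: %r" % (name,))


# Assumed file format: a JSON array of [command, expected_response] pairs.
SCRIPT = json.loads("""
[
  [["reset"], true],
  [["add_fun", "placeholder source"], true],
  [["map_doc", {"_id": "a"}], [[["a", null]]]],
  [["reset"], true],
  [["map_doc", {"_id": "b"}], []]
]
""")


def replay(server, script):
    """Run each command and collect (step, command, expected, actual) mismatches."""
    failures = []
    for step, (command, expected) in enumerate(script):
        actual = server.handle(command)
        if actual != expected:
            failures.append((step, command, expected, actual))
    return failures
```

Calling replay(FakeQueryServer(), SCRIPT) returns an empty list when every
response matches. The last two steps state explicitly what a server answers
once its functions have been cleared by a reset, which is the state this bug
left the mapper in.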


> Erlang View with Reduce Fails on Large Number of documents
> ----------------------------------------------------------
>
>                 Key: COUCHDB-567
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-567
>             Project: CouchDB
>          Issue Type: Bug
>    Affects Versions: 0.10
>            Reporter: Sean Geoghegan
>            Assignee: Paul Joseph Davis
>             Fix For: 0.10.1
>
>         Attachments: generate-data.rb, view.erl
>
>
> I have been having a problem with running Erlang views over a large dataset.  Whenever
the indexer goes to checkpoint its progress, the following error occurs:
> ** Last message in was {'EXIT',<0.2220.0>,
>                         {function_clause,
>                          [{couch_view_updater,view_insert_doc_query_results,
>                            [{doc,<<"73956fdca62c384849a3313e6c48b7ed">>,...
>                            [],
>                            [{{view,0,
>                                  [<<"_temp">>],
>                                  <<"...">>,
>                                  {btree,<0.2218.0>,
>                                      {1565615,{341,[0]}},
>                                      #Fun<couch_btree.3.83553141>,
>                                      #Fun<couch_btree.4.30790806>,
>                                      #Fun<couch_view.less_json_keys.2>,
>                                      #Fun<couch_view_group.11.46347864>},
>                                  [{<<"_temp">>,
>                                    <<"...">>}]},
>                              []}],
>                            [],[]]},
>                       {couch_view_updater,view_insert_query_results,4},
>                       {couch_view_updater,process_doc,4},
>                       {couch_view_updater,'-update/2-fun-0-',6},
>                       {couch_btree,stream_kv_node2,7},
>                       {couch_btree,stream_kp_node,6},
>                       {couch_btree,fold,5},
>                       {couch_view_updater,update,2}]]},
> This problem occurs regardless of the functionality of the map and reduce functions;
it seems to be based on the time it takes to generate, or whatever causes the checkpoints to
get written out.
> I did some investigation into the problem by adding a lot of LOG_INFO statements throughout
the code.  I was able to determine the following:
>   
>    * the Erlang view process is held by the view updater for the entire duration
of the indexing,
>    * however, after the first checkpoint is hit and the progress is written out, a reduce
call is made to the Erlang view server; once this completes, the view server is released back
to the cache using ret_os_process.
>    * when the next reduce cycle occurs, the same Erlang view server is returned by get_os_process,
but it is first sent a reset message, which clears all the functions in the view server's state.
>    * when the next map cycle starts, the view updater uses the same handle to the Erlang
view server it had in the beginning. It assumes that the server's state is unchanged; however,
it has been reset, so there are no view functions in the view server.  This causes the above
error when it then attempts to write out the result of a view function which doesn't exist
in the server.
> I was able to fix this problem by modifying line 139 of couch_view_updater.erl from this:
>    {[], Group2, ViewEmptyKeyValues, []}
> to this:
>  {[], Group2#group{query_server=nil}, ViewEmptyKeyValues, []}
> This removes the view updater's handle to the Erlang server proc, forcing it to get/create
a new one for each map cycle and set up the view functions within the server.  I don't
know if this is the right way to do it, or if it has any bad side-effects, but it does prevent
the crash at least, and allows the indexing to complete correctly.
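
The stale-handle sequence in the report, and the effect of dropping the handle,
can be reproduced in a small toy model. This is a sketch under stated
assumptions, not CouchDB code: ToyViewServer, ToyPool and index are
hypothetical names, the pool always hands back the same server, and every
checkout resets it, which compresses the reporter's release-then-reset steps
into a single call.

```python
class ToyViewServer:
    """Stand-in for a pooled view server; holds the registered map functions."""

    def __init__(self):
        self.funs = []

    def reset(self):
        self.funs = []

    def add_fun(self, fun):
        self.funs.append(fun)

    def map_doc(self, doc):
        # One result list per registered map function.
        return [fun(doc) for fun in self.funs]


class ToyPool:
    """Models the get_os_process / ret_os_process pair with a single server."""

    def __init__(self):
        self._server = ToyViewServer()

    def get(self):
        # Assumption: every checkout resets the (shared) server first.
        self._server.reset()
        return self._server

    def ret(self, server):
        pass  # nothing to release in this toy


def index(docs, map_funs, pool, checkpoint_every, drop_handle):
    handle = None
    results = []
    for i, doc in enumerate(docs, start=1):
        if handle is None:
            # Acquire a server and (re)register the view functions.
            handle = pool.get()
            for fun in map_funs:
                handle.add_fun(fun)
        per_view = handle.map_doc(doc)
        if len(per_view) != len(map_funs):
            # Toy analogue of the function_clause crash in the report.
            raise RuntimeError("map results do not match the views")
        results.append(per_view)
        if i % checkpoint_every == 0:
            # Checkpoint: the reduce call borrows the same server from the pool.
            reducer = pool.get()
            pool.ret(reducer)
            if drop_handle:
                # Analogue of Group2#group{query_server=nil}.
                handle = None
    return results


if __name__ == "__main__":
    docs = [{"n": n} for n in range(6)]
    emit_n = lambda doc: [(doc["n"], 1)]
    print(len(index(docs, [emit_n], ToyPool(), 2, drop_handle=True)))
```

With drop_handle=False the model raises on the first document after a
checkpoint, because the mapper's handle points at a server whose functions
were cleared. With drop_handle=True the handle is re-acquired and the functions
re-added on each map cycle, so all documents are indexed, at the cost of
re-registering the functions after every checkpoint.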

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

