couchdb-user mailing list archives

From: James Marca <jma...@translab.its.uci.edu>
Subject: An old database causes couchdb 1.3.0 to crash getting single documents
Date: Mon, 29 Apr 2013 06:27:47 GMT
Hello list,

I have an old database that I've carried along for a year or so now
through various upgrades.  It has given me problems in the past, but
I've never really dealt with them until now.

I wrote the docs into the db and used them with CouchDB 0.9 (more or
less).  I finished the analysis a while ago, and now I am trying to
clean up dbs and close out this project, but I keep crashing CouchDB
when accessing docs and trying to rebuild views (apparently 1.3.0
requires a view rebuild).

So I am trying to fix this db once and for all by fetching each
document and writing it into a new CouchDB database.

I have two problems.  

First, for reasons I do not understand at all, many of the docs in
this db are corrupt, or just plain too big.  This db bore the brunt of
my experiments with big docs in CouchDB, and I might have written some
really big ones into it.  Anyway, what happens is that the GET causes
CouchDB to fill up RAM and die.

My second problem is finding a way to short-circuit that slow death,
as it takes forever and I have over 10 million documents to process.

What I am doing now:

I *can* get all of the doc ids via the _all_docs interface, as long as
I do not ask for the document contents.  So what I am doing is getting
a batch of 1000 doc ids, fetching each one, and when I hit a bad one
that causes CouchDB to die, I wait 10 seconds for CouchDB to restart,
note the bad doc, and move on to the next one.
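
In shell terms it is basically the sketch below.  This is just an
illustration: the file names (dump.json, baddocs.txt) are made up, and
jq is only one convenient way to pull the ids out of the _all_docs
response.

    #!/bin/sh
    # Sketch of the batch-and-skip loop described above.
    DB="http://127.0.0.1/my%2fbroken%2fdb"

    # One batch of 1000 ids, no doc bodies; pass startkey_docid to page
    # through the following batches.
    curl -s "$DB/_all_docs?limit=1000" | jq -r '.rows[].id' |
    while read -r id; do
      # -f makes curl exit non-zero on an HTTP error status
      if curl -fs "$DB/$id" >> dump.json; then
        echo >> dump.json              # one doc per line
      else
        echo "$id" >> baddocs.txt      # note the bad doc
        sleep 10                       # wait for couchdb to come back up
      fi
    done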

I looked through the configuration settings, and I don't see an
obvious way to tell CouchDB to abort if RAM exceeds some pre-set
limit, nor do I see a way to tell couch to abort a request if it is
taking too long.

My requests are plain gets, as in

curl http://127.0.0.1/my%2fbroken%2fdb/00016ed321e51ef4b89db5f690c92c4367728b18f1298f179a3241f1de075bde

(I'm actually using node.js to get the docs, but the crash will also
happen in curl or via a browser)


Sometimes I will see a **really** long dump in the error logs, and
then 

                                    [{file,"couch_compress.erl"},{line,67}]},
                                {couch_file,pread_term,2,
                                    [{file,"couch_file.erl"},{line,135}]},
                                {couch_db,make_doc,5,
                                    [{file,"couch_db.erl"},{line,1264}]},
                                {couch_db,open_doc_int,3,
                                    [{file,"couch_db.erl"},{line,1203}]},
                                {couch_db,open_doc,3,
                                    [{file,"couch_db.erl"},{line,141}]},
                                {couch_httpd_db,couch_doc_open,4,
                                    [{file,"couch_httpd_db.erl"},{line,802}]},
                                {couch_httpd_db,db_doc_req,3,
                                    [{file,"couch_httpd_db.erl"},{line,498}]},
                                {couch_httpd_db,do_db_req,2,
                                    [{file,"couch_httpd_db.erl"},{line,234}]}]
[Mon, 29 Apr 2013 05:53:51 GMT] [info] [<0.262.0>] 127.0.0.1 - -
                                    GET
                                    /my%2fbroken%2fdb/00017ea256817bb6c43474a8d8eb12d141425c3e7b51fe6ef05729b08c41b91a
                                    500
[Mon, 29 Apr 2013 05:53:52 GMT] [error] [<0.262.0>] httpd 500 error
                                    response:
 {"error":"unknown_error","reason":"function_clause"}

and couchdb does NOT crash.

Most of the time the logs just say:  

[Mon, 29 Apr 2013 05:54:43 GMT] [error] [<0.141.0>] function_clause error in HTTP request

and then it crashes, and then the next entries in the log file are
restarts of all the replications, etc., as per the usual startup
procedure.

When the docs work I can process 10 to 20 every second.  If I could
temporarily tell CouchDB to abort requests that take more than a
second, that would do the trick, but I can't see how to do that.
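
The closest thing I can do is a client-side timeout (curl's --max-time
here, just as a sketch).  That only bounds how long I wait, not how
long couch grinds on the document, so it doesn't actually prevent the
crash:

    # Give up waiting after 1 second; the server still tries to load
    # the doc, so this only lets the client move on sooner.
    curl --max-time 1 http://127.0.0.1/my%2fbroken%2fdb/00016ed321e51ef4b89db5f690c92c4367728b18f1298f179a3241f1de075bde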

If I could ask couch how big a document is prior to processing it, I
could skip processing the really big ones, but I can't see how to do
that.
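
The only thing I can think of along those lines is a HEAD request on
the document, which returns a Content-Length header and no body;
whether couch can answer that without reading the whole document off
disk (and hitting the same crash) I don't know:

    # HEAD: headers only (ETag, Content-Length), no body.  It may or
    # may not avoid loading the full document on the server side.
    curl -I http://127.0.0.1/my%2fbroken%2fdb/00016ed321e51ef4b89db5f690c92c4367728b18f1298f179a3241f1de075bde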

I've thought about compacting the db under 1.3.0.  This *might* fix
the problem, but in the worst case I will wait around a day or so for
the compaction to finish, and then find that I still have a problem
when CouchDB tries to send a document out.
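
For the record, kicking off the compaction itself is just a POST to
the _compact endpoint (it wants a JSON content type):

    # Start compaction; it runs in the background.  GET the db info
    # and watch compact_running to see when it finishes.
    curl -X POST -H "Content-Type: application/json" http://127.0.0.1/my%2fbroken%2fdb/_compact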

Any advice on hacking a solution or tweaking a parameter setting would
be very much appreciated, as I have 12,970,464 documents to get and
save (102.4 GB) and waiting a minute or so every time I hit a bad doc
is just taking too long.

Finally, I *was* able to use my views to access data under 1.2.x.  My
main data view basically broke up each document into a ton of emit()s,
and I only grabbed the view output, never the original docs.  But when
I upgraded to 1.3.0, all the views needed a rebuild.  Other, similar
DBs worked fine, but this one crashed CouchDB every time I tried.


Regards,
James Marca
