> We'd love to hear what you come up with and also to solve any
> problems you might encounter on your way. Please let us know. Please
> note that CouchDB at this point is not optimised. We are still in
> the 'getting it right' phase before we come to the 'getting it
> fast'. That said, CouchDB is plenty fast already, but there is also
> the potential to greatly speed up things.

So I'm trying a smaller version of this first (9 million records), and
I've hit a snag. I have some fairly simple Python code that reads from
Postgres and writes to CouchDB using couchdb-python (where 'db' is a
couchdb.client.Database object):
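
    # IteratorChunker and get_stuff() are helpers defined elsewhere in
    # the script; get_stuff() yields one dict per Postgres row, each
    # with an '_id' key.
    chunker = IteratorChunker(get_stuff())
    while not chunker.done:
        print("fetching")
        chunk = chunker.next_chunk(1000)
        if chunk:
            print("Adding %d items, starting with %s"
                  % (len(chunk), chunk[0]['_id']))
            # One bulk write per chunk of up to 1000 documents
            db.update(chunk)
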
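IteratorChunker isn't shown in the post, so here is a minimal sketch of a helper with the same interface (a `done` flag and `next_chunk(n)`). It is hypothetical: it assumes the real helper simply slices the iterator, which may not match the original.

```python
from itertools import islice


class IteratorChunker(object):
    """Hypothetical helper: hands out successive lists from an iterator.

    Mirrors the interface used in the loop above; the real helper in the
    original script may differ.
    """

    def __init__(self, iterable):
        self._it = iter(iterable)
        self.done = False

    def next_chunk(self, size):
        # islice pulls at most `size` items without materialising the rest
        chunk = list(islice(self._it, size))
        # A short (or empty) chunk means the underlying iterator is exhausted
        if len(chunk) < size:
            self.done = True
        return chunk
```

When the total is an exact multiple of the chunk size, the last call returns an empty list, which the `if chunk:` guard in the loop already skips.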
db.update(docs) (see
<http://code.google.com/p/couchdb-python/source/browse/trunk/couchdb/client.py>,
line 360) uses the bulk API, like:

    data = self.resource.post('_bulk_docs', content={'docs': documents})

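To reproduce the same bulk write outside couchdb-python (useful for telling a client problem apart from a server one), the request can be built with the standard library alone. This is a sketch for Python 3: the server URL and database name are placeholders, and it only builds the POST with the same `{"docs": [...]}` body; sending it is left to `urllib.request.urlopen`.

```python
import json
import urllib.request


def make_bulk_docs_request(server_url, db_name, docs):
    """Build (but do not send) a POST to the database's _bulk_docs endpoint.

    `server_url` and `db_name` are placeholders for your own setup.
    """
    # Same body shape couchdb-python sends: {"docs": [doc, doc, ...]}
    body = json.dumps({"docs": docs}).encode("utf-8")
    url = "%s/%s/_bulk_docs" % (server_url.rstrip("/"), db_name)
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

For example, `urllib.request.urlopen(make_bulk_docs_request("http://localhost:5984", "mydb", chunk), timeout=60)` sends one batch to a local server on CouchDB's default port.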
At apparently random points, but almost always before about 15,000
records, the process dies with an exception whose traceback ends like
this:

      File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/httplib.py", line 707, in send
        self.sock.sendall(str)
      File "<string>", line 1, in sendall
    socket.error: (54, 'Connection reset by peer')

If I have Futon open while this runs, I occasionally get a JavaScript
error along the lines of "killed" at the same time (it's hard to
reproduce).

I could have it catch the reset connection and retry, but why would
this be happening in the first place?
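Catching and retrying works around the symptom without explaining the reset, but a minimal sketch of that workaround looks like this. It assumes Python 3, where a reset connection (errno ECONNRESET, 54 on Mac OS X) raises `ConnectionResetError`; under the Python 2.5 in the traceback you would catch `socket.error` and check its error code instead. The attempt count and delays are arbitrary.

```python
import time


def with_retries(func, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call func(); if the connection is reset, back off and try again.

    Only ConnectionResetError is retried; any other exception propagates
    immediately. `sleep` is injectable so the backoff can be tested.
    """
    for attempt in range(attempts):
        try:
            return func()
        except ConnectionResetError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            # Exponential backoff: 0.5s, 1s, 2s, ... between attempts
            sleep(base_delay * (2 ** attempt))
```

In the loop above this would wrap the write as `with_retries(lambda: db.update(chunk))`. Because every document carries an explicit `_id`, a batch that was partly written before the reset should come back as per-document conflicts on retry rather than as duplicates.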