On Aug 10, 2009, at 1:19 PM, Brian Candler wrote:

> On Mon, Aug 10, 2009 at 01:05:42PM -0700, Tommy Chheng wrote:
>> It is a Ruby app using Couchrest (which uses the restclient/net Ruby lib).
>>
>> I'm basically comparing one document against all other documents (30K+
>> documents in the dataset, so it's a huge number of connections if the
>> connections aren't being closed properly) like this:
>>
>>   grants = NsfGrant.all.paginate(:page => current_page, :per_page => page_size)
>>   grants.each do |doc2|
>>     NsfGrantSimilarity.compute_and_store(doc1, doc2)
>
> But presumably NsfGrant.all only makes a single HTTP request, not 30K
> separate requests?

NsfGrant.all makes one query (per paginated result), but I make another
query PER document to get that document's word count list (via a view) in
the NsfGrantSimilarity.compute_and_store method. So it will try to make
30K separate requests.

> Looking at "netstat -n" will give you a rough idea, at least for seeing
> how many sockets are left in TIME_WAIT state, but the surest way is with
> tcpdump:
>
> tcpdump -i lo -n -s0 'host 127.0.0.1 and tcp dst port 5984 and (tcp[tcpflags] & tcp-syn != 0)'
>
> should show you one line for each new HTTP connection made to CouchDB.

It shows 13 lines like this:

20:29:03.255746 IP 127.0.0.1.58119 > 127.0.0.1.5984: S 3662357700:3662357700(0) win 32792

and then fails on the client side with:

Errno::ECONNREFUSED: Connection refused - connect(2)
        from /usr/lib/ruby/1.8/net/http.rb:560:in `initialize'
        from /usr/lib/ruby/1.8/net/http.rb:560:in `open'
        from /usr/lib/ruby/1.8/net/http.rb:560:in `connect'
        from /usr/lib/ruby/1.8/timeout.rb:53:in `timeout'

> But in any case, for parsing 30K documents, you may not want to load all
> 30K into RAM and then compare them afterwards. Couchrest lets you do a
> streaming view, so that one object is read at a time - I think if you call
> view with a block, then it works this way automatically. You need to have
> curl installed for this to work, as it shells out a separate curl process
> and then reads the response one line at a time.

Thanks, I'll have to try this approach.
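
For reference, here is a minimal sketch of one way to cut down the per-document requests: batch the word count lookups into a few POSTs with a "keys" body and send them all over a single connection. It uses plain Net::HTTP rather than Couchrest, so it is not how the app above actually works. The database name "nsf_grants", the design document "grants", the view "word_counts" and the batch size are all made up for illustration, and it assumes the view emits the document id as its key. On Ruby 1.8 the json library has to be installed as a gem.

```ruby
require 'net/http'
require 'json'

# Hypothetical path: database, design document and view names are
# placeholders, not taken from the real application.
VIEW_PATH = '/nsf_grants/_design/grants/_view/word_counts'

# Split the document ids into batches and build one JSON request body per
# batch, in the {"keys": [...]} form that CouchDB accepts on a view POST.
def build_key_batches(doc_ids, batch_size = 500)
  doc_ids.each_slice(batch_size).map { |ids| JSON.generate('keys' => ids) }
end

# Send every batch over ONE TCP connection. Net::HTTP.start with a block
# keeps the socket open between requests and closes it when the block ends.
def fetch_word_counts(doc_ids, host = '127.0.0.1', port = 5984)
  rows = []
  Net::HTTP.start(host, port) do |http|
    build_key_batches(doc_ids).each do |body|
      response = http.post(VIEW_PATH, body, 'Content-Type' => 'application/json')
      rows.concat(JSON.parse(response.body)['rows'])
    end
  end
  rows
end
```

With 30K documents and a batch size of 500 this would be 60 requests on one socket instead of 30K new connections, so the tcpdump filter above should show a single SYN for the whole run.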