Subject: Re: Tracking file throughput?
From: Jan Lehnardt
Date: Fri, 3 Jun 2011 16:03:30 +0200
To: user@couchdb.apache.org
Message-Id: <4A67FD8D-BB49-4EE2-A668-EB68EDD031AC@apache.org>

Hi,

On 3 Jun 2011, at 15:43, muji wrote:

> I'm still new to couchdb and nosql, so apologies if the answer to
> this is trivial.

No worries, we're all new at something :)

> I'm trying to track the throughput of a file sent via a POST request
> in a couchdb document.
>
> My initial implementation creates a document for the file before the
> POST is sent, and then I have an update handler that increments the
> "uploadbytes" for every chunk of data received from the client.

Could you make that a little less frequent and interpolate between the
data points? Instead of tracking bytes exactly at the chunk boundaries,
just update every 10 MB or so, and have the UI adjust accordingly?

> This *nearly* works, except that I get document update conflicts
> (which I think is down to me not being able to throttle back the
> upload while the db is updated), but the main problem is that for
> large files (~2.4GB) the number of document revisions is around
> 40-50,000. So I have a single document taking up between 0.7GB and
> 1GB. After compaction it reduces to ~380KB, which of course is much
> better, but this still seems excessive and poses problems with
> compacting a write-heavy database. I understand the trick for that is
> to replicate, compact, and replicate back to the source; please
> correct me if I'm wrong...

Hm, no, that won't do anything; regular compaction on its own is good
enough.

> So, I don't think this approach is viable, which makes me wonder
> whether setting the _revs_limit will help, although I understand that
> setting this per database still requires compaction and will only
> save space after compaction.

_revs_limit won't help; you will always need to compact to get rid of
the old revision data.
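For illustration (untested, and assuming a local CouchDB at
localhost:5984 with a database called "uploads"; adjust the names to
taste), lowering _revs_limit and then compacting would look roughly
like this in Python with the requests library:

    import requests

    BASE = "http://localhost:5984"   # assumed local CouchDB
    DB = "uploads"                   # placeholder database name

    # Lower the number of revisions kept per document. This alone does
    # not shrink the file on disk; compaction is what reclaims space.
    requests.put(f"{BASE}/{DB}/_revs_limit", data="10")

    # Kick off compaction; it runs in the background on the server.
    requests.post(f"{BASE}/{DB}/_compact",
                  headers={"Content-Type": "application/json"})

    # The database info document tells you whether compaction is still
    # running and what the current on-disk size is.
    info = requests.get(f"{BASE}/{DB}").json()
    print(info.get("compact_running"), info.get("disk_size"))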
> I was thinking that tracking the throughput as chunks in individual
> documents and then calculating the throughput with a map/reduce over
> all the chunks might be a better approach. Although I'm concerned
> that having lots of little documents for each data chunk will also
> take up large amounts of space...

Yeah, that wouldn't save any space here. That said, I wouldn't call
the numbers you quote "large amounts".

> Any advice and guidance on the best way to tackle this would be much
> appreciated.

I'd either set up continuous compaction (restart compaction right when
it is done) to keep the DB size at a minimum, or use an in-memory
store to keep track of the uploaded bytes; there's a rough sketch of
the compaction loop below. Ideally, though, CouchDB would give you an
endpoint to query that kind of data.

Cheers
Jan
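P.S. Untested, but roughly what I mean by "restart compaction right
when it is done", again in Python with requests and assuming a local
CouchDB at localhost:5984 with a database called "uploads" (both
placeholders):

    import time
    import requests

    BASE = "http://localhost:5984"   # assumed local CouchDB
    DB = "uploads"                   # placeholder database name

    def compact_continuously(pause=5):
        """Keep compaction running on a write-heavy database."""
        while True:
            info = requests.get(f"{BASE}/{DB}").json()
            if not info.get("compact_running"):
                # The previous run has finished (or none has started
                # yet), so kick off another one.
                requests.post(f"{BASE}/{DB}/_compact",
                              headers={"Content-Type": "application/json"})
            time.sleep(pause)  # polling interval; tune for your workload

    if __name__ == "__main__":
        compact_continuously()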