Subject: Re: Tracking file throughput?
From: muji
To: user@couchdb.apache.org
Date: Fri, 3 Jun 2011 16:00:21 +0100

Thanks again for your help, Jan. Sorry, I thought that continuous
compaction might be a feature I had overlooked. I have no problems
automating a compaction process; I always envisaged needing to do that...

I think that I will revert to running far fewer updates on the couchdb
document and caching the throughput in Redis, as disc space is more of a
priority than application complexity.

A few more (different) questions in the pipeline as I'm still learning
couch ;)
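For illustration, a rough sketch of the Redis approach described above,
in Python, using the redis and requests client libraries: count bytes in
Redis as each chunk arrives and only write the running total into the
CouchDB file document every ~10MB (the granularity Jan suggests below).
The database URL, document id and key names are placeholders, not taken
from the real application.

    import redis     # pip install redis
    import requests  # pip install requests

    COUCH_DB = "http://127.0.0.1:5984/uploads"  # placeholder database URL
    FLUSH_EVERY = 10 * 1024 * 1024              # flush roughly every 10MB

    r = redis.Redis(decode_responses=True)

    def on_chunk(doc_id, nbytes):
        # called once per chunk received from the client
        total = r.incrby("uploadbytes:%s" % doc_id, nbytes)
        flushed = int(r.get("flushed:%s" % doc_id) or 0)
        if total - flushed >= FLUSH_EVERY:
            flush(doc_id, total)

    def flush(doc_id, total):
        # one new document revision per ~10MB instead of one per chunk
        url = "%s/%s" % (COUCH_DB, doc_id)
        doc = requests.get(url).json()
        doc["uploadbytes"] = total
        if requests.put(url, json=doc).ok:
            r.set("flushed:%s" % doc_id, total)
        # a 409 conflict just means another writer got in first; the
        # next flush interval will catch the total up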
On Fri, Jun 3, 2011 at 3:37 PM, Jan Lehnardt wrote:
>
> On 3 Jun 2011, at 16:28, muji wrote:
>
>> Thanks very much for the help.
>>
>> I could of course reduce the number of times the update is done, but
>> the service plans to bill based on throughput, so this is quite
>> critical from a billing perspective.
>
> You can still bill on throughput, as you will know exactly how much
> data has been transferred in what amount of time, but reporting is
> going to be less granular, i.e. chunks of say 10MB and not 100KB or
> however big the chunks are.
>
>> A quick search for continuous compaction didn't yield anything, and I
>> don't see anything here:
>>
>> http://wiki.apache.org/couchdb/Compaction
>>
>> Could you point me in the right direction please?
>
> I made it up, and I explained how to do it. Pseudocode:
>
> while(`curl http://127.0.0.1:5984/db/_compact`);
>
>> Funny you mention caching before updating couch, that was my very
>> first implementation! I was updating Redis with the throughput and
>> then updating the file document once the upload completed. That
>> worked very well, but I wanted to remove Redis from the stack as the
>> application is already pretty complex.
>>
>> I'm guessing my best option is to revert back to that technique?
>
> It depends on what your goals are. The initial design you mentioned
> seems fine to me if you compact often. If you are optimising for disk
> space, Redis or memcached may be a good idea. If you are optimising
> for a small stack, not having Redis or memcached is a good idea.
>
>> As an aside, why would my document update handler be raising
>> conflicts? My understanding was that update handlers would not raise
>> conflicts - is that correct?
>
> That is not correct.
>
> Cheers
> Jan
> --
>
>> Thanks!
>>
>> On Fri, Jun 3, 2011 at 3:03 PM, Jan Lehnardt wrote:
>>> Hi,
>>>
>>> On 3 Jun 2011, at 15:43, muji wrote:
>>>> I'm still new to couchdb and nosql, so apologies if the answer to
>>>> this is trivial.
>>>
>>> No worries, we're all new at something :)
>>>
>>>> I'm trying to track the throughput of a file sent via a POST
>>>> request in a couchdb document.
>>>>
>>>> My initial implementation creates a document for the file before
>>>> the POST is sent, and then I have an update handler that increments
>>>> "uploadbytes" for every chunk of data received from the client.
>>>
>>> Could you make that a little less frequent and interpolate between
>>> the data points? Instead of tracking bytes exactly at the chunk
>>> boundaries, just update every 10 or so MB? And have the UI adjust
>>> accordingly?
>>>
>>>> This *nearly* works, except that I get document update conflicts
>>>> (which I think is due to me not being able to throttle back the
>>>> upload while the db is updated), but the main problem is that for
>>>> large files (~2.4GB) the number of document revisions is around
>>>> 40-50,000. So I have a single document taking up between 0.7GB and
>>>> 1GB. After compaction it reduces to ~380KB, which of course is much
>>>> better, but this still seems excessive and poses problems with
>>>> compacting a write-heavy database. I understand the trick for that
>>>> is to replicate, compact and replicate back to the source; please
>>>> correct me if I'm wrong...
>>>
>>> Hm, no, that won't do anything; just regular compaction is good
>>> enough.
>>>
>>>> So I don't think this approach is viable, which makes me wonder
>>>> whether setting _revs_limit will help, although I understand that
>>>> setting this per database still requires compaction and only saves
>>>> space after compaction.
>>>
>>> _revs_limit won't help; you will always need to compact to get rid
>>> of data.
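A note on the compaction pseudocode quoted above: /_compact is triggered
with a POST and returns immediately while compaction runs in the
background, so a bare loop around curl would just keep re-issuing the
request. A rough, runnable sketch of the "restart compaction as soon as
it finishes" idea (the database URL is a placeholder) might look like
this:

    import time

    import requests  # pip install requests

    DB = "http://127.0.0.1:5984/db"  # placeholder database URL

    while True:
        # kick off a compaction pass (the request returns straight away)
        requests.post(DB + "/_compact",
                      headers={"Content-Type": "application/json"})
        # poll the db info until this pass has finished, then go again
        while requests.get(DB).json().get("compact_running"):
            time.sleep(5)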
>>>
>>>> I was thinking that tracking the throughput as chunks in individual
>>>> documents and then calculating the throughput with a map/reduce on
>>>> all the chunks might be a better approach, although I'm concerned
>>>> that having lots of little documents for each data chunk will also
>>>> take up large amounts of space...
>>>
>>> Yeah, that wouldn't save any space here. That said, I wouldn't call
>>> the numbers you quote "large amounts".
>>>
>>>> Any advice and guidance on the best way to tackle this would be
>>>> much appreciated.
>>>
>>> I'd either set up continuous compaction (restart compaction right
>>> when it is done) to keep the DB size at a minimum, or use an
>>> in-memory store to keep track of the uploaded bytes.
>>>
>>> Ideally though, CouchDB would give you an endpoint to query that
>>> kind of data.
>>>
>>> Cheers
>>> Jan
>>> --
>>
>> --
>> muji.
>

--
muji.
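For readers new to CouchDB, the document update handler discussed in this
thread might look roughly like the sketch below: a JavaScript function in
a design document that adds a per-chunk byte count to "uploadbytes", here
installed and called from Python. The design document and handler names
are invented for illustration; note that the write still goes through
normal MVCC, so concurrent calls against the same document can come back
as 409 conflicts, which is the behaviour questioned above.

    import requests  # pip install requests

    DB = "http://127.0.0.1:5984/uploads"  # placeholder database URL

    DESIGN_DOC = {
        "_id": "_design/files",
        "updates": {
            "addbytes": """
            function (doc, req) {
              if (!doc) return [null, 'missing'];
              var delta = parseInt(req.query.bytes, 10) || 0;
              doc.uploadbytes = (doc.uploadbytes || 0) + delta;
              return [doc, JSON.stringify({ok: true,
                                           uploadbytes: doc.uploadbytes})];
            }
            """,
        },
    }

    def install():
        # first-time install; updating an existing design doc would need
        # its current _rev
        requests.put(DB + "/_design/files", json=DESIGN_DOC)

    def add_bytes(doc_id, nbytes):
        # one call per chunk means one new revision per chunk, which is
        # what drives the revision counts discussed in the thread
        return requests.put(
            "%s/_design/files/_update/addbytes/%s" % (DB, doc_id),
            params={"bytes": nbytes},
        )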