incubator-couchdb-user mailing list archives

From: Jan Lehnardt <...@apache.org>
Subject: Re: Tracking file throughput?
Date: Fri, 03 Jun 2011 14:37:23 GMT

On 3 Jun 2011, at 16:28, muji wrote:

> Thanks very much for the help.
> 
> I could of course reduce the number of times the update is done, but
> the service plans to bill based on throughput, so this is quite
> critical from a billing perspective.

You can still bill on throughput, since you will know exactly how much
data has been transferred in what amount of time; reporting just becomes
less granular, i.e. in chunks of, say, 10MB rather than 100KB or however
big your upload chunks are.
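
For illustration only, a minimal Python sketch of that kind of batching,
assuming plain GET/PUT against the document over HTTP (the database name,
document id and chunk source below are placeholders, not anything from
your setup):

import requests

COUCH = "http://127.0.0.1:5984"
DB = "uploads"                   # placeholder database name
DOC_ID = "file-123"              # placeholder document id
FLUSH_EVERY = 10 * 1024 * 1024   # report in 10MB steps

def flush(total_bytes):
    # Read the current revision, then write the updated byte count back.
    doc = requests.get(f"{COUCH}/{DB}/{DOC_ID}").json()
    doc["uploadbytes"] = total_bytes
    requests.put(f"{COUCH}/{DB}/{DOC_ID}", json=doc)

def track_upload(chunks):
    # Accumulate chunk sizes in memory and only hit CouchDB every 10MB.
    pending = total = 0
    for chunk in chunks:
        pending += len(chunk)
        total += len(chunk)
        if pending >= FLUSH_EVERY:
            flush(total)
            pending = 0
    flush(total)  # final write once the upload completes

Each flush creates one revision instead of one per chunk, which is what
keeps the pre-compaction file size down.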

> A quick search for continuous compaction didn't yield anything, and I
> don't see anything here:
> 
> http://wiki.apache.org/couchdb/Compaction
> 
> Could you point me in the right direction please?

I made that term up, but I explained how to do it: restart compaction as
soon as the previous run finishes. Roughly, in shell:

while true; do curl -X POST -H 'Content-Type: application/json' http://127.0.0.1:5984/db/_compact; sleep 60; done
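
A slightly more careful sketch of the same loop in Python (assuming the
default port and no admin credentials; adjust for your setup) polls the
database info document and only re-triggers once compact_running goes
back to false:

import time
import requests

COUCH = "http://127.0.0.1:5984"
DB = "db"  # placeholder database name

def compact_forever():
    while True:
        # Trigger compaction; CouchDB returns immediately and compacts
        # in the background.
        requests.post(f"{COUCH}/{DB}/_compact",
                      headers={"Content-Type": "application/json"})
        # Wait for the running compaction to finish before starting the next one.
        while requests.get(f"{COUCH}/{DB}").json().get("compact_running"):
            time.sleep(5)

compact_forever()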

> Funny you mention about caching before updating couch, that was my
> very first implementation! I was updating Redis with the throughput
> and then updating the file document once the upload completed. That
> worked very well but I wanted to remove Redis from the stack as the
> application is already pretty complex.
> 
> I'm guessing my best option is to revert back to that technique?

It depends on what your goals are. The initial design you mentioned
seems fine to me if you compact often. If you are optimising for
disk space, Redis or memcached may be a good idea. If you are
optimising for a small stack, not having Redis or memcached is a
good idea.
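
If you do go back to the Redis variant, the idea boils down to something
like this sketch (redis-py plus a single CouchDB write at the end; the
key name and document id are made up for illustration):

import redis
import requests

r = redis.Redis()
COUCH = "http://127.0.0.1:5984"
DB = "uploads"         # placeholder database name
DOC_ID = "file-123"    # placeholder document id

def on_chunk(chunk):
    # Cheap in-memory counter per chunk; no CouchDB revision is created here.
    r.incrby(f"uploadbytes:{DOC_ID}", len(chunk))

def on_upload_complete():
    # One document update for the whole upload instead of one per chunk.
    total = int(r.get(f"uploadbytes:{DOC_ID}") or 0)
    doc = requests.get(f"{COUCH}/{DB}/{DOC_ID}").json()
    doc["uploadbytes"] = total
    requests.put(f"{COUCH}/{DB}/{DOC_ID}", json=doc)
    r.delete(f"uploadbytes:{DOC_ID}")

The trade-off is exactly the one above: fewer revisions and almost no
compaction pressure, at the price of one more moving part in the stack.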

> As an aside, why would my document update handler be raising
> conflicts? My understanding was that update handlers would not raise
> conflicts - is that correct?

That is not correct. An update handler still performs a normal document
write underneath, so concurrent updates to the same document can still
produce conflicts.
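
If you keep the update handler, the usual way to cope is to retry on a
409. A hypothetical sketch (the design document and handler name here
are made up):

import requests

COUCH = "http://127.0.0.1:5984"
DB = "uploads"                              # placeholder database name
DOC_ID = "file-123"                         # placeholder document id
HANDLER = "_design/app/_update/addbytes"    # hypothetical update handler path

def bump_bytes(n, retries=5):
    # Call the update handler; retry if CouchDB reports a conflict (409).
    for _ in range(retries):
        resp = requests.put(f"{COUCH}/{DB}/{HANDLER}/{DOC_ID}",
                            params={"bytes": n})
        if resp.status_code != 409:
            return resp
    raise RuntimeError(f"still conflicting after {retries} retries")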

Cheers
Jan
-- 

> 
> Thanks!
> 
> On Fri, Jun 3, 2011 at 3:03 PM, Jan Lehnardt <jan@apache.org> wrote:
>> Hi,
>> 
>> On 3 Jun 2011, at 15:43, muji wrote:
>>> I'm still new to couchdb and nosql so apologies if the answer to this
>>> is trivial.
>> 
>> No worries, we're all new at something :)
>> 
>>> 
>>> I'm trying to track the throughput of a file sent via a POST request
>>> in a couchdb document.
>>> 
>>> My initial implementation creates a document for the file before the
>>> POST is sent and then I have an update handler that increments the
>>> "uploadbytes" for every chunk of data received from the client.
>> 
>> Could you make that a little less frequent and interpolate between
>> the data points? Instead of tracking bytes exactly at the chunk
>> boundaries, just update every 10 or so MB, and have the UI adjust
>> accordingly?
>> 
>> 
>>> This *nearly* works, except that I get document update conflicts
>>> (which I think is due to me not being able to throttle back the
>>> upload while the db is updated), but the main problem is that for
>>> large files (~2.4GB) the number of document revisions is around
>>> 40-50,000. So I have a single document taking up between 0.7GB and
>>> 1GB. After compaction it reduces to ~380KB, which of course is much
>>> better, but this still seems excessive and poses problems with
>>> compacting a write-heavy database. I understand the trick to that is
>>> to replicate, compact and replicate back to the source, please
>>> correct me if I'm wrong...
>> 
>> Hm, no, that won't do anything; just regular compaction is good enough.
>> 
>>> So I don't think this approach is viable, which makes me wonder
>>> whether setting _revs_limit will help, although I understand that
>>> setting this per database still requires compaction and only saves
>>> space after compaction.
>> 
>> _revs_limit won't help, you will always need to compact to get rid of
>> data.
>> 
>>> I was thinking that tracking the throughput as chunks in individual
>>> documents and then calculating the throughput with a map/reduce on all
>>> the chunks might be a better approach. Although I'm concerned that
>>> having lots of little documents for each data chunk will also take up
>>> large amounts of space...
>> 
>> Yeah, that wouldn't save any space here. That said, I wouldn't call
>> the numbers you quote "large amounts".
>> 
>> 
>>> Any advice and guidance on the best way to tackle this would be much
>>> appreciated.
>> 
>> I'd either set up continuous compaction (restart compaction right when
>> it is done) to keep DB size at a minimum or use an in-memory store
>> to keep track of the uploaded bytes.
>> 
>> Ideally though, CouchDB would give you an endpoint to query that kind
>> of data.
>> 
>> Cheers
>> Jan
>> --
>> 
>> 
> 
> 
> 
> -- 
> muji.

