couchdb-user mailing list archives

From muji <mis...@freeformsystems.com>
Subject Re: Tracking file throughput?
Date Fri, 03 Jun 2011 14:28:54 GMT
Thanks very much for the help.

I could of course reduce the number of times the update is done, but
the service plans to bill based on throughput, so this is quite
critical from a billing perspective.

A quick search for continuous compaction didn't yield anything, and I
don't see anything here:

http://wiki.apache.org/couchdb/Compaction

Could you point me in the right direction please?
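
My guess is it just means re-triggering _compact as soon as the previous
run finishes? Something along these lines (a rough sketch against the
HTTP API only; the server URL and database name are placeholders):

    import time
    import requests

    COUCH = "http://localhost:5984"   # placeholder server URL
    DB = "uploads"                    # placeholder database name

    while True:
        # _compact wants a JSON content type (and admin rights on a
        # secured server).
        requests.post("%s/%s/_compact" % (COUCH, DB),
                      headers={"Content-Type": "application/json"})
        # The db info document reports whether compaction is still running.
        while requests.get("%s/%s" % (COUCH, DB)).json()["compact_running"]:
            time.sleep(5)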

Funny you should mention caching before updating couch, that was my
very first implementation! I was updating Redis with the throughput
and then updating the file document once the upload completed. That
worked very well, but I wanted to remove Redis from the stack as the
application is already pretty complex.

I'm guessing my best option is to revert back to that technique?
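
(That approach was basically: accumulate the per-chunk counts in Redis
while the upload is streaming and do a single CouchDB write when it
completes. Roughly like this; the key names and document layout here are
simplified placeholders, and it assumes a local Redis:)

    import json
    import redis
    import requests

    COUCH = "http://localhost:5984/uploads"   # placeholder db URL
    r = redis.Redis()

    def on_chunk(file_id, chunk):
        # Cheap in-memory increment per chunk; no CouchDB revision churn.
        r.incrby("uploadbytes:%s" % file_id, len(chunk))

    def on_upload_complete(file_id):
        # One CouchDB write per upload instead of one per chunk.
        total = int(r.get("uploadbytes:%s" % file_id) or 0)
        doc = requests.get("%s/%s" % (COUCH, file_id)).json()
        doc["uploadbytes"] = total
        requests.put("%s/%s" % (COUCH, file_id), data=json.dumps(doc))
        r.delete("uploadbytes:%s" % file_id)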

As an aside, why would my document update handler be raising
conflicts? My understanding was that update handlers would not raise
conflicts - is that correct?
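
For reference, the handler is essentially just an increment on a single
field, something like the sketch below (the design doc and handler names
here are made up for illustration):

    import json
    import requests

    COUCH = "http://localhost:5984"   # placeholder
    DB = "uploads"                    # placeholder

    # Update handler stored in a design doc: parse the request body and
    # bump a single counter field on the file document.
    handler = """
    function(doc, req) {
      var delta = parseInt(JSON.parse(req.body).bytes, 10) || 0;
      doc.uploadbytes = (doc.uploadbytes || 0) + delta;
      return [doc, JSON.stringify({ok: true, uploadbytes: doc.uploadbytes})];
    }
    """

    ddoc = {"_id": "_design/throughput", "updates": {"addbytes": handler}}
    requests.put("%s/%s/_design/throughput" % (COUCH, DB),
                 data=json.dumps(ddoc))

    # Called once per chunk from the upload code:
    requests.put("%s/%s/_design/throughput/_update/addbytes/my-file-doc"
                 % (COUCH, DB), data=json.dumps({"bytes": 65536}))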

Thanks!

On Fri, Jun 3, 2011 at 3:03 PM, Jan Lehnardt <jan@apache.org> wrote:
> Hi,
>
> On 3 Jun 2011, at 15:43, muji wrote:
>> I'm still new to couchdb and nosql so apologies if the answer to this
>> is trivial.
>
> No worries, we're all new at something :)
>
>>
>> I'm trying to track the throughput of a file sent via a POST request
>> in a couchdb document.
>>
>> My initial implementation creates a document for the file before the
>> POST is sent and then I have an update handler that increments the
>> "uploadbytes" for every chunk of data received from the client.
>
> Could you make that a little less frequent and interpolate between the
> data points? Instead of tracking bytes exactly at the chunk boundaries,
> just update every 10 MB or so? And have the UI adjust accordingly?
>
>
>> This *nearly* works, except that I get document update conflicts (which
>> I think is because I can't throttle back the upload while the db is
>> being updated), but the main problem is that for large files (~2.4GB)
>> the number of document revisions is around 40,000-50,000. So I have a
>> single document taking up between 0.7GB and 1GB. After compaction it
>> reduces to ~380KB, which of course is much better, but this still seems
>> excessive and poses problems with compacting a write-heavy database. I
>> understand the trick to that is to replicate, compact and replicate
>> back to the source; please correct me if I'm wrong...
>
> Hm no that won't do anything, just regular compaction is good enough.
>
>> So I don't think this approach is viable, which makes me wonder
>> whether setting _revs_limit will help, although I understand that
>> setting this per database still requires compaction and only saves
>> space after compaction.
>
> _revs_limit won't help, you will always need to compact to get rid of
> data.
>
>> I was thinking that tracking the throughput as chunks in individual
>> documents and then calculating the throughput with a map/reduce over
>> all the chunks might be a better approach, although I'm concerned that
>> having lots of little documents for each data chunk will also take up
>> large amounts of space...
>
> Yeah, wouldn't save any space here. That said, the numbers you quote,
> I wouldn't call "large amounts".
>
>
>> Any advice and guidance on the best way to tackle this would be much
>> appreciated.
>
> I'd either set up continuous compaction (restart compaction right when
> it is done) to keep DB size at a minimum or use an in-memory store
> to keep track of the uploaded bytes.
>
> Ideally though, CouchDB would give you an endpoint to query that kind
> of data.
>
> Cheers
> Jan
> --
>
>
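
P.S. For completeness, the per-chunk map/reduce idea I mentioned would
boil down to a view that sums chunk sizes per file, roughly like this
(rough sketch; the doc fields, view and design doc names are made up):

    import json
    import requests

    COUCH = "http://localhost:5984"   # placeholder
    DB = "uploads"                    # placeholder

    # One small doc per chunk ({"type": "chunk", "file_id": ..., "bytes": ...}),
    # plus a view that sums bytes per file.
    ddoc = {
        "_id": "_design/chunks",
        "views": {
            "bytes_by_file": {
                "map": "function(doc) {"
                       " if (doc.type === 'chunk') emit(doc.file_id, doc.bytes);"
                       " }",
                "reduce": "_sum"
            }
        }
    }
    requests.put("%s/%s/_design/chunks" % (COUCH, DB), data=json.dumps(ddoc))

    # Per-file totals:
    totals = requests.get("%s/%s/_design/chunks/_view/bytes_by_file?group=true"
                          % (COUCH, DB)).json()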



-- 
muji.
