couchdb-user mailing list archives

From Jan Lehnardt <...@apache.org>
Subject Re: Tracking file throughput?
Date Fri, 03 Jun 2011 15:02:27 GMT

On 3 Jun 2011, at 17:00, muji wrote:

> Thanks again for your help Jan.
> 
> Sorry, I thought that continuous compaction might be a feature I had
> overlooked. I have no problems automating a compaction process, I
> always envisaged needing to do that...
> 
> I think that I will revert to running far fewer updates on the couchdb
> document and caching the throughput in Redis as disc space is more of
> a priority than application complexity.
> 
> A few more (different) questions in the pipeline as I'm still learning couch ;)

Sure, any time :)

Cheers
Jan
-- 

> 
> On Fri, Jun 3, 2011 at 3:37 PM, Jan Lehnardt <jan@apache.org> wrote:
>> 
>> On 3 Jun 2011, at 16:28, muji wrote:
>> 
>>> Thanks very much for the help.
>>> 
>>> I could of course reduce the number of times the update is done, but
>>> the service plans to bill based on throughput, so this is quite
>>> critical from a billing perspective.
>> 
>> You can still bill on throughput, as you will know exactly how much
>> data has been transferred in what amount of time, but reporting is
>> going to be less granular, i.e. in chunks of, say, 10MB rather than
>> 100KB or however big your chunks are.
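>> 
>> For example, billing only needs the recorded totals, not one revision
>> per chunk; a rough Python sketch (the database URL is a placeholder,
>> and the two timestamp fields are ones you would have to add to the
>> file document):
>> 
>> import requests
>> 
>> DB = "http://127.0.0.1:5984/db"   # placeholder database URL
>> 
>> def billed_throughput(doc_id):
>>     doc = requests.get(f"{DB}/{doc_id}").json()
>>     # "uploadbytes" is your existing field; "started_at"/"finished_at"
>>     # are assumed epoch-second fields written by the uploader
>>     elapsed = doc["finished_at"] - doc["started_at"]
>>     return doc["uploadbytes"], doc["uploadbytes"] / elapsed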
>> 
>>> A quick search for continuous compaction didn't yield anything, and I
>>> don't see anything here:
>>> 
>>> http://wiki.apache.org/couchdb/Compaction
>>> 
>>> Could you point me in the right direction please?
>> 
>> I made the term up, and I explained how to do it: restart compaction
>> as soon as it finishes. Pseudocode:
>> 
>> while curl -s -X POST -H 'Content-Type: application/json' http://127.0.0.1:5984/db/_compact; do :; done
>> 
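>> A slightly fuller sketch in Python that waits for each run to finish
>> before kicking off the next one (this just uses the stock /_compact
>> endpoint and the compact_running flag from the database info; the URL
>> is a placeholder):
>> 
>> import time
>> import requests
>> 
>> DB = "http://127.0.0.1:5984/db"   # placeholder database URL
>> 
>> while True:
>>     # _compact returns immediately; compaction runs in the background
>>     requests.post(f"{DB}/_compact",
>>                   headers={"Content-Type": "application/json"})
>>     # poll the database info until the run finishes, then restart
>>     while requests.get(DB).json().get("compact_running"):
>>         time.sleep(5)
>> 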
>>> Funny you mention caching before updating couch; that was my
>>> very first implementation! I was updating Redis with the throughput
>>> and then updating the file document once the upload completed. That
>>> worked very well, but I wanted to remove Redis from the stack as the
>>> application is already pretty complex.
>>> 
>>> I'm guessing my best option is to revert to that technique?
>> 
>> It depends on what your goals are. The initial design you mentioned
>> seems fine to me if you compact often. If you are optimising for
>> disk space, Redis or memcached may be a good idea. If you are
>> optimising for a small stack, not having Redis or memcached is a
>> good idea.
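>> 
>> A rough sketch of that hybrid, if you do keep Redis: count the bytes
>> in Redis while the upload runs and write the total to the CouchDB
>> document once at the end (redis-py and requests; the URL and key
>> names are placeholders):
>> 
>> import redis
>> import requests
>> 
>> DB = "http://127.0.0.1:5984/db"   # placeholder database URL
>> r = redis.Redis()
>> 
>> def on_chunk(upload_id, chunk):
>>     # one cheap in-memory increment per chunk, no CouchDB revision
>>     r.incrby(f"uploadbytes:{upload_id}", len(chunk))
>> 
>> def on_complete(upload_id, doc_id):
>>     total = int(r.get(f"uploadbytes:{upload_id}") or 0)
>>     doc = requests.get(f"{DB}/{doc_id}").json()   # includes current _rev
>>     doc["uploadbytes"] = total
>>     requests.put(f"{DB}/{doc_id}", json=doc)      # one final revision
>>     r.delete(f"uploadbytes:{upload_id}")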
>> 
>>> As an aside, why would my document update handler be raising
>>> conflicts? My understanding was that update handlers would not raise
>>> conflicts - is that correct?
>> 
>> That is not correct.
>> 
>> Cheers
>> Jan
>> --
>> 
>>> 
>>> Thanks!
>>> 
>>> On Fri, Jun 3, 2011 at 3:03 PM, Jan Lehnardt <jan@apache.org> wrote:
>>>> Hi,
>>>> 
>>>> On 3 Jun 2011, at 15:43, muji wrote:
>>>>> I'm still new to couchdb and nosql so apologies if the answer to this
>>>>> is trivial.
>>>> 
>>>> No worries, we're all new at something :)
>>>> 
>>>>> 
>>>>> I'm trying to track, in a couchdb document, the throughput of a file
>>>>> sent via a POST request.
>>>>> 
>>>>> My initial implementation creates a document for the file before the
>>>>> POST is sent and then I have an update handler that increments the
>>>>> "uploadbytes" for every chunk of data received from the client.
>>>> 
>>>> Could you make that a little less frequent and interpolate between the
>>>> data points? Instead of tracking bytes exactly at the chunk boundaries,
>>>> just update every 10 or so MB? And have the UI adjust accordingly?
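>>>> 
>>>> As a rough sketch of that batching (the update handler path
>>>> _design/files/_update/addbytes is a placeholder for whatever your
>>>> handler is actually called, and the URL is made up too):
>>>> 
>>>> import requests
>>>> 
>>>> DB = "http://127.0.0.1:5984/db"        # placeholder database URL
>>>> FLUSH_EVERY = 10 * 1024 * 1024         # report in ~10 MB steps
>>>> 
>>>> def track_upload(doc_id, chunks):
>>>>     pending = 0
>>>>     for chunk in chunks:
>>>>         pending += len(chunk)
>>>>         if pending >= FLUSH_EVERY:     # one revision per ~10 MB,
>>>>             flush(doc_id, pending)     # not one per network chunk
>>>>             pending = 0
>>>>     if pending:
>>>>         flush(doc_id, pending)         # report the final partial step
>>>> 
>>>> def flush(doc_id, nbytes):
>>>>     requests.put(f"{DB}/_design/files/_update/addbytes/{doc_id}",
>>>>                  params={"bytes": nbytes})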
>>>> 
>>>> 
>>>>> This *nearly* works, except that I get document update conflicts (which
>>>>> I think is down to me not being able to throttle back the upload
>>>>> while the db is updated), but the main problem is that for large files
>>>>> (~2.4GB) the number of document revisions is around 40,000-50,000. So I
>>>>> have a single document taking up between 0.7GB and 1GB. After
>>>>> compaction it reduces to ~380KB, which of course is much better, but
>>>>> this still seems excessive and poses problems with compacting a
>>>>> write-heavy database. I understand the trick there is to replicate,
>>>>> compact and replicate back to the source; please correct me if I'm
>>>>> wrong...
>>>> 
>>>> Hm, no, that won't do anything; regular compaction is good enough.
>>>> 
>>>>> So, I don't think this approach is viable, which makes me wonder
>>>>> whether setting _revs_limit will help, although I understand that
>>>>> setting this per database still requires compaction and only saves
>>>>> space after compaction.
>>>> 
>>>> _revs_limit won't help, you will always need to compact to get rid of
>>>> data.
>>>> 
>>>>> I was thinking that tracking the throughput as chunks in individual
>>>>> documents and then calculating the total with a map/reduce over all
>>>>> the chunks might be a better approach, although I'm concerned that
>>>>> having lots of little documents for each data chunk will also take up
>>>>> large amounts of space...
>>>> 
>>>> Yeah, wouldn't save any space here. That said, the numbers you quote,
>>>> I wouldn't call "large amounts".
>>>> 
>>>> 
>>>>> Any advice and guidance on the best way to tackle this would be much
>>>>> appreciated.
>>>> 
>>>> I'd either set up continuous compaction (restart compaction right when
>>>> it is done) to keep DB size at a minimum or use an in-memory store
>>>> to keep track of the uploaded bytes.
>>>> 
>>>> Ideally though, CouchDB would give you an endpoint to query that kind
>>>> of data.
>>>> 
>>>> Cheers
>>>> Jan
>>>> --
>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> muji.
>> 
>> 
> 
> 
> 
> -- 
> muji.

