couchdb-user mailing list archives

From Simon Metson <si...@cloudant.com>
Subject Re: CouchDB compaction not catching up.
Date Thu, 07 Mar 2013 08:58:55 GMT
What about making a database per day/week and dropping the whole lot in one go? 
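
If each day (or week) of logs lives in its own database, you never have to compact those databases at all; a single DELETE of the whole database frees the disk space immediately. A rough sketch of what that could look like (Python with the requests library; the URL, the logs_YYYYMMDD naming and the retention window are just assumptions for illustration, not something you have to follow):

# Sketch: one "logs_YYYYMMDD" database per day. Old days are removed by
# deleting the whole database instead of deleting documents and compacting.
import datetime
import json
import requests

COUCH = "http://localhost:5984"   # assumed CouchDB URL
RETENTION_DAYS = 7                # keep roughly a week of logs

def db_name(day):
    return "logs_" + day.strftime("%Y%m%d")

def write_log(doc):
    # Write a log document into today's database, creating it if needed.
    name = db_name(datetime.date.today())
    requests.put("%s/%s" % (COUCH, name))  # 412 if it already exists, which is fine
    resp = requests.post("%s/%s" % (COUCH, name),
                         data=json.dumps(doc),
                         headers={"Content-Type": "application/json"})
    resp.raise_for_status()

def drop_expired():
    # Delete whole per-day databases that are older than the retention window.
    cutoff = db_name(datetime.date.today() - datetime.timedelta(days=RETENTION_DAYS))
    for name in requests.get(COUCH + "/_all_dbs").json():
        if name.startswith("logs_") and name < cutoff:
            requests.delete("%s/%s" % (COUCH, name))

The DELETE returns the space straight away, so the compaction daemon never has to chase the write load for those databases; you only pay for compaction on the data you actually keep.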


On Thursday, 7 March 2013 at 08:50, Nicolas Peeters wrote:

> So the use case is some kind of transactional log associated with some kind
> of long-running process (1 day). For each process, a few hundred thousand
> lines of "logging" are inserted. When the process has completed (user
> approval), we would like to delete all the associated "logs". Marking items
> as deleted is not really the issue. Recovering the space is.
> 
> The data should ideally be available for up to a week or so.
> 
> 
> On Thu, Mar 7, 2013 at 9:24 AM, Riyad Kalla <rkalla@gmail.com> wrote:
> 
> > Nicolas,
> > Can you provide some insight into how you decide which large batches of
> > records to delete and roughly how big (MB/GB wise) those batches are? What
> > is the required longevity of this tx information in this couch store? Is
> > this just temporary storage, or is this the system of record, with the
> > large batches you are deleting being just temporary intermediary data?
> > 
> > Understanding how you are using the data and turning over the data could
> > help assess some alternative strategies.
> > 
> > Best,
> > Riyad
> > 
> > On Thu, Mar 7, 2013 at 12:19 AM, Nicolas Peeters <nicolists@gmail.com> wrote:
> > 
> > 
> > > Hi CouchDB Users,
> > > 
> > > *Disclaimer: I'm very aware that the use case is definitely not the best
> > > for CouchDB, but for now, we have to deal with it.*
> > > 
> > > *Scenario:*
> > > 
> > > We have a fairly large (~750GB) CouchDB (1.2.0) database that is being
> > > used for transactional logs (very write-heavy; bad idea/design, I know,
> > > but that's beside the point of this question - we're looking at
> > > alternative designs). Once in a while, we delete some of the records in
> > > large batches, and we have scheduled auto-compaction, checking every 2
> > > hours.
> > > 
> > > This is the compaction config:
> > > 
> > > [image: Inline image 1]
> > > 
> > > From what I can see, the DB is being hammered significantly every 12
> > > hours, and the compaction is taking a long time (sometimes 24 hours with
> > > 100GB of log data, sometimes much more, with up to 500GB).
> > > 
> > > We run on EC2: large instances with EBS, no striping (yet), no provisioned
> > > IOPS. We tried fatter machines, but the improvement was really minimal.
> > > 
> > > *The problem:*
> > > 
> > > The problem is that compaction takes a very long time (e.g. 12h+) and
> > > reduces the performance of the entire stack. The main issue seems to be
> > > that it's hard for the compaction process to "keep up" with the
> > > insertions, which is why it takes so long. Also, the compaction of the
> > > view takes a long time (sometimes the view is 100GB). During the
> > > re-compaction of the view, clients don't get a response, which is
> > > blocking the processes.
> > > 
> > > [image: Inline image 2]
> > > 
> > > The view compaction takes approx. 8 hours, and indexing for the view is
> > > therefore slower; while the view is indexing, another 300k insertions are
> > > done (and it doesn't catch up). The only way to solve the problem was to
> > > throttle the number of inserts from the app itself, and then eventually
> > > the view compaction completed. If we had continued to insert at the same
> > > rate, it would not have finished (and ultimately, we would have run out
> > > of disk space).
> > > 
> > > Any recommendations for setting this up on EC2 are welcome. Suggested
> > > configuration settings for the compaction would also be helpful.
> > > 
> > > Thanks.
> > > 
> > > Nicolas
> > > 
> > > PS: We are happily using CouchDB for other (more traditional) use cases,
> > > where it works very well.
> > > 
> > 
> > 
> 
> 
> 


