incubator-couchdb-user mailing list archives

From Nicolas Peeters <nicoli...@gmail.com>
Subject Re: CouchDB compaction not catching up.
Date Thu, 07 Mar 2013 21:47:19 GMT
See my answers inline. I know there are all kinds of possible workarounds,
and it seems this is actually not such a big problem for most other users.
Maybe this "extreme" case does warrant more practical workarounds.

On Thu, Mar 7, 2013 at 4:12 PM, Riyad Kalla <rkalla@gmail.com> wrote:

> To Simon's point, exactly where I was headed. Your issue is that
> compaction cannot catch up due to write velocity, so you need to avoid
> compaction (and by extension replication, since the issue is that your
> background writes cannot catch up). The only way to do that is some
> working model where you simply discard the data file when done and
> start anew.
>
>
Indeed. Unless the file actually gets so big that you can't possibly do
anything with it any more. But then again, that may be a design issue in the
amount of stuff being logged.
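
For illustration, here is a minimal sketch of that "discard and start anew"
model against CouchDB's standard HTTP API (host and database names are made
up; this also anticipates Simon's database-per-period suggestion quoted
further down). Deleting a whole database is one cheap call, so compaction is
never needed:

    import json
    import urllib.request

    HOST = "http://localhost:5984"  # illustrative

    def couch(method, path):
        req = urllib.request.Request(HOST + path, method=method,
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())

    def rotate(old_db, new_db):
        couch("PUT", "/" + new_db)     # create the empty standby database
        # ... repoint the application's writers at new_db here ...
        couch("DELETE", "/" + old_db)  # drop the old file: instant, no compaction

    rotate("txlogs_week09", "txlogs_week10")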


> You mentioned clearing a few hundred records at a time after a tx
> completes, so it sounds like over the period of a week you should be
> turning over your entire data set completely, right?
>

Typically, yes.

>
> I wonder if there could be a solution here like fronting a few CouchDB
> instances with nginx and using a cron job: on day 5 or 7, flip
> inbound traffic to a hot (empty) standby while processing the
> remaining data off the old master, then clear it out while writes
> are directed to the new master for the next week?
>

Wow. That's an impressive workaround, but it would indeed work. I'd prefer
using standard features (that can also be easily driven by a web app or
something, which is the case here).
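
As a back-of-the-envelope sketch of that flip (every path, host and upstream
name here is hypothetical, not something from this thread), the cron job
could simply rewrite an nginx upstream include and reload:

    import subprocess

    UPSTREAM_FILE = "/etc/nginx/conf.d/couch_upstream.conf"  # hypothetical path

    def flip_to(active_host):
        # Point the "couchdb_active" upstream at the hot standby.
        with open(UPSTREAM_FILE, "w") as f:
            f.write("upstream couchdb_active {\n"
                    "    server %s:5984;\n"
                    "}\n" % active_host)
        subprocess.check_call(["nginx", "-s", "reload"])  # graceful reload

    # e.g. invoked by cron on day 5 or 7:
    flip_to("couch-standby.internal")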

> Again, this only makes sense depending on data usage and if the
> pending data off the slave would need to stay accessible to a front
> end like search. Ultimately what I am suggesting here is a solution
> where you always have a CouchDB instance to write logs to, but you are
> never trying to compact, which would require some clever juggling
> between instances.
>
> Alternatively... Your problem is write performance; I would be curious
> whether IOPS instances would cure this for you right out of the box with
> no engineering work.
>
> Longer term? Probably check out AWS redline.
>

At the moment, we're looking at an alternative, which is to use Logstash and
write to files and/or stream to Elasticsearch. Deletion would be achieved by
deleting a whole "index" in bulk (a bit like the solution mentioned above).
We'll keep CouchDB for the "important" logs, and transaction logs will
possibly be handled in a different way.
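
For reference, dropping a time-based index in Elasticsearch is a single HTTP
call, which is what makes that route attractive (index name and host below
are illustrative):

    import urllib.request

    def drop_index(name, host="http://localhost:9200"):
        # Deleting the whole index reclaims its disk space immediately;
        # no per-document deletes, no compaction.
        req = urllib.request.Request("%s/%s" % (host, name), method="DELETE")
        with urllib.request.urlopen(req) as resp:
            print(resp.status, resp.read())

    drop_index("logstash-2013.03.07")  # delete one day's logs in one go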


> Sent from my iPhone
>
> On Mar 7, 2013, at 1:58 AM, Nicolas Peeters <nicolists@gmail.com> wrote:
>
> > Simon,
> >
> > That's actually a very good suggestion, and we actually implemented it
> > (we had one DB per "process"). The problem was that the size of the DB
> > sometimes outgrew our disks (1TB!) (and sometimes we needed to keep the
> > data around for longer periods), so we discarded that approach in the
> > end.
> >
> > This is, however, a workaround, and the main question was about the
> > compaction not catching up (which may be a problem in other cases too).
> >
> >
> > On Thu, Mar 7, 2013 at 9:58 AM, Simon Metson <simon@cloudant.com> wrote:
> >
> >> What about making a database per day/week and dropping the whole lot in
> >> one go?
> >>
> >>
> >> On Thursday, 7 March 2013 at 08:50, Nicolas Peeters wrote:
> >>
> >>> So the use case is some kind of transactional log associated with some
> >>> kind of long-running process (1 day). For each process, a few hundred
> >>> thousand lines of "logging" are inserted. When the process has
> >>> completed (user approval), we would like to delete all the associated
> >>> "logs". Marking items as deleted is not really the issue. Recovering
> >>> the space is.
> >>>
> >>> The data should ideally be available for up to a week or so.
> >>>
> >>>
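
To make that distinction concrete: bulk-deleting one process's logs via the
standard CouchDB API only writes tombstones, and the space comes back only
once compaction succeeds. A minimal sketch (database name and doc-id scheme
are made up):

    import json
    import urllib.request

    DB = "http://localhost:5984/txlogs"  # illustrative database

    def couch(method, path, body=None):
        data = json.dumps(body).encode() if body is not None else None
        req = urllib.request.Request(DB + path, data=data, method=method,
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())

    def delete_process_logs(doc_ids):
        # Fetch the current revision of each log document...
        rows = couch("POST", "/_all_docs", {"keys": doc_ids})["rows"]
        # ...and mark them all deleted in one _bulk_docs call. This reclaims
        # nothing by itself: the file shrinks only when compaction finishes.
        docs = [{"_id": r["id"], "_rev": r["value"]["rev"], "_deleted": True}
                for r in rows if "value" in r]
        couch("POST", "/_bulk_docs", {"docs": docs})
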
> >>> On Thu, Mar 7, 2013 at 9:24 AM, Riyad Kalla <rkalla@gmail.com> wrote:
> >>>
> >>>> Nicolas,
> >>>> Can you provide some insight into how you decide which large batches
> >>>> of records to delete and roughly how big (MB/GB-wise) those batches
> >>>> are? What is the required longevity of this tx information in this
> >>>> couch store? Is this just temporary storage, or is this the system of
> >>>> record, and is what you are deleting in large batches just temporary
> >>>> intermediary data?
> >>>>
> >>>> Understanding how you are using the data and turning over the data
> >>>> could help assess some alternative strategies.
> >>>>
> >>>> Best,
> >>>> Riyad
> >>>>
> >>>> On Thu, Mar 7, 2013 at 12:19 AM, Nicolas Peeters
> >>>> <nicolists@gmail.com> wrote:
> >>>>
> >>>>
> >>>>> Hi CouchDB Users,
> >>>>>
> >>>>> *Disclaimer: I'm very aware that the use case is definitely not the
> >>>>> best for CouchDB, but for now, we have to deal with it.*
> >>>>>
> >>>>> *Scenario:*
> >>>>>
> >>>>> We have a fairly large (~750GB) CouchDB (1.2.0) database that is
> >>>>> being used for transactional logs (very write-heavy). Bad idea/design,
> >>>>> I know, but that's beside the point of this question - we're looking
> >>>>> at alternative designs. Once in a while, we delete some of the records
> >>>>> in large batches, and we have scheduled auto compaction, checking
> >>>>> every 2 hours.
> >>>>>
> >>>>> This is the compaction config:
> >>>>>
> >>>>> [Inline image 1: screenshot of the compaction configuration; not
> >>>>> preserved in the archive]
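
Since the screenshot didn't survive the archive, for context: in CouchDB 1.2
the compaction daemon is configured in local.ini along these lines (the
values below are the documented examples plus the 2-hour check interval
mentioned here; the actual thresholds from the lost image are unknown):

    [compaction_daemon]
    ; seconds between checks for databases/views that need compacting
    check_interval = 7200
    ; files smaller than this (bytes) are never compacted
    min_file_size = 131072

    [compactions]
    _default = [{db_fragmentation, "70%"}, {view_fragmentation, "60%"}]
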
> >>>>>
> >>>>> From what I can see, the DB is being hammered significantly every 12
> >>>>> hours, and the compaction sometimes takes 24 hours (with a size of
> >>>>> 100GB of log data, sometimes much more, up to 500GB).
> >>>>>
> >>>>> We run on EC2: large instances with EBS. No striping (yet), no IOPS.
> >>>>> We tried fatter machines, but the improvement was really minimal.
> >>>>>
> >>>>>
> >>>>> *The problem:*
> >>>>>
> >>>>> The problem is that compaction takes a very long time (e.g. 12h+) and
> >>>>> reduces the performance of the entire stack. The main issue seems to
> >>>>> be that it's hard for the compaction process to "keep up" with the
> >>>>> insertions, which is why it takes so long. Also, the compaction of the
> >>>>> view takes a long time (sometimes the view is 100GB). During the
> >>>>> re-compaction of the view, clients don't get a response, which blocks
> >>>>> the processes.
> >>>>>
> >>>>> [Inline image 2: screenshot of compaction/indexing activity; not
> >>>>> preserved in the archive]
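
One standard mitigation for the blocked readers, at the cost of freshness:
query the view with stale=ok so CouchDB serves the last-built index instead
of waiting for the indexer (names below are illustrative):

    import urllib.request

    # Return whatever index was last built, even mid-compaction/indexing.
    url = ("http://localhost:5984/txlogs/_design/logs/_view/by_process"
           "?stale=ok&key=%22proc-42%22")
    print(urllib.request.urlopen(url).read())
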
> >>>>>
> >>>>> The view compaction takes approx. 8 hours, and indexing for the view
> >>>>> is therefore slower; in the time the view indexes, another 300k
> >>>>> insertions have been done (and it doesn't catch up). The only way to
> >>>>> solve the problem was to throttle the number of inserts from the app
> >>>>> itself, and then eventually the view compaction completed. If we had
> >>>>> continued to insert at the same rate, it would not have finished (and
> >>>>> ultimately, we would have run out of disk space).
> >>>>>
> >>>>> Any recommendations for setting this up on EC2 are welcome.
> >>>>> Configuration settings for the compaction would also be helpful.
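
For anyone tuning this, compaction can also be triggered by hand and watched
through _active_tasks, both standard CouchDB endpoints (database name is
illustrative):

    import json
    import time
    import urllib.request

    HOST = "http://localhost:5984"

    def compact(db):
        # Kick off database compaction explicitly.
        req = urllib.request.Request("%s/%s/_compact" % (HOST, db), data=b"",
                                     method="POST",
                                     headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req)

    def watch():
        # Poll _active_tasks until no compaction task remains.
        while True:
            tasks = json.loads(
                urllib.request.urlopen(HOST + "/_active_tasks").read())
            running = [t for t in tasks if "compaction" in t.get("type", "")]
            if not running:
                break
            for t in running:
                print(t.get("type"), t.get("progress"), "%")
            time.sleep(10)

    compact("txlogs")
    watch()
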
> >>>>>
> >>>>> Thanks.
> >>>>>
> >>>>> Nicolas
> >>>>>
> >>>>> PS: We are happily using CouchDB for other (more traditional) use
> >>>>> cases where it works very well.
> >>
> >>
> >>
>
