incubator-couchdb-user mailing list archives

From Daniel Gonzalez <gonva...@gonvaled.com>
Subject Re: general question about couch performance
Date Thu, 17 Jan 2013 23:29:14 GMT
The problem is not replication; the problem is the source of the data. The
replicators will just distribute the data that is being inserted to the other
server instances.

You cannot use that monotonic ID generator if you are inserting data
from different servers or applications. But if you are, let's say,
importing data into a single CouchDB (replicated or not) from a third-party
database in one batch job, you have full control over the IDs, so you can use
that ID generator. That will improve the performance of your database,
especially in terms of space used and view generation.
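
For the single-importer case, a minimal sketch of how the IDs could be assigned to a whole batch before sending it might look like this. It assumes Python 3, reuses the 64-character alphabet from the MonotonicalID class quoted further down, and only builds the JSON body that CouchDB's `_bulk_docs` endpoint expects ({"docs": [...]}); it does not send anything. The names `encode_id` and `build_bulk_payload`, and the shape of the rows, are made up for illustration.

```python
import json

# Alphabet copied from the MonotonicalID class quoted later in this thread.
COUCH_ALPHABET = "-@0123456789aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ"


def encode_id(n, alphabet=COUCH_ALPHABET):
    """Encode a non-negative integer as a short string in the given alphabet."""
    if n < 0:
        raise ValueError("counter must be non-negative")
    base = len(alphabet)
    if n == 0:
        return alphabet[0]
    chars = []
    while n > 0:
        # divmod keeps everything in integer arithmetic, so large counters stay exact
        n, rem = divmod(n, base)
        chars.append(alphabet[rem])
    return "".join(reversed(chars))


def build_bulk_payload(rows, start=0):
    """Give each row a sequential short _id and wrap the batch as {"docs": [...]}.

    Hypothetical helper: 'rows' is any list of dicts coming from the import job,
    'start' is the first counter value to use for this batch.
    """
    docs = [dict(row, _id=encode_id(start + i)) for i, row in enumerate(rows)]
    return json.dumps({"docs": docs})


if __name__ == "__main__":
    rows = [{"name": "first"}, {"name": "second"}, {"name": "third"}]
    print(build_bulk_payload(rows))
```

The importer has to remember the last counter value between batches (the `start` argument here), which is exactly why this only works when a single process hands out the IDs.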

On Fri, Jan 18, 2013 at 12:20 AM, Mark Hahn <mark@hahnca.com> wrote:

> > you can only do this if you are in control of the IDs
>
> This wouldn't work with multiple servers replicating, would it?
>
>
> On Thu, Jan 17, 2013 at 3:15 PM, Daniel Gonzalez <gonvaled@gonvaled.com> wrote:
>
> > And here you have BaseConverter:
> >
> > """
> > Convert numbers from base 10 integers to base X strings and back again.
> >
> > Sample usage:
> >
> > >>> base20 = BaseConverter('0123456789abcdefghij')
> > >>> base20.from_decimal(1234)
> > '31e'
> > >>> base20.to_decimal('31e')
> > 1234
> > """
> >
> > class BaseConverter(object):
> >     decimal_digits = "0123456789"
> >
> >     def __init__(self, digits):
> >         self.digits = digits
> >
> >     def from_decimal(self, i):
> >         return self.convert(i, self.decimal_digits, self.digits)
> >
> >     def to_decimal(self, s):
> >         return int(self.convert(s, self.digits, self.decimal_digits))
> >
> >     def convert(number, fromdigits, todigits):
> >         # Based on http://code.activestate.com/recipes/111286/
> >         if str(number)[0] == '-':
> >             number = str(number)[1:]
> >             neg = 1
> >         else:
> >             neg = 0
> >
> >         # make an integer out of the number
> >         x = 0
> >         for digit in str(number):
> >            x = x * len(fromdigits) + fromdigits.index(digit)
> >
> >         # create the result in base 'len(todigits)'
> >         if x == 0:
> >             res = todigits[0]
> >         else:
> >             res = ""
> >             while x > 0:
> >                 digit = x % len(todigits)
> >                 res = todigits[digit] + res
> >                 x = x // len(todigits)  # floor division stays exact for large integers
> >             if neg:
> >                 res = '-' + res
> >         return res
> >     convert = staticmethod(convert)
> >
> >
> > On Fri, Jan 18, 2013 at 12:13 AM, Daniel Gonzalez <gonvaled@gonvaled.com> wrote:
> >
> > > Also, in order to improve view performance, it is better if you use a
> > > short and monotonically increasing id. This is what I am using for one of
> > > my databases with millions of documents:
> > >
> > > class MonotonicalID:
> > >
> > >     def __init__(self, cnt = 0):
> > >         self.cnt = cnt
> > >         self.base62 = BaseConverter('ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz')
> > >         # This alphabet is better for couchdb, since it represents the
> > >         # Unicode Collation Algorithm
> > >         self.base64_couch = BaseConverter('-@0123456789aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ')
> > >
> > >     def get(self):
> > >         res = self.base64_couch.from_decimal(self.cnt)
> > >         self.cnt += 1
> > >         return res
> > >
> > > Doing this will:
> > > - save space in the database, since the id starts small: take into account
> > > that the id is used in lots of internal data structures in couchdb, so
> > > making it short will save lots of space in a big database
> > > - speed up certain operations, since the ids are ordered (in the couchdb sense)
> > >
> > > Drawback: you can only do this if you are in control of the IDs (you know
> > > that nobody else is going to be generating IDs)
> > >
> > > On Thu, Jan 17, 2013 at 8:00 PM, Mark Hahn <mark@hahnca.com> wrote:
> > >
> > >> Thanks for the tips.  Keep them coming.
> > >>
> > >> I'm going to try everything I can.  If I find anything surprising I'll let
> > >> everyone know.
> > >>
> > >>
> > >> On Thu, Jan 17, 2013 at 4:54 AM, Daniel Gonzalez <gonvaled@gonvaled.com> wrote:
> > >>
> > >> > Are you doing single writes or batch writes?
> > >> > I managed to improve the write performance by collecting the documents and
> > >> > sending them in a single request.
> > >> > The same applies to read accesses.
> > >> >
> > >> > On Wed, Jan 16, 2013 at 9:17 PM, Mark Hahn <mark@hahnca.com> wrote:
> > >> >
> > >> > > My couchdb is seeing a typical request rate of about 100/sec when it is
> > >> > > maxed out.  This is typically 10 reads/write.  This is disappointing.  I
> > >> > > was hoping for 3 to 5 ms per op, not 10 ms.  What performance numbers are
> > >> > > others seeing?
> > >> > >
> > >> > > I have 35 views with only 50 to 100 entries per view.  My db is less than a
> > >> > > gigabyte with a few thousand active docs.
> > >> > >
> > >> > > I'm running on a medium ec2 instance with ephemeral disk.  I assume I am IO
> > >> > > bound as the cpu is not maxing out.
> > >> > >
> > >> > > How much worse would this get if the db also had to handle replication
> > >> > > between multiple servers?
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>
