incubator-couchdb-dev mailing list archives

From Noah Slater <nsla...@apache.org>
Subject Re: [VOTE] Merge BigCouch
Date Wed, 15 May 2013 15:24:23 GMT
I can help! :)


On 15 May 2013 16:23, Robert Newson <rnewson@apache.org> wrote:

> :)
>
> Jan, I think you said you'd help start the IP clearance bit?
>
> On 15 May 2013 15:03, Noah Slater <nslater@apache.org> wrote:
> > PARTY TIME 🎉
> >
> >
> > On 15 May 2013 10:40, Robert Newson <rnewson@apache.org> wrote:
> >
> >> Thanks everyone.
> >>
> >> The tally is;
> >>
> >> 13 +1's
> >>
> >> The vote passes. We'll now move on to IP clearance. Once that's done
> >> the work will arrive on a feature branch in our main git repository.
> >>
> >> B.
> >>
> >>
> >> On 13 May 2013 04:31, Jason Smith <jhs@iriscouch.com> wrote:
> >> > Sorry, just catching up.
> >> >
> >> > +1
> >> >
> >> > On Fri, May 10, 2013 at 4:29 PM, Jan Lehnardt <jan@apache.org> wrote:
> >> >> +1
> >> >>
> >> >> Jan
> >> >> --
> >> >>
> >> >> On May 7, 2013, at 21:34, Robert Newson <rnewson@apache.org> wrote:
> >> >>
> >> >>> Hi All,
> >> >>>
> >> >>> I propose to merge in the following work,
> >> >>> https://github.com/rnewson/couchdb/tree/nebraska-merge-candidate
> >> >>> to the official Apache CouchDB repository, on a new branch (i.e.,
> >> >>> *not* master). Once there, the full CouchDB developer community
> >> >>> can begin the work to incorporate the code here into an official
> >> >>> release.
> >> >>>
> >> >>> You do not need to respond if you are in agreement. If there is no
> >> >>> response in 72 hours, I will assume lazy consensus. If we reach
> >> >>> consensus, I will start the IP clearance process and then the merge.
> >> >>>
> >> >>> As most of you know, Paul Davis and I recently sequestered ourselves
> >> >>> away from society (in a place called Nebraska) to make this merge
> >> >>> happen. I want to clarify that this work is not the BigCouch code
> >> >>> you can see on github.com/cloudant/bigcouch but the Cloudant
> >> >>> platform from which BigCouch was made. This means it is bang up to
> >> >>> date with all the bug fixes and feature enhancements we've made in
> >> >>> the last eighteen months or more. With that clarification made,
> >> >>> here are our notes about what we achieved, what it means to the
> >> >>> project and what isn't yet done:
> >> >>>
> >> >>> Nebraska Merge Roundup
> >> >>>
> >> >>>
> >> >>> Stats:
> >> >>>
> >> >>>
> >> >>> 1402 - total new commits
> >> >>>
> >> >>> 312 - commits written during the merge (will be reduced
> >> >>> substantially by squashing)
> >> >>>
> >> >>> 408 - number of files changed
> >> >>>
> >> >>> 21,897 - number of lines added
> >> >>>
> >> >>> 4,277 - number of lines removed
> >> >>>
> >> >>> A retrospective:
> >> >>>
> >> >>> Bob Newson and I have come to the end of our merge sprint on getting
> >> >>> BigCouch merged into Apache CouchDB. It's been a productive ten
> >> >>> days here in the Midwest. I managed to get Bob out to a bowling
> >> >>> alley and he managed to get me to a sushi restaurant. In between
> >> >>> the cultural exchanges we’ve also managed to get a significant
> >> >>> amount of work done on the merging.
> >> >>>
> >> >>>
> >> >>> The current status of the merge is that we’ve managed to resolve
> >> >>> the differences in the single node execution of CouchDB. Both the
> >> >>> JavaScript and Erlang test suites run, with only one failure, in
> >> >>> the Erlang test suite, due to a (deliberately) missing constraint
> >> >>> on the number of operating system processes. This should be a
> >> >>> relatively straightforward fix but was not prioritized during our
> >> >>> limited time to work on the larger issues.
> >> >>>
> >> >>>
> >> >>> We merged a large number of performance and stability enhancements
> >> >>> back into single node CouchDB as well as a number of pure bug fixes.
> >> >>> The biggest highlight is a brand new compactor that is both faster
> >> >>> and creates smaller and better organized post-compaction databases.
> >> >>>
> >> >>>
> >> >>> The current status of the merge is that single node operations
> >> >>> should be completely unaffected, as demonstrated by the test suite
> >> >>> passing. On the other hand, we haven’t yet finished getting the
> >> >>> clustered code merged to use some of the new changes in single node
> >> >>> CouchDB. The single most significant portion of this work involves
> >> >>> updates to the internal cluster API for views to use the recently
> >> >>> rewritten indexer APIs. This should be a relatively straightforward
> >> >>> bit of work that we’ll be finishing over the next few weeks.
> >> >>>
> >> >>>
> >> >>> All in all the merge work done so far has been quite successful.
> >> >>> We’ve met our primary goal of getting the code merged in a fashion
> >> >>> that does not affect single node operation while providing a
> >> >>> starting point for the larger community to start reviewing the more
> >> >>> significant changes made. Given the size of the diff between the
> >> >>> two code bases we never expected to have a fully working clustered
> >> >>> solution after ten days of work, but we have succeeded in providing
> >> >>> a base of work that will allow us and new contributors to get up to
> >> >>> speed quickly.
> >> >>>
> >> >>>
> >> >>> This work, coupled with work by Dave Cottlehuber and Benoît Chesneau
> >> >>> on updating the build system and various other internal updates,
> >> >>> will provide a solid foundation for work going forward. It's an
> >> >>> exciting time for CouchDB and anyone interested should keep an eye
> >> >>> on the next few releases as we ramp up work on various core aspects
> >> >>> of the database.
> >> >>>
> >> >>>
> >> >>> We’ve had an exciting few days working to prepare the road for an
> >> >>> exciting next twelve to eighteen months. We hope that everyone will
> >> >>> feel as excited as we do about the next twelve to eighteen months
> >> >>> for Apache CouchDB. It should be an exciting ride.
> >> >>>
> >> >>>
> >> >>>
> >> >>> Things we got done
> >> >>>
> >> >>>
> >> >>> * Large update to the source tree layout for Erlang applications.
> >> >>> Each application now has a src/appname/(c_src|ebin|priv|src)
> >> >>> structure. The build system has been updated.
> >> >>>
> >> >>> * Renamed src/couchdb to src/couch to match the Erlang convention
> >> >>> of the top directory name matching the Erlang application name.
> >> >>>
> >> >>> * Imported Cloudant Erlang applications for clustered CouchDB.
> >> >>> These are imported with their history by using git subtree and
> >> >>> merging the top level commit. These are not external deps;
> >> >>> development will happen within the CouchDB tree. The imported apps
> >> >>> are:
> >> >>>
> >> >>>
> >> >>>   * config - A couch_config replacement (behavior is mostly
> >> >>> identical to couch_config except how we listen for configuration
> >> >>> changes internally to allow for smooth hot code upgrade).
> >> >>>
> >> >>>   * twig - An rsyslog source replacement for couch_log.
> >> >>>
> >> >>>   * rexi - An RPC library. Replaces Erlang’s built-in rex
> >> >>> application to avoid costly safety measures in the interest of
> >> >>> performance and throughput.
> >> >>>
> >> >>>   * mem3 - The “Dynamo” part of BigCouch responsible for managing
> >> >>> cluster state.
> >> >>>
> >> >>>   * fabric - The internal cluster-aware CouchDB API.
> >> >>>
> >> >>>   * ets_lru - A small library application that provides an LRU
> >> >>> implementation using a couple of ets tables.
> >> >>>
> >> >>>   * ddoc_cache - Caches design documents on each node for use in
> >> >>> design handler functions. This uses an ets_lru cache with a very
> >> >>> short TTL.
> >> >>>
> >> >>>   * chttpd - The cluster-aware HTTP layer.
> >> >>>
> >> >>>
> >> >>> Each imported app also had its build system updated to use Autotools
> >> >>> along with the necessary updates noted above for the new application
> >> >>> layouts for existing CouchDB Erlang apps.
> >> >>>
> >> >>>
> >> >>> * Merged a large number of updates and fixes to couch_replicator
> >> >>> based on work done internally at Cloudant. Unfortunately, due to an
> >> >>> error when we created our internal clone, we lost a bit of history
> >> >>> in some of the initial merge and have a big commit that affects
> >> >>> couch_replicator_manager mostly. There are a number of other commits
> >> >>> related to couch_replicator that resolve the single node vs.
> >> >>> clustered differences. Some noticeable couch_replicator features:
> >> >>>
> >> >>>
> >> >>>   * Optionally disable checkpoints so that replication can work
> >> >>> when a source is read only. This should only be used for smaller
> >> >>> databases as each replication call has to scan the entire source
> >> >>> database on each invocation.
> >> >>>
> >> >>>   * A new changes_pending field in the _active_tasks output.
> >> >>>
> >> >>>   * A fix to continuous replication to automatically reconnect to
> >> >>> a continuous changes feed when it sees a last_seq value. This allows
> >> >>> the source to selectively recycle the HTTP connections used, which
> >> >>> can be quite useful for “permanent” replications.
> >> >>>
> >> >>>   * A multitude of smaller bug fixes and stability enhancements.
> >> >>>
> >> >>>
> >> >>> Updates to single node couch:
> >> >>>
> >> >>>
> >> >>> * We changed the by_seq tree to store a copy of the #full_doc_info{}
> >> >>> record instead of the #doc_info{} record. This gives significant
> >> >>> speed improvements for compaction and replication and generally
> >> >>> anything that needs to walk the by_seq tree and access document
> >> >>> bodies internally.
> >> >>>
> >> >>> * We rewrote the compactor to be significantly faster and to
> >> >>> produce significantly better compacted databases. The two main
> >> >>> changes are to use a temp file and to replace the use of btrees in
> >> >>> the temp file. The temp file only contains a temporary copy of the
> >> >>> document ids. At the end of a compaction run we then rebuild the
> >> >>> by_id btree in the compaction file from this temp file. The reason
> >> >>> this helps so much is that the compaction is based on the
> >> >>> update_seq btree, which for most cases means that the id tree is
> >> >>> updated in roughly random order, which is very bad for our append
> >> >>> only btrees. By using the tmp file we can stream it in order back
> >> >>> into the compacted db file at the end of compacting, generating a
> >> >>> minimum amount of garbage in the process. The other upgrade was to
> >> >>> implement an external merge sort module (couch_emsort) that is used
> >> >>> with this temporary file.
> >> >>>
> >> >>> * Reject updates to design docs that introduce changes that break
> >> >>> compilation of the source code. Currently we only check map and
> >> >>> reduce calls as the others should provide user visible errors
> >> >>> instead of inexplicably empty views.
> >> >>>
> >> >>> because my OCD kicked in and I was unable to resist.
> >> >>>
> >> >>> * Reverted a change made a long time ago that uses two file
> >> >>> descriptors for each database. See the todo list.
> >> >>>
> >> >>> * The reason to remove the second fd is so that we can rewrite ref
> >> >>> counting. Better ref counting makes everyone happy, but the real
> >> >>> reason is for this next bullet point:
> >> >>>
> >> >>> * Optimize couch_server to not require a round trip message pass
> >> >>> for opening a database that’s in the LRU. This is a significant
> >> >>> performance boost for high concurrency access. We also optimized
> >> >>> couch_server internals to not blow up when it’s under load.
> >> >>>
> >> >>> * Introduce a #leaf{} record into the revision trees. This is never
> >> >>> written to disk but makes internal code a lot cleaner when dealing
> >> >>> with multiple versions of rev tree values.
> >> >>>
> >> >>> * Some changes to couch_changes to enable clustered access. Also
> >> >>> some general cleanup.
> >> >>>
> >> >>> * Internal changes to how CouchDB is booted in Erlang land. Not
> >> >>> very sexy but this removes a lot of complicated un-Erlangy bits. We
> >> >>> still have a bit of work left here.
> >> >>>
> >> >>> * btree chunk sizes are now configurable, which can allow people to
> >> >>> adjust the RAM/speed tradeoffs a bit more.
> >> >>>
> >> >>> * We now load update validation functions on the first write. This
> >> >>> is a cluster-motivated change because the clustered version of this
> >> >>> call is expensive and can lead to race conditions when opening a
> >> >>> bunch of db shards simultaneously. This should be invisible to
> >> >>> external clients.
> >> >>>
> >> >>> * Disabled conflict detection for local docs. They don’t replicate
> >> >>> so there’s no point. This just led to clusters getting stuck and
> >> >>> confused when there were lots of replications happening.
> >> >>>
> >> >>> * Changes to the multipart/mime parsing code. Necessary for
> >> >>> clustered attachment uploads to split the incoming data stream into
> >> >>> N copies.
> >> >>>
> >> >>> * Don’t use init:restart/0 when reloading the ICU driver. I think
> >> >>> this has a bug. But we should rewrite this driver to be a NIF
> >> >>> anyway.
> >> >>>
> >> >>> * New couch OS process manager. Significantly faster access to OS
> >> >>> processes under heavy load. This replaces the hard limit with a
> >> >>> soft limit. Processes spawned over the soft limit will be used
> >> >>> until they’ve sat idle for a few minutes and then be closed. We
> >> >>> have a todo item to add the hard ceiling back in (while keeping the
> >> >>> soft ceiling).
> >> >>>
> >> >>> * Automatically replace some easily identifiable JS reductions with
> >> >>> their builtin counterparts. Uses a regex to do the detection so
> >> >>> it's not too smart.
> >> >>>
> >> >>> * Improved view updater write batch.
> >> >>>
> >> >>> * Updates to couchjs’ views.js to improve index update speeds
> >> >>>
> >> >>> * Updates to the _stats builtin reduce to allow reduces to work
> >> >>> over emitted stats objects. Sometimes clients have summary data in
> >> >>> a doc, and this allows them to combine stats if they follow the
> >> >>> same pattern as the builtin expects.
> >> >>>
> >> >>> * Added a config:reload() that is accessible by POST’ing to
> >> >>> _config/_reload. Used by the JS tests to reset the config to what's
> >> >>> on disk. This should prevent those test run failures where a test
> >> >>> fails leaving the config in a bad state causing all subsequent
> >> >>> tests to fail. I think. Maybe.
> >> >>>
> >> >>> * Databases are deleted synchronously in the test suite. We may
> >> >>> need to address this on Windows. But it does seem to reduce the
> >> >>> number of “{error, file_exists}” failures.
> >> >>>
> >> >>> * I reimplemented the JS restartServer() function. There’s a new
> >> >>> _restart/token URL that will give a unique value for each instance
> >> >>> of the Erlang VM. To run a restart we grab the current token value,
> >> >>> hit _restart, then wait till we get a successful response with a
> >> >>> different token. This appears to have made the restart strategy
> >> >>> more robust.
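
The token-based restart described in the last bullet can be sketched roughly as follows. This is only an illustration in Python, not the JS test-suite code: the two callables `get_token` and `trigger_restart` are hypothetical stand-ins for "fetch _restart/token" and "hit _restart", since the message does not say what the requests or responses look like.

```python
import time


def restart_and_wait(get_token, trigger_restart, timeout=30.0, interval=0.1,
                     sleep=time.sleep, clock=time.monotonic):
    """Restart a server and block until it reports a new instance token.

    get_token: hypothetical callable returning the current token (for
        example, the body of a request to _restart/token). It may raise
        OSError while the server is down.
    trigger_restart: hypothetical callable that hits _restart.
    """
    old_token = get_token()      # 1. remember the token of the running VM
    trigger_restart()            # 2. ask the server to restart
    deadline = clock() + timeout
    while clock() < deadline:    # 3. poll until a different token shows up
        try:
            new_token = get_token()
        except OSError:
            new_token = None     # server still down, keep waiting
        if new_token is not None and new_token != old_token:
            return new_token
        sleep(interval)
    raise TimeoutError("server did not come back with a new token")
```

Comparing tokens rather than just waiting for any successful response avoids mistaking a reply from the old VM, sent before it shut down, for a completed restart.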
> >> >>>
> >> >>>
> >> >>>
> >> >>> Things that need doing
> >> >>>
> >> >>>
> >> >>> IP Clearance -
> >> >>>
> >> >>>
> >> >>> We’ll need to track down if we have the CCLA as well as look at
> >> >>> each source file added to make sure each one is strictly from
> >> >>> Cloudant or has an amenable license. I’m pretty sure that the only
> >> >>> one of interest is trunc_io.erl but we need to be thorough.
> >> >>>
> >> >>> documentation -
> >> >>>
> >> >>>
> >> >>> There shouldn’t be much here since the entire point of this merge
> >> >>> was to not change the visible behavior of single node couch. A few
> >> >>> things to add about the testing endpoints. Maybe an update to the
> >> >>> compaction section to mention the two new file names used.
> >> >>>
> >> >>>
> >> >>> Copyright notices -
> >> >>>
> >> >>>
> >> >>> We need to strip out copyright notices from individual files and
> >> >>> make sure all files have a standard Apache License v2 header.
> >> >>>
> >> >>>
> >> >>> clustered vhosts -
> >> >>>
> >> >>>
> >> >>> We’ve never implemented this at Cloudant. We either need to write a
> >> >>> cluster or go back and tell people to use HAProxy (or similar) for
> >> >>> such things.
> >> >>>
> >> >>>
> >> >>> twig -
> >> >>>
> >> >>>
> >> >>> We need to add another output type to twig that is configurable in
> >> >>> some manner. Right now we spit out entire rsyslog records, which
> >> >>> isn’t useful for most people. We’ll need to implement the file
> >> >>> writer from couch_log as well as update the _log HTTP handler to
> >> >>> know when it can and can’t expect to find data on disk.
> >> >>>
> >> >>>
> >> >>> fabric -
> >> >>>
> >> >>>
> >> >>> This is going to need a lot of work. Specifically, view access is
> >> >>> going to need to be updated to work with couch_mrview and friends.
> >> >>>
> >> >>>
> >> >>> Boot a dev cluster -
> >> >>>
> >> >>>
> >> >>> Once we fix up the clustering code we’ll need to write instructions
> >> >>> and scripts for pulling up a dev cluster.
> >> >>>
> >> >>>
> >> >>> OTP stuff -
> >> >>>
> >> >>>
> >> >>> We’ve updated each app but we still need to pull some parts out of
> >> >>> couchdb into their own application. Specifically, the HTTP layer
> >> >>> needs its own app. We could probably pull out the os
> >> >>> process/query_servers as well as the os daemons and friends. Once
> >> >>> done we need to update the supervision trees so we don’t have
> >> >>> things like couch starting and managing the replication manager
> >> >>> process.
> >> >>>
> >> >>>
> >> >>> ddoc_cache -
> >> >>>
> >> >>>
> >> >>> Wire this up in couch_httpd_db to actually be used. Right now it's
> >> >>> only used in chttpd.
> >> >>>
> >> >>>
> >> >>> couch_file upgrade -
> >> >>>
> >> >>>
> >> >>> The revert to remove the second updater_fd from each #db{} record
> >> >>> means that we’re back in the original position of files appearing
> >> >>> to slow down significantly under load. Since the initial hammer
> >> >>> approach of just adding a second fd, we’ve discovered that the
> >> >>> underlying bug is due to the way that message passing works
> >> >>> combined with Erlang’s file io. Significant, though, is the fact
> >> >>> that the fix is rather simple to implement. A first draft of this
> >> >>> work is on an old branch of mine here:
> >> >>>
> >> >>>
> >> >>>   https://github.com/davisp/couchdb/commit/d856878
> >> >>>
> >> >>>
> >> >>> finish the size calculating changes -
> >> >>>
> >> >>>
> >> >>> The #leaf{} record change is to enable us to add more data size
> >> >>> calculations. CouchDB master calculates a data size that accounts
> >> >>> for all bytes that are active in a .couch file. Cloudant is
> >> >>> interested in the total size of uncompressed docs and attachments
> >> >>> minus the internal overhead of btrees. And there’s a fourth number
> >> >>> to calculate based on the compression level used. Having each of
> >> >>> these numbers will be useful as well as the calculations they’ll
> >> >>> enable (i.e., dead bytes in file, bytes used for overhead,
> >> >>> compression ratio achieved, etc.).
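
To make the derived figures concrete, here is a minimal Python sketch of how dead bytes, overhead and compression ratio could fall out of four size numbers. The message does not define the four numbers precisely, so the field names and every formula below are assumptions made for illustration, not CouchDB's or Cloudant's actual definitions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DbSizes:
    # All four fields are hypothetical names chosen for this sketch.
    file_bytes: int        # total size of the .couch file on disk
    active_bytes: int      # live bytes in the file, btree nodes included
    external_bytes: int    # uncompressed docs and attachments, no btrees
    compressed_bytes: int  # the same docs and attachments as stored

    def dead_bytes(self) -> int:
        """Bytes in the file that are no longer referenced (assumed)."""
        return self.file_bytes - self.active_bytes

    def overhead_bytes(self) -> int:
        """Live bytes that are not document or attachment data (assumed)."""
        return self.active_bytes - self.compressed_bytes

    def compression_ratio(self) -> Optional[float]:
        """Uncompressed size over stored size, or None for an empty db."""
        if self.compressed_bytes == 0:
            return None
        return self.external_bytes / self.compressed_bytes
```

Under these assumed definitions, a 1000 byte file with 600 active bytes, 900 uncompressed bytes and 450 stored bytes has 400 dead bytes, 150 bytes of overhead and a compression ratio of 2.0.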
> >> >>>
> >> >>>
> >> >>> couch_proc_manager -
> >> >>>
> >> >>>
> >> >>> We need to implement the hard ceiling for capping the number of OS
> >> >>> processes. We’ve started seeing a need for this at Cloudant with
> >> >>> some work loads so motivation to fix this is high. The only failing
> >> >>> etap is the assertion of this ceiling.
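
One way to picture a soft limit combined with a hard ceiling is the following Python sketch. It is not the couch_proc_manager implementation (which is Erlang); the class name, method names and the choice to raise an error at the ceiling are all hypothetical, and processes are represented by plain integer ids.

```python
import itertools


class ProcPool:
    """Toy pool with a soft limit and a hard ceiling (names are made up)."""

    def __init__(self, soft_limit, hard_limit, idle_timeout):
        if hard_limit < soft_limit:
            raise ValueError("hard_limit must be >= soft_limit")
        self.soft_limit = soft_limit
        self.hard_limit = hard_limit
        self.idle_timeout = idle_timeout
        self._idle = {}              # proc id -> time it became idle
        self._busy = set()
        self._ids = itertools.count(1)

    def size(self):
        return len(self._busy) + len(self._idle)

    def acquire(self):
        if self._idle:
            proc = next(iter(self._idle))   # reuse an idle process
            del self._idle[proc]
        elif self.size() < self.hard_limit:
            proc = next(self._ids)          # spawn, possibly over soft limit
        else:
            raise RuntimeError("hard ceiling reached")
        self._busy.add(proc)
        return proc

    def release(self, proc, now):
        self._busy.remove(proc)
        self._idle[proc] = now

    def reap(self, now):
        """Close long-idle processes while the pool exceeds the soft limit."""
        closed = []
        for proc, since in sorted(self._idle.items(), key=lambda kv: kv[1]):
            if self.size() <= self.soft_limit:
                break
            if now - since >= self.idle_timeout:
                del self._idle[proc]
                closed.append(proc)
        return closed
```

The soft limit only controls how far `reap` shrinks the pool once load drops, while the hard ceiling is the point at which `acquire` refuses to spawn. A real manager might queue callers at the ceiling instead of failing.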
> >> >>>
> >> >>>
> >> >>> Synchronous db delete on Windows -
> >> >>>
> >> >>>
> >> >>> I did this because running the test suite was driving me bonkers. I
> >> >>> need to ask Dave about how this behaves on Windows (my guess is not
> >> >>> well) but I think we can close things up so that it works better
> >> >>> than the status quo.
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Iris Couch
> >>
> >
> >
> >
> > --
> > NS
>



-- 
NS
