From: kocolosk@apache.org
To: "commits@couchdb.apache.org"
Date: Wed, 06 Jun 2018 02:27:59 +0000
Subject: [couchdb-documentation] 02/02: Document the _approx_count_distinct builtin

This is an automated email from the ASF dual-hosted git repository.
kocolosk pushed a commit to branch 2971-stats-documentation
in repository https://gitbox.apache.org/repos/asf/couchdb-documentation.git

commit 6ce359681744384b3da9b59d1d60c279ce243ac4
Author: Adam Kocoloski
AuthorDate: Mon May 28 14:28:51 2018 -0400

    Document the _approx_count_distinct builtin
---
 src/ddocs/ddocs.rst | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/src/ddocs/ddocs.rst b/src/ddocs/ddocs.rst
index 306023d..1991927 100644
--- a/src/ddocs/ddocs.rst
+++ b/src/ddocs/ddocs.rst
@@ -120,6 +120,32 @@ Additionally, CouchDB has a set of built-in reduce functions. These are
 implemented in Erlang and run inside CouchDB, so they are much faster than the
 equivalent JavaScript functions.
 
+.. data:: _approx_count_distinct
+
+.. versionadded:: 2.2
+
+Approximates the number of distinct keys in a view index using a variant of the
+`HyperLogLog`_ algorithm. This algorithm enables an efficient, parallelizable
+computation of cardinality using fixed memory resources. CouchDB has configured
+the underlying data structure to have a relative error of ~2%.
+
+.. _HyperLogLog: https://en.wikipedia.org/wiki/HyperLogLog
+
+As this reducer ignores the emitted values entirely, an invocation with
+``group=true`` will simply return a value of 1 for every distinct key in the
+view. In the case of array keys, querying the view with a ``group_level``
+specified will return the number of distinct keys that share the common group
+prefix in each row. The algorithm also respects the ``startkey`` and
+``endkey`` boundaries and will return the number of distinct keys within the
+specified key range.
+
+A final note regarding Unicode collation: this reduce function uses the binary
+representation of each key in the index directly as input to the HyperLogLog
+sketch. As such, it will (incorrectly) consider keys that are not byte-identical
+but that compare equal according to the Unicode collation rules to be distinct
+keys, and thus has the potential to overestimate the cardinality of the key
+space if a large number of such keys exist.
+
 .. data:: _count
 
 Counts the number of values in the index with a given key. This could be
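The patch describes HyperLogLog only at a high level. The following is a minimal Python sketch of the classic HyperLogLog estimator, meant to show how fixed memory, register-wise merging (the property that makes the computation parallelizable), and byte-level hashing fit together. It is not CouchDB's Erlang implementation: the precision ``p = 11``, SHA-1 as the hash, and the names ``HyperLogLog`` and ``count_distinct`` are assumptions chosen for illustration.

```python
import hashlib
import math
import unicodedata


class HyperLogLog:
    """Illustrative HyperLogLog sketch (assumed parameters, not CouchDB's code)."""

    def __init__(self, p: int = 11):
        # 2**p registers of fixed size; the standard error is roughly
        # 1.04 / sqrt(2**p), i.e. about 2.3% for the assumed p = 11.
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item: bytes) -> None:
        # Hash the raw bytes (64 bits taken from SHA-1, an assumption).
        # No Unicode normalization or collation is applied here.
        h = int.from_bytes(hashlib.sha1(item).digest()[:8], "big")
        width = 64 - self.p
        idx = h >> width                 # first p bits choose a register
        rest = h & ((1 << width) - 1)    # remaining bits feed the rank
        # Rank = 1-based position of the leftmost 1-bit in `rest`
        # (width + 1 if `rest` is all zeros).
        rank = width - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "HyperLogLog") -> None:
        # Register-wise max: partial sketches from separate shards can be
        # combined without revisiting the underlying keys.
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


def count_distinct(keys):
    """Hypothetical helper: feed each key's UTF-8 bytes into one sketch."""
    sketch = HyperLogLog()
    for key in keys:
        sketch.add(key.encode("utf-8"))
    return sketch.estimate()


if __name__ == "__main__":
    composed = unicodedata.normalize("NFC", "caf\u00e9")
    decomposed = unicodedata.normalize("NFD", "caf\u00e9")
    print(composed == decomposed)  # False: different code points and bytes
    print(count_distinct([composed, decomposed]))
```

Because ``add()`` hashes raw bytes, the NFC and NFD forms of "café" enter the sketch as two different keys even though Unicode collation treats them as equal. This is the kind of overestimate the patch warns about. Re-adding keys never changes the registers, which is why duplicates do not inflate the count.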