couchdb-dev mailing list archives

From Alexander Shorin <kxe...@gmail.com>
Subject Re: [DISCUSSION] Limiting the allowed size for documents
Date Fri, 08 Jul 2016 02:18:33 GMT
On Fri, Jul 8, 2016 at 12:44 AM, Robert Kowalski <rok@kowalski.gd> wrote:
> Couch 1.x and Couch 2.x will choke as soon as the indexer tries to
> process a too large document that was added. The indexing stops and
> you have to manually remove the doc. In the best case you built an
> automatic process around the process. The automatic process removes
> the document instead of the human.

An automatic process that removes stored data in production? You must
be kidding (:

Limiting the document size here sounds like the wrong way to solve the
indexer issue when it cannot handle such documents. Two solutions
come to mind:

- The indexer skips big documents, making enough noise about it for
the user to notice the problem;
- The indexer is fixed to handle big documents.
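
A minimal sketch of the first option, just to make the idea concrete.
This is not CouchDB's indexer code: the threshold, the function names
and the map function shape are all made up for illustration, and the
size is simply the length of the JSON encoding.

```python
import json
import logging

logger = logging.getLogger("indexer_sketch")

# Hypothetical threshold, not a real CouchDB setting.
MAX_INDEXABLE_BYTES = 1024 * 1024


def encoded_size(doc):
    """Return the size in bytes of the document's UTF-8 JSON encoding."""
    return len(json.dumps(doc, ensure_ascii=False).encode("utf-8"))


def index_documents(docs, map_fn, max_bytes=MAX_INDEXABLE_BYTES):
    """Run map_fn over docs, skipping oversized ones with a warning.

    Returns the emitted rows and the ids of the skipped documents, so
    that indexing keeps going and the problem stays visible.
    """
    rows = []
    skipped = []
    for doc in docs:
        size = encoded_size(doc)
        if size > max_bytes:
            logger.warning(
                "skipping doc %r: %d bytes exceeds indexer limit of %d",
                doc.get("_id"), size, max_bytes,
            )
            skipped.append(doc.get("_id"))
            continue
        rows.extend(map_fn(doc))
    return rows, skipped
```

The stored document is left alone here; only the index misses it, and
the warning plus the list of skipped ids tell the user why.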

From the user's side the second option is the only right one, because
it's my data, I put it into the database, I trust the database to
process it, and it shouldn't fail me.

What should a user do when he hits the limit and cannot store the
document because the indexer is buggy, but he needs this data to be
processed? He gets very annoyed, because he needs that data as is, and
splitting it into multiple documents may be impossible (because we
don't have cross-document links and transactions). What's the next
step for him? Changing the database, for sure.

I think the indexer argument is quite weak and strange. A stronger one
is cutting off the possibility of uploading bloated data when, by
design, there are sane boundaries for the stored data. If all your
documents average 1 MiB and your database receives data from the
world, you would like to explicitly drop anomalies of dozens or
hundreds of MiB, because that's not the data you're working with.
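
Roughly, such a boundary is a check at write time. The sketch below is
only an assumption of how it could look, not what CouchDB or the pull
request mentioned in this thread actually does: the 8 MiB limit, the
exception and the function name are hypothetical.

```python
import json

# Hypothetical limit chosen for illustration only.
MAX_DOC_BYTES = 8 * 1024 * 1024


class DocumentTooLarge(Exception):
    """Raised when a document's JSON encoding exceeds the limit."""


def check_document_size(doc, max_bytes=MAX_DOC_BYTES):
    """Return the encoded size in bytes, or raise if it is over max_bytes."""
    size = len(json.dumps(doc, ensure_ascii=False).encode("utf-8"))
    if size > max_bytes:
        raise DocumentTooLarge(
            "document is %d bytes, limit is %d" % (size, max_bytes)
        )
    return size
```

A write path would call check_document_size(doc) before storing and
reject the request on DocumentTooLarge, so anomalies are dropped up
front instead of being discovered later.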

See also: https://github.com/apache/couchdb-chttpd/pull/114 - Tony Sun
made some attempts to add such a limit to CouchDB.

There are a couple of problems with actually implementing such a limit
in a predictable and lightweight way, because we have awesome _update
functions (; But I believe all of them can be overcome.

--
,,,^..^,,,
