couchdb-dev mailing list archives

From Tony Sun <tony.sun...@gmail.com>
Subject Re: Disable "index all" default capability with mango text indexes
Date Tue, 06 Dec 2016 08:51:13 GMT
Thanks for the suggestions!

I've opened a PR here: https://github.com/apache/couchdb-mango/pull/33

It adds a new config parameter that lets an admin either reject new
"index all" requests or just log a warning for them. Existing
functionality keeps working until things get bad enough that the admin
turns it off, so the admin has time to notify users before the change is
made.
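
For anyone following along without reading the PR, here is a rough Python
sketch of the kind of gate described above. It is not the code in the PR:
the policy names (allow/warn/reject), the function names, and the rule for
spotting an "index all" definition (a text index with no "fields" listed)
are all assumptions made for illustration.

```python
import logging

logger = logging.getLogger("mango_sketch")

# Hypothetical policy values; the real parameter and its values live in the PR.
ALLOW, WARN, REJECT = "allow", "warn", "reject"


def is_index_all(index_def):
    """Assumed rule: a text index that lists no fields is an 'index all' index."""
    if index_def.get("type") != "text":
        return False
    fields = (index_def.get("index") or {}).get("fields")
    return not fields


def check_index_request(index_def, policy=ALLOW):
    """Return True if the index may be created under the given policy."""
    if not is_index_all(index_def):
        return True
    if policy == REJECT:
        return False
    if policy == WARN:
        # The index is still created; the admin only gets a log entry.
        logger.warning("text index created with 'index all' default")
    return True
```

With the policy left at "allow" nothing changes for existing users; "warn"
gives the admin a log trail to work from before switching to "reject".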

Let me know what you guys think!

Tony



On Sun, Dec 4, 2016 at 11:33 PM, Glynn Bird <glynn.bird@gmail.com> wrote:

> This feature is a great way for beginners to experiment with Mango but I
> think it would be fine to limit this "index all" operation to top level
> strings/numbers/booleans/arrays - this is the approach that DynamoDB takes
> to limit the complexity it has to deal with (together with a 400kB document
> size limit & a 1MB result set cap).
>
> On Sat, 3 Dec 2016 at 20:35 Joan Touzet <wohali@apache.org> wrote:
>
> What other possibilities are there?
>
> We could set a max recursion depth, perhaps.
>
> couchjs has a limit on max stack size; a recursion depth limit would be
> similar in spirit.
>
> If nothing else is possible I'm happy for this change as a last-ditch
> effort.
>
> -Joan
>
> ----- Original Message -----
> > From: "Tony Sun" <tony.sun427@gmail.com>
> > To: dev@couchdb.apache.org
> > Sent: Saturday, December 3, 2016 3:19:34 PM
> > Subject: Disable "index all" default capability with mango text indexes
> >
> > Hi all,
> >
> > In mango, once a user has correctly set up text search, he or she can
> > create a simple text index with:
> >
> > {
> > "type" : "text",
> > "index" :{}
> > }
> >
> > This default indexes every field of every document in the db.
> > Unfortunately, for large dbs, this is resource intensive. Even with
> > warnings, users tend to favor this default behavior because it allows
> > them to quickly begin querying the db.
> >
> > In rare instances, if users have complex nested arrays, thousands of
> > unique field names are generated, which can lead to JVM heap
> > exhaustion. We've seen in production that this scenario can disable a
> > cluster.
> >
> > I've entertained the idea of disabling this default behavior. The
> > biggest concern is of course that existing application APIs which
> > depend on this default behavior will be affected. I'll think about
> > various solutions to mitigate the impact, but I wanted to throw this
> > out there to see if people are in agreement that we should do this.
> >
> > Thanks,
> >
> > Tony
> >
>

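To make the field-name growth discussed in this thread concrete, here is a
small Python sketch that lists the unique field paths an "index all" pass
would see in one document, with an optional depth cap along the lines of
the suggestions from Joan and Glynn. It is an illustration only, not how
mango enumerates fields: the function name, the "[]" path notation, and the
way depth is counted (objects add a level, arrays do not) are assumptions.

```python
def field_paths(value, prefix="", depth=0, max_depth=None):
    """Yield a dotted path for every scalar found in a JSON-like value.

    Top-level fields are at depth 1. Descending into an object adds one
    level; array elements stay at the level of the array that holds them.
    """
    if max_depth is not None and depth > max_depth:
        return
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from field_paths(child, path, depth + 1, max_depth)
    elif isinstance(value, list):
        for child in value:
            yield from field_paths(child, f"{prefix}.[]", depth, max_depth)
    else:
        yield prefix


doc = {
    "name": "widget",
    "tags": ["a", "b"],
    "readings": [{"sensor": {"id": 7, "unit": "C"}}],
}

all_fields = set(field_paths(doc))
top_level_only = set(field_paths(doc, max_depth=1))
```

Here `all_fields` also contains the paths nested under "readings", while
`top_level_only` keeps just "name" and "tags.[]". Documents whose nested
objects use many distinct keys make the uncapped set grow quickly, which is
the situation described in the original message.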