incubator-couchdb-user mailing list archives

From FRANCESCO FURIANI <fra.furi...@stud.uniroma3.it>
Subject Re: Problems with CouchDB 1.2.0 views on large documents JSONs
Date Mon, 11 Jun 2012 10:19:47 GMT
Hi Dave,

thx for all the hints.

We tried on an old Linux machine with CouchDB 1.0.1 (Erlang R14B02/5.8.3), and the views
work with more than 3 JSON documents (we're importing more to see what the limit is). It
seems to be a CouchDB 1.2.0 issue.

I'll do further testing.


Regards,
Francesco
________________________________________
From: Dave Cottlehuber [dave@muse.net.nz]
Sent: Monday, 4 June 2012 19:48
To: user@couchdb.apache.org
Subject: Re: Problems with CouchDB 1.2.0 views on large documents JSONs

On 4 June 2012 21:03, Francesco Furiani <fra.furiani@stud.uniroma3.it> wrote:
> Hi,
>
> I run a CouchDB server (v1.2.0) on a Mac (Intel architecture, 8 GB of RAM,
> OS X version 10.6.8), installed with brew.
>
> The server is used to store big JSON documents (example:
> https://raw.github.com/cvdlab-bio/webpdb/develop/docs/jsons/2LGB-pretty-print.json
> ) for a small university project.
>
> When we load more than 3 of these documents, all the map functions (which we
> created to retrieve documents other than by a simple get by id) stop working.
> A typical map is:
>
> function(doc){if(doc.TITLE.title.match('.*INSULIN.*') !== null) emit(doc.ID,
> doc);}
>
> but even a
>
> function(doc){emit(doc.ID, doc.ID)}
>
> ceases to work,
>
> while with just 2 or 3 documents in the database they work just fine.
> I tried increasing the stack for couchjs (1 GB now; going over 1 GB doesn't
> seem to work), increasing the limit for files (4096), and increasing the
> timeout for processes, but in the end I get no results, only an error
> (Error: os_process_error {exit_status,0}) from the db.
>
> Is the JSON we provide too big for Couch? Do we need to redesign the map to
> remove parts of the JSON? Is this a known bug (I haven't found anything on
> the net)?
>
> Any clue that might help me?
>
> Thanks for the help,
> Francesco
>

Hi Francesco,

CouchDB stores JSON in a native Erlang format on disk. Retrieving it
(whether to process in a JS map/reduce view or to send through to an
HTTP client) requires transforming it into JSON text. For big docs
this can take a while, or can even break when piped into couchjs.
A couple of other people have reported this type of issue recently
on the ML.

You could avoid this by using Erlang views**, or you could check whether
you see the same issue in 1.1.1, which has a different (slower) JSON
parser.

Could you open a JIRA ticket for this issue please, seeing as you have
a nice sample doc to share?

Some general points:
Typically you can replace emit(doc.id, doc) with emit(null) in your view.
You can always use ?include_docs=true in your query to return the full
documents.
The id of any doc emitted is available "for free", so you don't need
the duplication.
This will make your view smaller by orders of magnitude.
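
A minimal sketch of the lean-view idea above, runnable outside CouchDB. It
assumes you still want the document's own ID field as the view key, so it
emits (doc.ID, null) rather than the whole doc. The emit() stub, the rows
array, the guard on doc.TITLE, and the sample documents are hypothetical and
only there so the map function can be exercised without a server.

```javascript
// Hypothetical stand-in for the emit() that CouchDB's view server provides.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Lean map function: emit only the key and a null value instead of the doc.
// The guard skips documents that lack a TITLE.title string (an assumption
// about the data; such docs would otherwise raise an exception in the map).
function map(doc) {
  if (doc.TITLE && typeof doc.TITLE.title === "string" &&
      doc.TITLE.title.indexOf("INSULIN") !== -1) {
    emit(doc.ID, null);
  }
}

// Made-up sample documents, for illustration only.
var docs = [
  { _id: "a", ID: "DOC1", TITLE: { title: "HUMAN INSULIN MUTANT" } },
  { _id: "b", ID: "DOC2", TITLE: { title: "LYSOZYME" } },
  { _id: "c", ID: "DOC3" }
];
docs.forEach(map);

console.log(JSON.stringify(rows)); // [{"key":"DOC1","value":null}]
```

With a view like this, the full documents can still be fetched at query time,
for example with GET /yourdb/_design/yourddoc/_view/yourview?include_docs=true
(the database, design document, and view names here are placeholders).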

** Erlang views run inside the Erlang VM, without a trusted sandbox, so
rm -rf and worse are all possible. But it's likely faster, has fewer
limitations per the above issue, and comes with less documentation too.
YMMV, don't forget to wear a seatbelt, and never _ever_ run with
scissors.

A+
Dave


