incubator-couchdb-user mailing list archives

From Dave Cottlehuber <d...@muse.net.nz>
Subject Re: Problems with CouchDB 1.2.0 views on large documents JSONs
Date Mon, 11 Jun 2012 10:25:00 GMT
Hi Francesco, yes, I think so too; let's keep that ticket updated with notes.

On 11 June 2012 12:19, FRANCESCO FURIANI <fra.furiani@stud.uniroma3.it> wrote:
> Hi Dave,
>
> thanks for all the hints.
>
> We tried on an old Linux machine with CouchDB 1.0.1 (Erlang R14B02/5.8.3), and the
> views work with more than 3 JSON documents (we're importing more to find the limit).
> It seems to be a CouchDB 1.2.0 issue.
>
> I'll do further testing.
>
>
> Regards,
> Francesco
> ________________________________________
> From: Dave Cottlehuber [dave@muse.net.nz]
> Sent: Monday, 4 June 2012 19:48
> To: user@couchdb.apache.org
> Subject: Re: Problems with CouchDB 1.2.0 views on large documents JSONs
>
> On 4 June 2012 21:03, Francesco Furiani <fra.furiani@stud.uniroma3.it> wrote:
>> Hi,
>>
>> I run a CouchDB server (v1.2.0), installed with Homebrew, on a Mac (Intel
>> architecture, 8 GB of RAM, OS X 10.6.8).
>>
>> The server is used to store large JSON documents (example:
>> https://raw.github.com/cvdlab-bio/webpdb/develop/docs/jsons/2LGB-pretty-print.json)
>> for a small university project.
>>
>> When we load more than 3 of these JSON documents, all the map functions we
>> created (to retrieve documents by something other than a simple get by id)
>> stop working. A typical map is:
>>
>> function(doc) {
>>   if (doc.TITLE.title.match('.*INSULIN.*') !== null) emit(doc.ID, doc);
>> }
>>
>> but even a trivial map such as
>>
>> function(doc) { emit(doc.ID, doc.ID); }
>>
>> stops working.
>>
>> When there are only 2 or 3 of these documents in the database, the views
>> work fine. I tried increasing the couchjs stack size (now 1 GB; going above
>> 1 GB doesn't seem to work), raising the open-file limit (4096), and
>> increasing the process timeout, but in the end I get no results, only an
>> error from the database: (Error: os_process_error {exit_status,0}).
>>
>> Are the JSON documents we provide too big for CouchDB? Do we need to
>> redesign the maps to strip parts of the JSON? Is this a known bug (I
>> haven't found anything about it online)?
>>
>> Any clues that might help?
>>
>> Thanks for the help,
>> Francesco
>>
>
> Hi Francesco,
>
> CouchDB stores JSON on disk in a native Erlang format. Retrieving it
> (whether to process it in a JS map/reduce view or to send it to an
> HTTP client) requires converting it back into JSON text. For big docs
> this can take a while, or even break when the result is piped into
> couchjs. A couple of other people have reported this type of issue
> on the mailing list recently.
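As a rough illustration of the conversion cost described above, here is a minimal JavaScript sketch (run in Node.js or a browser, not inside CouchDB) that measures how many bytes of JSON text a document turns into. The helper names serializedSize and findLargeDocs, the 1 MiB threshold, and the sample documents are all hypothetical, and CouchDB's own encoder may produce slightly different text, so treat the numbers as estimates.

```javascript
// Estimate how many UTF-8 bytes of JSON text a document becomes once it is
// serialized (as happens before it is piped to couchjs or sent over HTTP).
// TextEncoder is a global in modern browsers and Node.js.
function serializedSize(doc) {
  return new TextEncoder().encode(JSON.stringify(doc)).length;
}

// Hypothetical helper: list documents whose JSON text exceeds a threshold.
// The 1 MiB default is an arbitrary illustration, not a CouchDB limit.
function findLargeDocs(docs, thresholdBytes = 1024 * 1024) {
  return docs
    .map((doc) => ({ id: doc._id, bytes: serializedSize(doc) }))
    .filter((entry) => entry.bytes > thresholdBytes);
}

// Synthetic sample documents (not real PDB data).
const small = { _id: "small", note: "tiny" };
const big = {
  _id: "big",
  atoms: Array.from({ length: 50000 }, (_, i) => ({ serial: i, x: 1.5, y: -2.25, z: 0.125 })),
};

console.log(findLargeDocs([small, big])); // only the "big" document is reported
```

Counting UTF-8 bytes rather than string length matters for documents with non-ASCII text, since JavaScript string length counts UTF-16 code units.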
>
> You could avoid this by using Erlang views**, or you could check
> whether the same issue occurs in 1.1.1, which uses a different
> (slower) JSON parser.
>
> Could you please open a JIRA ticket for this issue, since you have a
> nice sample doc to share?
>
> Some general points:
> - Typically you can replace emit(doc.ID, doc) with emit(null) in your view.
> - You can always add ?include_docs=true to your query to return the full
>   documents.
> - The id of every emitted doc is available "for free", so you don't need
>   the duplication.
> This will make your view orders of magnitude smaller.
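A minimal sketch of these suggestions, assuming the TITLE.title field from Francesco's documents: the map emits only a null key and relies on the document id (attached to every view row) plus ?include_docs=true for the rest. Inside CouchDB, emit() is provided by couchjs; runMap here is a hypothetical stand-in that only exists so the map can be exercised outside CouchDB, and the server, database, design doc, and view names passed to viewUrl are placeholders.

```javascript
// Leaner INSULIN filter view: no value, so the index stores no copy of the doc.
// The TITLE guard is an assumption (some docs may lack a TITLE block).
const insulinMap = function (doc) {
  if (doc.TITLE && typeof doc.TITLE.title === "string" && /INSULIN/.test(doc.TITLE.title)) {
    emit(null);
  }
};

// Hypothetical harness, NOT part of CouchDB: calls the map once per document
// with a stubbed global emit() and records rows like a view server would.
function runMap(mapFn, docs) {
  const rows = [];
  for (const doc of docs) {
    globalThis.emit = (key, value) => {
      // The emitting document's id is attached to each row automatically;
      // a missing value is reported as null in view results.
      rows.push({ id: doc._id, key: key, value: value === undefined ? null : value });
    };
    mapFn(doc);
  }
  delete globalThis.emit;
  return rows;
}

// Build a view query URL with include_docs=true so full documents are fetched
// at query time. All names here are placeholders.
function viewUrl(base, db, ddoc, view) {
  const params = new URLSearchParams({ include_docs: "true" });
  return `${base}/${encodeURIComponent(db)}/_design/${encodeURIComponent(ddoc)}/_view/${encodeURIComponent(view)}?${params}`;
}

// Hypothetical sample documents.
const docs = [
  { _id: "doc-1", TITLE: { title: "HUMAN INSULIN MUTANT" } },
  { _id: "doc-2", TITLE: { title: "LYSOZYME" } },
  { _id: "doc-3" }, // no TITLE block
];

const rows = runMap(insulinMap, docs);
console.log(rows); // a single row for doc-1, with null key and null value
console.log(viewUrl("http://127.0.0.1:5984", "pdb", "search", "insulin"));
// http://127.0.0.1:5984/pdb/_design/search/_view/insulin?include_docs=true
```

Because each matching row carries only an id and two nulls, the index holds roughly one small row per match instead of a second copy of every large document.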
>
> ** Erlang views run inside the Erlang VM without a trusted sandbox, so
> rm -rf and worse are all possible. But they are likely faster, have
> fewer of the limitations described above, and come with less
> documentation too. YMMV; don't forget to wear a seatbelt, and never
> _ever_ run with scissors.
>
> A+
> Dave
>
>
