couchdb-user mailing list archives

From CGS <cgsmcml...@gmail.com>
Subject Re: i have a bulk insert problem about invalid json
Date Tue, 10 Jan 2012 15:04:21 GMT

Hi Robert,

1. I admit it was a bad example, because W3C defines no length limit for 
the body of an HTTP POST request. Nevertheless, there are two cases where 
a limit can still apply in practice:
a) an ISP limit on HTTP POST requests (I have heard that some ISPs do 
this, even if I cannot point to any);
b) cURL's default behavior, which can be changed with its options for 
sending large POST data.

2. Well, in my case, I was transferring the data from an Erlang list to 
construct the cURL command. Multiple lists of the same kind of data were 
held in memory at the same time, so I wouldn't say I hit the RAM limit 
(when I tested the cURL command, I reduced the number of lists to be sure 
I didn't hit it). But I admit that at the time I didn't think of a cURL 
limitation (or I discarded the idea through lack of knowledge); I had 
the impression it was related to the command-line length. I had never 
hit that limit before, which seemed strange enough to me, but I had no 
time to dig further into the problem; I think I was just happy I had 
found a solution that worked.

And, yes, 255 kdocs can reach the RAM limit, as you said. In any case, I 
would recommend sending the data in chunks (either with multipart or 
simply by splitting the documents across multiple independent cURL 
invocations).
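A minimal Python sketch of the chunking approach, assuming a hypothetical database URL `http://localhost:5984/mydb/_bulk_docs` and an arbitrary chunk size of 500 documents (neither value comes from this thread). It uses the standard library's `urllib.request` instead of separate cURL processes, and authentication is omitted. Each chunk is sent as its own `_bulk_docs` request with a `{"docs": [...]}` body.

```python
import json
import urllib.request
from typing import Iterator, List

# Hypothetical endpoint (assumption): adjust host, port and database name.
BULK_URL = "http://localhost:5984/mydb/_bulk_docs"


def chunked(docs: List[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` documents."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def bulk_body(chunk: List[dict]) -> bytes:
    """Serialize one chunk as a _bulk_docs request body (UTF-8, no BOM)."""
    return json.dumps({"docs": chunk}).encode("utf-8")


def post_chunk(chunk: List[dict], url: str = BULK_URL) -> list:
    """POST one chunk and return the server's per-document results."""
    req = urllib.request.Request(
        url,
        data=bulk_body(chunk),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def bulk_insert(docs: List[dict], size: int = 500) -> list:
    """Insert all docs as a series of smaller, independent bulk requests."""
    results = []
    for chunk in chunked(docs, size):
        results.extend(post_chunk(chunk))
    return results
```

With the default chunk size, `bulk_insert(docs)` for 255000 documents would issue 510 requests. Because each request is independent, an HTTP error in one chunk raises `urllib.error.HTTPError` while the chunks already sent remain stored.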

CGS




On 01/10/2012 01:23 PM, Robert Newson wrote:
> 1) That refers to the length of the URL (to _bulk_docs, in this case),
> not the body.
>
> 2) That refers to the length of the command line, not the lengths of
> files referenced on the command line.
>
> B.
>
> On 10 January 2012 11:58, CGS <cgsmcmlxxv@gmail.com> wrote:
>> There is a limit for sure, but there are two factors you have to consider:
>> 1. the HTTP request limit on the number of characters (for example, read this:
>> http://stackoverflow.com/questions/2659952/maximum-length-of-http-get-request);
>> 2. the shell command line under Linux/Cygwin has a maximum number of characters
>> (which depends on the Linux flavor).
>>
>> Under CentOS 6, I was able to send 800 documents per instance (document = a
>> few simple key-value pairs including _id), but not 1000. At 1 kdocs I got a
>> shell error. Nevertheless, this test is not conclusive, because I used CentOS 6
>> for both the client and the CouchDB server, and I don't know the exact length
>> of the command.
>>
>> CGS
>>
>>
>>
>>
>>
>> On 01/10/2012 12:20 PM, Zekeriya KOÇ wrote:
>>> Thanks for all replies.
>>>
>>> The problem was, first, the BOM character. After that, I split my files into
>>> chunks smaller than 30 MB and it started to work.
>>>
>>> There is a request size limit, isn't there?
>>>
>>> Again, thanks for all the replies.
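Splitting by document count does not bound the request size when documents vary in size. A minimal Python sketch of splitting by serialized size instead, assuming each chunk is sent as a `{"docs": [...]}` body produced by Python's default `json.dumps` settings. The 30 MB budget mirrors the figure in the message above; it is an assumption, not a value taken from CouchDB's configuration.

```python
import json
from typing import Iterator, List

# Assumed budget, mirroring the "smaller than 30 MB" files mentioned above.
MAX_BODY_BYTES = 30 * 1024 * 1024


def chunks_by_size(docs: List[dict], max_bytes: int = MAX_BODY_BYTES) -> Iterator[List[dict]]:
    """Group docs so each json.dumps({"docs": chunk}) body fits in max_bytes.

    Byte counts assume json.dumps defaults: ", " between array items and
    ASCII-only output (ensure_ascii=True), so characters equal bytes.
    """
    overhead = len(b'{"docs": []}')  # wrapper around the array
    chunk: List[dict] = []
    size = overhead
    for doc in docs:
        doc_bytes = len(json.dumps(doc).encode("utf-8"))
        if overhead + doc_bytes > max_bytes:
            raise ValueError("a single document exceeds max_bytes")
        sep = 2 if chunk else 0  # ", " before every item except the first
        if chunk and size + sep + doc_bytes > max_bytes:
            yield chunk
            chunk, size, sep = [], overhead, 0
        chunk.append(doc)
        size += sep + doc_bytes
    if chunk:
        yield chunk
```

Each yielded chunk can then be written to its own file or posted directly; a single document larger than the budget is reported instead of being silently sent.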
>>>
>>> 2012/1/10 CGS <cgsmcmlxxv@gmail.com>
>>>
>>>> Oh, I forgot to write the solution, in case it's not obvious. Just divide
>>>> the docs across multiple instances of cURL and it will work. Don't
>>>> worry, you still get the benefit of the bulk operation (I had an insertion
>>>> rate of around 5-6 kdocs/s on a not-that-great server, even though I had to
>>>> send more requests at the same time).
>>>>
>>>> CGS
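Because the chunks are independent, they can also be sent concurrently, as the message above describes. A minimal Python sketch, assuming a hypothetical callable `send` that POSTs one chunk to `_bulk_docs` and returns the parsed per-document results; the default of 4 workers is an arbitrary assumption, not a tuned value.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def send_in_parallel(
    chunks: Sequence[List[dict]],
    send: Callable[[List[dict]], list],  # hypothetical: POSTs one chunk
    workers: int = 4,                    # assumed concurrency level
) -> list:
    """Send each chunk with `send` on a small thread pool.

    executor.map yields results in input order, so the concatenated
    results line up with the original document order. An exception
    raised by `send` propagates when its result is consumed.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_chunk = pool.map(send, chunks)
        return [row for rows in per_chunk for row in rows]
```

Threads fit here because the work is I/O-bound (waiting on HTTP responses); a small pool limits how many requests hit the server at the same time.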
>>>>
>>>>
>>>>
>>>>
>>>> On 01/10/2012 11:45 AM, CGS wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> With 255000 documents in one session, you go over the number of
>>>>> characters allowed for a shell command, for an HTTP request, or both.
>>>>> The session truncates the command, so your JSON is incomplete. That is
>>>>> what gave me that response in the past.
>>>>>
>>>>> CGS
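A truncated request body is no longer valid JSON, which is why the server rejects it. A minimal Python sketch of checking a body locally before sending it; the sample documents are illustrative only.

```python
import json


def is_complete_json(text: str) -> bool:
    """Return True if `text` parses as a single JSON value."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Illustrative body; cutting it in half simulates a truncated command.
body = json.dumps({"docs": [{"adi": "zeko"}, {"adi": "zeko2"}]})
truncated = body[: len(body) // 2]
```

Here `is_complete_json(body)` is true and `is_complete_json(truncated)` is false. The check only catches syntax damage; it says nothing about whether the server will accept the request size.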
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 01/10/2012 10:11 AM, Zekeriya KOÇ wrote:
>>>>>
>>>>>> Sorry for the subjectless message!!!
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> My problem: I am trying to insert approximately 255000 documents into a
>>>>>> CouchDB instance with the bulk docs API, and I always get an invalid JSON
>>>>>> error.
>>>>>>
>>>>>> So I am trying to test the problem with just one document, because the
>>>>>> error is raised whether I use a large file or a file with just one
>>>>>> document.
>>>>>>
>>>>>> My system:
>>>>>> CouchDB: on Ubuntu Server 10.04
>>>>>> Client: Windows 7 with Cygwin curl
>>>>>>
>>>>>> $ curl -X GET http://admin:ad...@10.81.2.100:5984
>>>>>> {"couchdb":"Welcome","version":"1.1.0","vendor":{"version":"1.2.0","name":"Couchbase","url":"http://www.couchbase.com/"}}
>>>>>>
>>>>>> $ curl -d @test.txt -H "Content-Type:application/json" -X POST http://admin:ad...@10.81.2.100:5984/dbmerkez/_bulk_docs
>>>>>> {"error":"bad_request","reason":"invalid UTF-8 JSON:<<\"\ufeff{\\\"docs\\\":[{\\\"adi\\\": \\\"zeko\\\"}]}\">>"}
>>>>>>
>>>>>> Now I copy the content of test.txt and paste it into my command line:
>>>>>> $ curl -d '{"docs":[{"adi": "zeko"}]}' -H "Content-Type:application/json" -X POST http://admin:ad...@10.81.2.100:5984/dbmerkez/_bulk_docs
>>>>>> [{"id":"74a5d37e71215e2095d00f90a00007ac","rev":"1-111c10804ee9f2b8384ab95ef66268e0"}]
>>>>>>
>>>>>> As you can see, the same content gives an invalid JSON error when sent
>>>>>> from a file, but inserts fine when given directly on the command line.
>>>>>>
>>>>>> My text file is encoded in UTF-8.
>>>>>>
>>>>>> I am close to giving up; I have been fighting with this for hours. If I
>>>>>> cannot insert the initial data into my instance, I cannot test the
>>>>>> replication cases.
>>>>>>
>>>>>> Please help!!
>>>>>>
>>>>>>
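The failing response begins with `\ufeff`, the Unicode byte order mark (BOM), which Zekeriya later identified as the first problem. In a UTF-8 file the BOM is stored as the three bytes EF BB BF, and some Windows editors prepend it when saving as UTF-8. A minimal Python sketch that removes a leading BOM so the file can be posted unchanged; it assumes the file fits in memory, and the function names are hypothetical.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 byte order mark (EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def rewrite_without_bom(path: str) -> bool:
    """Strip a leading BOM from the file in place.

    Returns True if a BOM was removed, False if the file was unchanged.
    """
    with open(path, "rb") as f:
        data = f.read()
    cleaned = strip_utf8_bom(data)
    if cleaned == data:
        return False
    with open(path, "wb") as f:
        f.write(cleaned)
    return True
```

For example, `rewrite_without_bom("test.txt")` run before the `curl -d @test.txt ...` command would leave only the JSON bytes in the file. When reading such a file from Python directly, opening it with `encoding="utf-8-sig"` decodes UTF-8 and drops a leading BOM.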

