incubator-couchdb-user mailing list archives

From Nitin Borwankar <ni...@borwankar.com>
Subject Re: chunked encoding problem ? - error messages from curl as well as lucene
Date Wed, 01 Jul 2009 19:49:02 GMT
Chris Anderson wrote:
> On Tue, Jun 30, 2009 at 8:18 PM, Nitin Borwankar<nitin@borwankar.com> wrote:
>   
>> Hi Damien,
>>
>> Thanks for that tip.
>>
>> Turns out I had non-UTF-8 data
>>
>> adolfo.steiger-gar%E7%E3o:
>>
>> - not sure how it managed to get into the db.
>>
>> This is probably confusing the chunk termination.
>>
>> How did Couch let this data in ?
>>     
>
> Currently CouchDB doesn't validate json string contents on input, only
> on output.
>
> Adding an option to block invalid unicode input would be a small
> patch, but perhaps slow things down as we'd have to spend more time in
> the encoder while writing. Worth measuring I suppose.
>
>   
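[The doc id in question illustrates the problem. A minimal sketch (not CouchDB's actual encoder, which is Erlang) of the kind of input-side check being discussed: `%E7%E3` percent-decodes to the Latin-1 bytes for `çã`, which are not a valid UTF-8 sequence, so a validate-on-write pass would have rejected the document.]

```python
from urllib.parse import unquote_to_bytes

def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# The doc id from the thread: %E7 and %E3 are Latin-1 'ç' and 'ã',
# which do not form a valid UTF-8 byte sequence.
bad = unquote_to_bytes("adolfo.steiger-gar%E7%E3o")
print(is_valid_utf8(bad))                            # False

# The same name properly encoded as UTF-8 passes.
good = "adolfo.steiger-garção".encode("utf-8")
print(is_valid_utf8(good))                           # True
```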

To think about it from the user's point of view - data probably gets 
read more often than written.  So if you validate it on the way in, 
you save a whole bunch of unnecessary validations on every read.

It seems backward (I am probably missing something huge) not to 
validate on input. If you catch all the bad stuff going in, you're 
less likely (except when you're doing internal transformations) to have 
bad stuff there in the first place, and you can save yourself the 
validations on the way out.
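[Until server-side validation exists, a client can guard itself. This is a hypothetical client-side check, not part of couchdb-python or httplib: refuse to serialize a document whose raw byte fields are not valid UTF-8, so the bad data never reaches the server.]

```python
import json

def checked_json(doc):
    """Serialize `doc` to a UTF-8 JSON body, raising UnicodeDecodeError
    early if any bytes field is not valid UTF-8 (hypothetical guard)."""
    def check(value):
        if isinstance(value, bytes):
            # raw bytes must decode as UTF-8 before they go into JSON
            return value.decode("utf-8")
        if isinstance(value, dict):
            return {k: check(v) for k, v in value.items()}
        if isinstance(value, list):
            return [check(v) for v in value]
        return value
    return json.dumps(check(doc), ensure_ascii=False).encode("utf-8")

# This raises UnicodeDecodeError instead of uploading corrupt data:
#   checked_json({"_id": b"adolfo.steiger-gar\xe7\xe3o"})
```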

Nitin

> Is this something users are running into a lot? I've heard this once
> before, if lots of people are seeing this, it's definitely worthy of
> fixing.
>
>   




>> I uploaded via Python httplib - not
>> couchdb-python.  Is this a bug - the one that is fixed in 0.9.1?
>>
>> Nitin
>>
>> 37% of all statistics are made up on the spot
>> -------------------------------------------------------------------------------------
>> Nitin Borwankar
>> nborwankar@gmail.com
>>
>>
>> On Tue, Jun 30, 2009 at 8:58 AM, Damien Katz <damien@apache.org> wrote:
>>
>>     
>>> This might be the json encoding issue that Adam fixed.
>>>
>>> The 0.9.x branch, which is soon to be 0.9.1, fixes that issue. Try building
>>> and installing from the branch and see if that fixes the problem:
>>> svn co http://svn.apache.org/repos/asf/couchdb/branches/0.9.x/
>>>
>>> -Damien
>>>
>>>
>>>
>>> On Jun 30, 2009, at 12:15 AM, Nitin Borwankar wrote:
>>>
>>>> Oh and when I use Futon and try to browse the docs around where curl gives
>>>> an error,  when I hit the page containing the records around the error
>>>> Futon
>>>> just spins and doesn't render the page.
>>>>
>>>> Data corruption?
>>>>
>>>> Nitin
>>>>
>>>> 37% of all statistics are made up on the spot
>>>>
>>>> -------------------------------------------------------------------------------------
>>>> Nitin Borwankar
>>>> nborwankar@gmail.com
>>>>
>>>>
>>>> On Mon, Jun 29, 2009 at 9:11 PM, Nitin Borwankar <nitin@borwankar.com>
>>>> wrote:
>>>>> Hi,
>>>>>
>>>>> I uploaded about 11K + docs total 230MB or so of data to a 0.9 instance
>>>>> on
>>>>> Ubuntu.
>>>>> Db name is 'plist'
>>>>>
>>>>> curl http://localhost:5984/plist gives
>>>>>
>>>>>
>>>>>
>>>>> {"db_name":"plist","doc_count":11036,"doc_del_count":0,"update_seq":11036,"purge_seq":0,
>>>>>
>>>>>
>>>>> "compact_running":false,"disk_size":243325178,"instance_start_time":"1246228896723181"}
>>>>>
>>>>> suggesting a non-corrupt db
>>>>>
>>>>> curl http://localhost:5984/plist/_all_docs gives
>>>>>
>>>>> {"id":"adnanmoh","key":"adnanmoh","value":{"rev":"1-663736558"}},
>>>>>
>>>>>
>>>>> {"id":"adnen.chockri","key":"adnen.chockri","value":{"rev":"1-1209124545"}},
>>>>> curl: (56) Received problem 2 in the chunky parser          <<--------- note curl error
>>>>> {"id":"ado.adamu","key":"ado.adamu","value":{"rev":"1-4226951654"}}
>>>>>
>>>>> suggesting a chunked data transfer error
>>>>>
>>>>>
>>>>> couchdb-lucene error message in couchdb.stderr reads
>>>>>
>>>>> [...]
>>>>>
>>>>> [couchdb-lucene] INFO Indexing plist from scratch.
>>>>> [couchdb-lucene] ERROR Error updating index.
>>>>> java.io.IOException: CRLF expected at end of chunk: 83/101
>>>>>   at
>>>>>
>>>>> org.apache.commons.httpclient.ChunkedInputStream.readCRLF(ChunkedInputStream.java:207)
>>>>>   at
>>>>>
>>>>> org.apache.commons.httpclient.ChunkedInputStream.nextChunk(ChunkedInputStream.java:219)
>>>>>   at
>>>>>
>>>>> org.apache.commons.httpclient.ChunkedInputStream.read(ChunkedInputStream.java:176)
>>>>>   at
>>>>>
>>>>> org.apache.commons.httpclient.ChunkedInputStream.read(ChunkedInputStream.java:196)
>>>>>   at
>>>>>
>>>>> org.apache.commons.httpclient.ChunkedInputStream.exhaustInputStream(ChunkedInputStream.java:369)
>>>>>   at
>>>>>
>>>>> org.apache.commons.httpclient.ChunkedInputStream.close(ChunkedInputStream.java:346)
>>>>>   at java.io.FilterInputStream.close(FilterInputStream.java:159)
>>>>>   at
>>>>>
>>>>> org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(AutoCloseInputStream.java:194)
>>>>>   at
>>>>>
>>>>> org.apache.commons.httpclient.AutoCloseInputStream.close(AutoCloseInputStream.java:158)
>>>>>   at
>>>>> com.github.rnewson.couchdb.lucene.Database.execute(Database.java:141)
>>>>>   at com.github.rnewson.couchdb.lucene.Database.get(Database.java:107)
>>>>>   at
>>>>>
>>>>> com.github.rnewson.couchdb.lucene.Database.getAllDocsBySeq(Database.java:82)
>>>>>   at
>>>>>
>>>>> com.github.rnewson.couchdb.lucene.Index$Indexer.updateDatabase(Index.java:229)
>>>>>   at
>>>>>
>>>>> com.github.rnewson.couchdb.lucene.Index$Indexer.updateIndex(Index.java:178)
>>>>>   at com.github.rnewson.couchdb.lucene.Index$Indexer.run(Index.java:90)
>>>>>   at java.lang.Thread.run(Thread.java:595)
>>>>>
>>>>>
>>>>> suggesting a chunking problem again.
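[Both errors point at the same mechanism. A minimal sketch of an HTTP/1.1 chunked-body reader (not the Apache HttpClient code in the trace) shows where "CRLF expected at end of chunk" comes from: each chunk is `<hex size>\r\n<size bytes>\r\n`, so if the sender's declared size disagrees with the bytes it actually emits - e.g. counting characters of a mis-encoded string instead of bytes - the two bytes after the payload are not CRLF and the parser bails, which is consistent with the invalid UTF-8 doc id being the trigger.]

```python
def read_chunked(stream):
    """Decode an HTTP/1.1 chunked body from a binary stream."""
    body = b""
    while True:
        size_line = stream.readline()        # e.g. b"1a\r\n"
        size = int(size_line.strip(), 16)
        if size == 0:
            stream.readline()                # CRLF after the last chunk
            return body
        body += stream.read(size)            # exactly `size` payload bytes
        crlf = stream.read(2)
        if crlf != b"\r\n":
            # the sender miscounted its chunk size
            raise IOError("CRLF expected at end of chunk: %d/%d"
                          % (crlf[0], crlf[1]))

# A well-formed stream decodes; one whose size header says 4 but
# whose payload is 5 bytes fails exactly like the errors above:
#   read_chunked(io.BytesIO(b"5\r\nhello\r\n0\r\n\r\n"))  -> b"hello"
#   read_chunked(io.BytesIO(b"4\r\nhello\r\n0\r\n\r\n"))  -> IOError
```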
>>>>>
>>>>> Who is creating this problem - my data?  CouchDB chunking ?
>>>>>
>>>>> Help?
>>>>>
>>>>>
>>>>>
>>>>> 37% of all statistics are made up on the spot
>>>>>
>>>>>
>>>>> -------------------------------------------------------------------------------------
>>>>> Nitin Borwankar
>>>>> nborwankar@gmail.com
>>>>>
>>>>>
>>>>>           
>
>
>
>   

