From: Nitin Borwankar
Date: Wed, 01 Jul 2009 12:49:02 -0700
To: user@couchdb.apache.org
Reply-To: user@couchdb.apache.org
Message-ID: <4A4BBDAE.9010002@borwankar.com>
User-Agent: Thunderbird 2.0.0.22 (Macintosh/20090605)
Mailing-List: contact user-help@couchdb.apache.org; run by ezmlm
Subject: Re: chunked encoding problem ? - error messages from curl as well as lucene
References: <921000906292111x5ca5cbbbh9b7b3aa7e9d123dc@mail.gmail.com> <921000906292115i77cbb18fhf0a5ab87681a53f0@mail.gmail.com> <224A41AE-3A54-495B-A810-805C057FBE95@apache.org> <921000906301118y3bb11937hf6c0cd433387e05d@mail.gmail.com>

Chris Anderson wrote:
> On Tue, Jun 30, 2009 at 8:18 PM, Nitin Borwankar wrote:
>
>> Hi Damien,
>>
>> Thanks for that tip.
>>
>> Turns out I had non-UTF-8 data:
>>
>>   adolfo.steiger-gar%E7%E3o
>>
>> Not sure how it managed to get into the db. This is probably confusing
>> the chunk termination.
>>
>> How did Couch let this data in?
>>
> Currently CouchDB doesn't validate JSON string contents on input, only
> on output.
>
> Adding an option to block invalid Unicode input would be a small patch,
> but it might slow things down, since we'd have to spend more time in the
> encoder while writing. Worth measuring, I suppose.

To think about it from the user's point of view: data probably gets read
more often than it gets written. So if you validate it when putting it in,
you save a whole bunch of unnecessary validations on reading. It seems
backward (I am probably missing something huge) not to validate it on
input. If you catch all the bad stuff going in, you're less likely (except
when you're doing internal transformations) to have bad stuff there in the
first place, and you can save yourself the validations on the way out.

Nitin

> Is this something users are running into a lot? I've heard this once
> before; if lots of people are seeing it, it's definitely worth fixing.
>
>> I uploaded via Python httplib - not couchdb-python. Is this the bug -
>> the one that is fixed in 0.9.1?
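[Editor's note: the bad ID in this thread can be detected before upload. Percent-decoding `adolfo.steiger-gar%E7%E3o` yields the raw Latin-1 bytes 0xE7 0xE3 ("ç", "ã"), which are not valid UTF-8. A minimal client-side check, sketched in Python 3 (the helper name is made up; this is not part of couchdb-python or CouchDB itself):]

```python
import urllib.parse

def is_valid_utf8_id(doc_id: str) -> bool:
    """Percent-decode the ID to raw bytes and check they form valid UTF-8."""
    raw = urllib.parse.unquote_to_bytes(doc_id)
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(is_valid_utf8_id("adnanmoh"))                   # True
# 0xE7/0xE3 are Latin-1 bytes, not a valid UTF-8 sequence:
print(is_valid_utf8_id("adolfo.steiger-gar%E7%E3o"))  # False
```

[Running a check like this before each PUT would catch such documents at write time, which is the direction the validate-on-input argument above points.]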
>>
>> Nitin
>>
>> 37% of all statistics are made up on the spot
>> -------------------------------------------------------------------------------------
>> Nitin Borwankar
>> nborwankar@gmail.com
>>
>> On Tue, Jun 30, 2009 at 8:58 AM, Damien Katz wrote:
>>
>>> This might be the JSON encoding issue that Adam fixed.
>>>
>>> The 0.9.x branch, which is soon to be 0.9.1, fixes that issue. Try
>>> building and installing from the branch and see if that fixes the
>>> problem:
>>>
>>>   svn co http://svn.apache.org/repos/asf/couchdb/branches/0.9.x/
>>>
>>> -Damien
>>>
>>> On Jun 30, 2009, at 12:15 AM, Nitin Borwankar wrote:
>>>
>>>> Oh, and when I use Futon and try to browse the docs around where curl
>>>> gives an error: when I hit the page containing the records around the
>>>> error, Futon just spins and doesn't render the page.
>>>>
>>>> Data corruption?
>>>>
>>>> Nitin
>>>>
>>>> 37% of all statistics are made up on the spot
>>>> -------------------------------------------------------------------------------------
>>>> Nitin Borwankar
>>>> nborwankar@gmail.com
>>>>
>>>> On Mon, Jun 29, 2009 at 9:11 PM, Nitin Borwankar wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I uploaded about 11K docs, around 230 MB of data in total, to a 0.9
>>>>> instance on Ubuntu.
>>>>> Db name is 'plist'.
>>>>>
>>>>> curl http://localhost:5984/plist gives:
>>>>>
>>>>> {"db_name":"plist","doc_count":11036,"doc_del_count":0,"update_seq":11036,"purge_seq":0,
>>>>>  "compact_running":false,"disk_size":243325178,"instance_start_time":"1246228896723181"}
>>>>>
>>>>> suggesting a non-corrupt db.
>>>>>
>>>>> curl http://localhost:5984/plist/_all_docs gives:
>>>>>
>>>>> {"id":"adnanmoh","key":"adnanmoh","value":{"rev":"1-663736558"}},
>>>>> {"id":"adnen.chockri","key":"adnen.chockri","value":{"rev":"1-1209124545"}},
>>>>> curl: (56) Received problem 2 in the chunky parser   <<--------- note curl error
>>>>> {"id":"ado.adamu","key":"ado.adamu","value":{"rev":"1-4226951654"}}
>>>>>
>>>>> suggesting a chunked data transfer error.
>>>>>
>>>>> The couchdb-lucene error message in couchdb.stderr reads:
>>>>>
>>>>> [...]
>>>>>
>>>>> [couchdb-lucene] INFO Indexing plist from scratch.
>>>>> [couchdb-lucene] ERROR Error updating index.
>>>>> java.io.IOException: CRLF expected at end of chunk: 83/101
>>>>>     at org.apache.commons.httpclient.ChunkedInputStream.readCRLF(ChunkedInputStream.java:207)
>>>>>     at org.apache.commons.httpclient.ChunkedInputStream.nextChunk(ChunkedInputStream.java:219)
>>>>>     at org.apache.commons.httpclient.ChunkedInputStream.read(ChunkedInputStream.java:176)
>>>>>     at org.apache.commons.httpclient.ChunkedInputStream.read(ChunkedInputStream.java:196)
>>>>>     at org.apache.commons.httpclient.ChunkedInputStream.exhaustInputStream(ChunkedInputStream.java:369)
>>>>>     at org.apache.commons.httpclient.ChunkedInputStream.close(ChunkedInputStream.java:346)
>>>>>     at java.io.FilterInputStream.close(FilterInputStream.java:159)
>>>>>     at org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(AutoCloseInputStream.java:194)
>>>>>     at org.apache.commons.httpclient.AutoCloseInputStream.close(AutoCloseInputStream.java:158)
>>>>>     at com.github.rnewson.couchdb.lucene.Database.execute(Database.java:141)
>>>>>     at com.github.rnewson.couchdb.lucene.Database.get(Database.java:107)
>>>>>     at com.github.rnewson.couchdb.lucene.Database.getAllDocsBySeq(Database.java:82)
>>>>>     at com.github.rnewson.couchdb.lucene.Index$Indexer.updateDatabase(Index.java:229)
>>>>>     at com.github.rnewson.couchdb.lucene.Index$Indexer.updateIndex(Index.java:178)
>>>>>     at com.github.rnewson.couchdb.lucene.Index$Indexer.run(Index.java:90)
>>>>>     at java.lang.Thread.run(Thread.java:595)
>>>>>
>>>>> suggesting a chunking problem again.
>>>>>
>>>>> Who is creating this problem - my data? CouchDB chunking?
>>>>>
>>>>> Help?
>>>>>
>>>>> 37% of all statistics are made up on the spot
>>>>> -------------------------------------------------------------------------------------
>>>>> Nitin Borwankar
>>>>> nborwankar@gmail.com
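[Editor's note: the curl error ("problem 2 in the chunky parser") and the httpclient exception ("CRLF expected at end of chunk") are the same complaint. In HTTP/1.1 chunked transfer encoding, each chunk is a hex length, CRLF, exactly that many bytes, then another CRLF; if the server computes the length for one byte sequence but sends another (e.g. invalid UTF-8 re-encoded on output), the client lands mid-data where it expects the CRLF. A toy decoder, sketched in Python (not the actual curl or httpclient code), shows the failure mode:]

```python
def decode_chunked(body: bytes) -> bytes:
    """Toy HTTP/1.1 chunked-transfer decoder: each chunk is
    b'<hex-size>\\r\\n<size bytes>\\r\\n', terminated by a zero-size chunk."""
    out, i = b"", 0
    while True:
        j = body.index(b"\r\n", i)          # end of the hex size field
        size = int(body[i:j], 16)
        if size == 0:
            return out                       # zero-size chunk ends the body
        data_start = j + 2
        out += body[data_start:data_start + size]
        # This is the check org.apache.commons.httpclient failed on:
        if body[data_start + size:data_start + size + 2] != b"\r\n":
            raise ValueError("CRLF expected at end of chunk")
        i = data_start + size + 2

decode_chunked(b"5\r\nhello\r\n0\r\n\r\n")   # returns b'hello'
# A chunk whose declared length disagrees with the bytes actually sent
# lands the parser mid-data instead of on a CRLF:
# decode_chunked(b"6\r\nhello\r\n0\r\n\r\n") # raises ValueError
```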
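[Editor's note: since _all_docs fails partway through, one way to isolate the offending document is to page through it with limit/skip and note the first page whose fetch fails. A sketch (function names are made up; the HTTP fetch is passed in so the paging logic stands alone):]

```python
import json
from urllib.request import urlopen

def fetch_page_http(skip: int, limit: int) -> list:
    """One page of _all_docs from the 'plist' db discussed in this thread."""
    url = f"http://localhost:5984/plist/_all_docs?limit={limit}&skip={skip}"
    return json.load(urlopen(url))["rows"]

def find_bad_page(fetch_page, page_size: int = 100):
    """Return the skip offset of the first page whose fetch blows up
    (the corrupt doc lies in [skip, skip + page_size)), or None if clean."""
    skip = 0
    while True:
        try:
            rows = fetch_page(skip, page_size)
        except Exception:
            return skip
        if len(rows) < page_size:
            return None  # reached the end without an error
        skip += page_size
```

[Once a bad page is found, shrinking page_size over that window narrows it to a single id, which in the output above would be somewhere around "ado.adamu".]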