Message-Id: <19AF5534-D1F0-4268-AA37-F19297D9FB3E@gmail.com>
From: Guby
To: couchdb-user@incubator.apache.org
Subject: Re: URL as document_id
Date: Mon, 3 Mar 2008 20:36:02 -0300

Hi Chris,

If you use URLs as IDs, you will run into trouble with URLs that
contain question marks! I haven't been able to get those to work at
all. I think it may be related to the errors I get when I query views
with keys that contain ?, = or &.

I sent a message about this to the list yesterday, and Neil suggested
that there might be an error in src/CouchDB/mod_couch.erl, where this
command is called:

    case regexp:split(RequestUri, "\\?") of

Hope that helps.

Best regards
G

On Mar 3, 2008, at 6:17 PM, Chris Anderson wrote:

> Hello all.
>
> I'm planning to store the results of a web-crawl in CouchDB, and want
> to use the page urls as document_ids. I understand that I can get the
> same unique identifier constraints by using an MD5 of the url, but the
> raw URL appeals to me.
>
> The only downside to using a URL as the document_id, is that they can
> contain a wide set of characters, and can be quite long. It's not
> clear from the wiki if there are any practical limitations on
> document_ids -- I'm hoping that gives the go-ahead for me to just pour
> raw web sewage (URLs) into CouchDB document_ids.
>
> Thanks for any advice/warnings,
> Chris
>
> --
> Chris Anderson
> http://jchris.mfdz.com
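
For reference, here is a minimal Python sketch of the two ID schemes mentioned in this thread: the raw URL with its reserved characters percent-encoded, and an MD5 of the URL. It is only an illustration, not code from either poster. The function names and the example URL are hypothetical, and whether a given CouchDB build decodes percent-encoded IDs correctly is an assumption that would need to be checked against the server.

```python
import hashlib
from urllib.parse import quote, unquote


def encoded_doc_id(url: str) -> str:
    """Percent-encode every reserved character in the URL.

    With safe="" even '/' is escaped, so '?', '=' and '&' can no longer
    be mistaken for the query-string syntax of the request URI.
    """
    return quote(url, safe="")


def hashed_doc_id(url: str) -> str:
    """Fixed-length alternative: hex MD5 digest of the UTF-8 encoded URL."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Hypothetical crawled URL containing '?', '=' and '&'.
    url = "http://example.com/search?q=couchdb&page=2"
    doc_id = encoded_doc_id(url)
    print(doc_id)                  # safe to place in a request path
    print(unquote(doc_id) == url)  # the original URL is recoverable
    print(hashed_doc_id(url))      # always 32 hex characters
```

The encoded form keeps the URL readable and reversible but grows with the URL's length. The MD5 form has a fixed length and contains no special characters, at the cost of no longer being readable as a URL.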