From: "Paul Davis"
To: dev@couchdb.apache.org
Date: Sun, 14 Dec 2008 01:18:46 -0500
Subject: Re: slash escaping (was 0.9.0 Release)
In-Reply-To: <4346C67F-ADDB-4342-A776-7E4E3FDBBD4D@gmail.com>
References: <0EC6A3E0-15BA-4BBB-A0A3-9ED9D04E3C40@apache.org> <20081211210620.GI26734@tumbolia.org> <4346C67F-ADDB-4342-A776-7E4E3FDBBD4D@gmail.com>

I have to say I always kind of assumed that most filesystems only
allowed Latin-based characters in filenames. I got interested, so I
asked the guys in the IRC channel about non-Latin characters in
filenames, and someone actually just created a file on ext3 with
Japanese characters and everything worked fine.

Someone pasted this link:

http://en.wikipedia.org/wiki/Comparison_of_file_systems

Reading the table, it appears that the biggest concern about filenames
is including a NULL byte.

Perhaps we're overthinking this whole thing? Maybe we can just write
filenames with weird characters, and the sysadmins have to muck around
with what happens when they have a design doc with weird characters?

Paul

On Sun, Dec 14, 2008 at 12:07 AM, Antony Blakey wrote:
>
> On 14/12/2008, at 2:47 PM, Chris Anderson wrote:
>
>> Perhaps your filename scheme could be appended to a slug (based on the
>> safe-chars) so that sysadmins could still use meaningful file globs to
>> eg batch rsync .couch files and view directories.
>
> The filename encoder can use any scheme, so yes that is trivial. It would
> only be (theoretical) a prefix of the readable chars because of length
> constraints. Note that there is no guarantee that slugs would be unique. I
> considered punycode, but given that it needs to deal with case-insensitive
> FS, slashes, limited length, it was simplest to cut to the chase and just
> use the digest.
>
> Regarding your request however, a better way to determine safe-chars
> according to the underlying filesystem is required IMO to avoid the overt
> roman script-only design. If you think it's essential that *you* can read
> the filenames in a terminal, then surely it's essential that a
> chinese/russian/greek/swedish/thai etc developer has the same facility.
> Otherwise it's not a *design requirement* per se, but rather a preference.
>
> I'm a pure english speaker myself, but I am about to deploy a couch system
> to an asian (government) environment with many millions of users (with, BTW,
> a link to CouchDB on every page). In the future I will have to sell this
> technology and do technology transfer to local developers - and that is made
> very much more difficult with the current vigorously asserted english-only
> design decisions because it's a significant political liability.
>
>> Readability / globbableness is also nice when you're trying to figure
>> out which views use the most space on the filesystem, a common task.
>
> That's why the actual name is in the 'name' file.
>
> Antony Blakey
> -------------
> CTO, Linkuistics Pty Ltd
> Ph: 0438 840 787
>
> There are two ways of constructing a software design: One way is to make it
> so simple that there are obviously no deficiencies, and the other way is to
> make it so complicated that there are no obvious deficiencies.
>   -- C. A. R. Hoare
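
For anyone skimming the thread, here is a rough sketch of the slug-plus-digest
naming discussed in the quoted text. It is only an illustration under stated
assumptions: the thread does not say which digest is used, so SHA-1 is a guess,
and the lowercase-ASCII safe-char set, the 32-character slug limit, and the
names `encode_filename`, `UNSAFE_CHARS`, and `MAX_SLUG_LEN` are all hypothetical.

```python
import hashlib
import re

# Assumption: only lowercase ASCII letters, digits, "_" and "-" count as safe.
# The thread does not define the safe-char set; this one is illustrative.
UNSAFE_CHARS = re.compile(r"[^a-z0-9_-]")

# Hypothetical slug length limit, chosen only for this example.
MAX_SLUG_LEN = 32


def encode_filename(name: str) -> str:
    """Return '<slug>-<digest>', or just '<digest>' when no safe chars remain."""
    # Lowercase first so case-insensitive filesystems see a single spelling,
    # then drop everything outside the safe set (including slashes).
    slug = UNSAFE_CHARS.sub("", name.lower())[:MAX_SLUG_LEN]
    # The digest covers the original, unmodified name, so two names whose
    # slugs collide still map to different filenames. SHA-1 is an assumption.
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    return f"{slug}-{digest}" if slug else digest


if __name__ == "__main__":
    for example in ("mydb/_design/foo", "MyDB", "mydb"):
        print(example, "->", encode_filename(example))
```

The slug is only a readable prefix, so globbing on it still works for names
with safe characters, while names with none fall back to the bare digest.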