From: Martin Sebor
Date: Thu, 25 Oct 2012 09:31:44 -0600
To: dev@stdcxx.apache.org
CC: Liviu Nicoara
Subject: Re: [PATCH] STDCXX-1073

On 10/25/2012 06:22 AM, Liviu Nicoara wrote:
> On 10/24/12 11:11, Martin Sebor wrote:
>> On 10/24/2012 07:12 AM, Liviu Nicoara wrote:
>>> [...]
>>> I modified the test according to the suggestions. The test fails all
>>> corresponding wide-char cases and I am investigating other potential
>>> defects as well. For example, I do not think that employing strcoll and
>>> wcscoll in compare is correct as they stop at the first NUL, although
>>> strings may contain characters after the NUL that alter the result of
>>> the comparison.
>>
>> I would expect the wchar_t specialization to be analogous
>> to the narrow one. In fact (without looking at the code),
>> I would even think both could be implemented in terms of
>> the same function template specialized on the character
>> type and on the libc string function. (Although I'm not
>> necessarily suggesting this as the solution to this issue.)
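Liviu's point about embedded NULs, combined with the idea of one function template parameterized on the character type and the libc collation function, can be illustrated with a minimal sketch. The names libc_coll and nul_aware_collate are hypothetical and are not stdcxx code. The tie-breaking rule (a string that runs out of NUL-separated segments first collates first) is an assumption, chosen to match basic_string::compare(). In the default "C" locale, std::strcoll() on "abc\0x" and "abc\0y" sees only "abc" and reports equality, while the segment-wise loop continues past the NUL.

```cpp
#include <cstring>
#include <cwchar>
#include <string>

// Hypothetical traits: one specialization per character type, each
// forwarding to the matching libc collation function.
template <class CharT>
struct libc_coll;

template <>
struct libc_coll<char> {
    static int compare(const char* a, const char* b) {
        return std::strcoll(a, b);
    }
};

template <>
struct libc_coll<wchar_t> {
    static int compare(const wchar_t* a, const wchar_t* b) {
        return std::wcscoll(a, b);
    }
};

// Hypothetical helper: collates two strings that may contain embedded
// NULs by comparing each NUL-terminated segment in turn.
// Returns a value < 0, == 0, or > 0, like strcoll().
template <class CharT>
int nul_aware_collate(const std::basic_string<CharT>& lhs,
                      const std::basic_string<CharT>& rhs)
{
    typedef std::char_traits<CharT> traits;

    // c_str() guarantees a terminating NUL after the last segment.
    const CharT* p1 = lhs.c_str();
    const CharT* p2 = rhs.c_str();
    const CharT* const end1 = p1 + lhs.size();
    const CharT* const end2 = p2 + rhs.size();

    for (;;) {
        const int r = libc_coll<CharT>::compare(p1, p2);
        if (r != 0)
            return r;

        // Advance each pointer to the NUL that ended its segment.
        p1 += traits::length(p1);
        p2 += traits::length(p2);

        const bool done1 = p1 == end1;
        const bool done2 = p2 == end2;
        if (done1 || done2)
            // If only one string is exhausted, the other still has an
            // embedded NUL (and possibly more text), so it sorts after.
            return done1 == done2 ? 0 : (done1 ? -1 : 1);

        // Skip the embedded NUL and collate the next segment.
        ++p1;
        ++p2;
    }
}
```

Because the libc function is selected by specializing libc_coll on the character type, the narrow and wide cases share a single loop, which is one way the two specializations could be kept analogous.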
>>
>
> They are not similar, AFAICT. In revision 367462 you implemented the
> wide byname do_compare in terms of wcscoll, if available, when using
> libc. That implementation does not take into account embedded NULs. This
> is in contrast with the narrow byname which simply transforms the
> strings and compares them.

IIUC, rev 367462 actually implements the function in terms of
do_transform() as a workaround when wcscoll() isn't available.
Ironically, the workaround is better than the default implementation.

>
> Testing shows that an implementation of wide byname do_compare identical
> with the narrow version passes all tests, as expected.
>
> It's easy for me to just remove the wcscoll-based implementation. Also,
> error reporting from wcscoll seems difficult to use when libc does not
> have a thread-local errno, and right now we don't check for errors.
> OTOH, I expect wcscoll to be faster than the (simpler) transformation
> followed by comparison.
>
> I incline towards the simpler approach. Thoughts?

There are comments suggesting that calling do_transform() on the whole
string may be suboptimal. Intuitively, calling wcscoll() (in a loop,
on the NUL-terminated substrings, if necessary) should be faster than
calling do_transform() followed by wstring::compare(), but it would be
wise to confirm that hypothesis before implementing the optimization.

Martin

>
> Liviu
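Confirming the hypothesis could start from a rough harness like the one below. Everything in it is hypothetical: coll_compare loops wcscoll() over NUL-terminated segments, and xfrm_compare stands in for do_transform() followed by wstring::compare() by joining per-segment wcsxfrm() keys with L'\0'. Neither is stdcxx's implementation, and joining keys with L'\0' assumes the transformed keys contain only non-negative wchar_t values, so that the separator sorts below every key character.

```cpp
#include <chrono>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical approach 1: wcscoll() on each NUL-terminated segment.
int coll_compare(const std::wstring& lhs, const std::wstring& rhs)
{
    const wchar_t* p1 = lhs.c_str();
    const wchar_t* p2 = rhs.c_str();
    const wchar_t* const end1 = p1 + lhs.size();
    const wchar_t* const end2 = p2 + rhs.size();
    for (;;) {
        const int r = std::wcscoll(p1, p2);
        if (r != 0)
            return r < 0 ? -1 : 1;
        p1 += std::wcslen(p1);   // move to the NUL ending this segment
        p2 += std::wcslen(p2);
        const bool done1 = p1 == end1;
        const bool done2 = p2 == end2;
        if (done1 || done2)
            return done1 == done2 ? 0 : (done1 ? -1 : 1);
        ++p1;                    // skip the embedded NUL
        ++p2;
    }
}

// Hypothetical stand-in for do_transform(): wcsxfrm() each segment and
// join the resulting keys with L'\0' so embedded NULs are preserved.
std::wstring xfrm_key(const std::wstring& s)
{
    std::wstring key;
    const wchar_t* p = s.c_str();
    const wchar_t* const end = p + s.size();
    for (;;) {
        // With a zero size and a null destination, wcsxfrm() only
        // reports the length of the transformed segment.
        const std::size_t n = std::wcsxfrm(nullptr, p, 0);
        std::vector<wchar_t> buf(n + 1);
        std::wcsxfrm(buf.data(), p, n + 1);
        key.append(buf.data(), n);
        p += std::wcslen(p);
        if (p == end)
            return key;
        key.push_back(L'\0');
        ++p;
    }
}

// Hypothetical approach 2: transform both strings, then compare keys.
int xfrm_compare(const std::wstring& lhs, const std::wstring& rhs)
{
    const int r = xfrm_key(lhs).compare(xfrm_key(rhs));
    return r < 0 ? -1 : (r > 0 ? 1 : 0);
}

// Runs cmp over every ordered pair of inputs `reps` times and returns
// the elapsed wall-clock time in milliseconds. Results are accumulated
// into `sink` so the work has an observable effect.
template <class Compare>
double time_ms(Compare cmp, const std::vector<std::wstring>& in,
               int reps, long& sink)
{
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r)
        for (const auto& a : in)
            for (const auto& b : in)
                sink += cmp(a, b);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Both functions should agree on the sign of every result, so a mismatch points to a bug rather than a speed difference. Timings will depend heavily on the libc, the active locale, and string lengths, so a meaningful comparison would select a real locale with std::setlocale() and use realistic inputs rather than the default "C" locale.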