From: Benson Margulies
To: legal-discuss@apache.org
Date: Fri, 5 Nov 2010 06:37:18 -0400
Subject: Re: Fair-use data in svn
In-Reply-To: <4CD3CB3F.2020303@apache.org>
References: <4CD3BB4A.2000406@apache.org> <4CD3CB3F.2020303@apache.org>

Folks,

What I think we've established here is that a certain category of NLP
tasks can't really be undertaken at Apache in the usual way. I'm not
saying that this is the end of the world or that it's not worthwhile
to try to undertake them in some other way.

The NLP research community has 'been there and done that' in terms of
trying to clear rights to corpora. It's not necessarily impossible in
all cases, but it's not by any means guaranteed to be possible when
you need it to be possible.

It's an interesting limit, perhaps, on open source: as a commercial
enterprise, I use a spider and grab all the visible content of the
web, with no regard for copyright, and so long as I don't turn around
and publish that text, I have essentially no legal exposure. I can do
statistics on it, train models on it, etc. Perhaps a content
publisher, if they knew that I had used a large amount of their data,
would take issue and ask me to pay something, and then perhaps we'd
have a discussion of fair use, or perhaps we'd pay.

For the immediate project I'm working on, I'll just push it to github
after making my own personal (or corporate) determination of the legal
risk of being accused of unfair use when a bag of web pages, in a
compressed tar file, sits in a public source control repository.

For the proposed OpenNLP podling, this will put some boundaries on
them, but they might be happy to only check in code and 'cleared'
corpora, and leave it to their users to apply the code to more
interesting corpora.

--benson

On Fri, Nov 5, 2010 at 5:15 AM, Sim IJskes wrote:
> On 11/05/2010 09:56 AM, Jukka Zitting wrote:
>>
>> Hi,
>>
>> On Fri, Nov 5, 2010 at 10:07 AM, Sim IJskes wrote:
>>>
>>> Wouldn't data publicly accessible in jira be just another case of
>>> redistribution? And by this falling within the scope of copyright
>>> in many jurisdictions?
>>
>> Sure, but the "purpose and character" of a Jira attachment is much
>> more limited than that of an official Apache release. Plus the need
>> for explicitly documenting the licensing status is much more relaxed.
>> We have lots of non-licensed Jira attachments that (at least to my
>> layman mind) clearly fall within fair use for research purposes.
>
> I'm a layman;
>
> Isn't the distinction here that we are not talking about an original
> contribution, made by the author, but with an artifact that is nothing
> more than an aggregation of publicly available material? In the
> jurisdiction I live under (The Netherlands), this will expose you to
> legal actions. If you want to know more, look at the
> 'Knipselkrant-arrest'.
>
> Gr. Sim
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: legal-discuss-unsubscribe@apache.org
> For additional commands, e-mail: legal-discuss-help@apache.org