www-legal-discuss mailing list archives

From Benson Margulies <bimargul...@gmail.com>
Subject Re: Fair-use data in svn
Date Fri, 05 Nov 2010 10:37:18 GMT

What I think we've established here is that a certain category of NLP
tasks can't really be undertaken at Apache in the usual way. I'm not
saying that this is the end of the world or that it's not worthwhile to
try to undertake them in some other way.

The NLP research community has 'been there and done that' in terms of
trying to clear rights to corpora. It's not necessarily impossible in
all cases, but it's not by any means guaranteed to be possible when
you need it to be possible.

It's an interesting limit, perhaps, on open source: as a commercial
enterprise, I use a spider and grab all the visible content of the
web, with no regard for copyright, and so long as I don't turn around
and publish that text, I have essentially no legal exposure. I can do
statistics on it, train models on it, etc. Perhaps a content
publisher, if they knew that I had used a large amount of their data,
would take issue and ask me to pay something, and then perhaps we'd
have a discussion of fair use, or perhaps we'd pay.

For the immediate project I'm working on, I'll just push it to github
after making my own personal (or corporate) determination of the legal
risk of being accused of unfair use when a bag of web pages, in a
compressed tar file, sits in a public source control repository. For the
proposed OpenNLP podling, this will put some boundaries on them, but
they might be happy to only check in code and 'cleared' corpora, and
leave it to their users to apply the code to more interesting corpora.


On Fri, Nov 5, 2010 at 5:15 AM, Sim IJskes <sijskes@apache.org> wrote:
> On 11/05/2010 09:56 AM, Jukka Zitting wrote:
>> Hi,
>> On Fri, Nov 5, 2010 at 10:07 AM, Sim IJskes<sijskes@apache.org>  wrote:
>>> Wouldn't data publicly accessible in jira be just another case of
>>> redistribution? And by this falling within the scope of copyright
>>> in many jurisdictions?
>> Sure, but the "purpose and character" of a Jira attachment is much
>> more limited than that of an official Apache release. Plus the need
>> for explicitly documenting the licensing status is much more relaxed.
>> We have lots of non-licensed Jira attachments that (at least to my
>> layman mind) clearly fall within fair use for research purposes.
> I'm a layman;
> Isn't the distinction here that we are not talking about an original
> contribution, made by the author, but about an artifact that is nothing more
> than an aggregation of publicly available material? In the jurisdiction I live
> under (The Netherlands), this will expose you to legal actions. If you want
> to know more, look at the 'Knipselkrant-arrest'.
> Gr. Sim
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: legal-discuss-unsubscribe@apache.org
> For additional commands, e-mail: legal-discuss-help@apache.org
