www-legal-discuss mailing list archives

From Ross Gardler <rgard...@apache.org>
Subject Re: Fair-use data in svn
Date Fri, 05 Nov 2010 12:43:17 GMT
Does it have to be CNN? If it is news you want, how about Wikinews?

http://en.wikinews.org/wiki/Main_Page

Ross

Sent from my mobile device.

On 5 Nov 2010, at 06:37, Benson Margulies <bimargulies@gmail.com> wrote:

> Folks,
> 
> What I think we've established here is that a certain category of NLP
> tasks can't really be undertaken at Apache in the usual way. I'm not
> saying that this is the end of the world or that it's not worthwhile to
> try to undertake them in some other way.
> 
> The NLP research community has 'been there and done that' in terms of
> trying to clear rights to corpora. It's not necessarily impossible in
> all cases, but it's not by any means guaranteed to be possible when
> you need it to be possible.
> 
> It's an interesting limit, perhaps, on open source: as a commercial
> enterprise, I use a spider and grab all the visible content of the
> web, with no regard for copyright, and so long as I don't turn around
> and publish that text, I have essentially no legal exposure. I can do
> statistics on it, train models on it, etc. Perhaps a content
> publisher, if they knew that I had used a large amount of their data,
> would take issue and ask me to pay something, and then perhaps we'd
> have a discussion of fair use, or perhaps we'd pay.
> 
> For the immediate project I'm working on, I'll just push it to GitHub
> after making my own personal (or corporate) determination of the legal
> risk of being accused of unfair use when a bag of web pages, in a
> compressed tar file, is in a public source control repository. For the
> proposed OpenNLP podling, this will put some boundaries on them, but
> they might be happy to only check in code and 'cleared' corpora, and
> leave it to their users to apply the code to more interesting corpora.
> 
> --benson
> 
> 
> On Fri, Nov 5, 2010 at 5:15 AM, Sim IJskes <sijskes@apache.org> wrote:
>> On 11/05/2010 09:56 AM, Jukka Zitting wrote:
>>> 
>>> Hi,
>>> 
>>> On Fri, Nov 5, 2010 at 10:07 AM, Sim IJskes<sijskes@apache.org>  wrote:
>>>> 
>>>> Wouldn't data publicly accessible in Jira be just another case of
>>>> redistribution, and thereby fall within the scope of copyright
>>>> in many jurisdictions?
>>> 
>>> Sure, but the "purpose and character" of a Jira attachment is much
>>> more limited than that of an official Apache release. Plus the need
>>> for explicitly documenting the licensing status is much more relaxed.
>>> We have lots of non-licensed Jira attachments that (at least to my
>>> layman mind) clearly fall within fair use for research purposes.
>> 
>> I'm a layman;
>> 
>> Isn't the distinction here that we are not talking about an original
>> contribution, made by the author, but about an artifact that is nothing more
>> than an aggregation of publicly available material? In the jurisdiction I live
>> under (the Netherlands), this will expose you to legal action. If you want
>> to know more, look at the 'Knipselkrant-arrest'.
>> 
>> Gr. Sim
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: legal-discuss-unsubscribe@apache.org
>> For additional commands, e-mail: legal-discuss-help@apache.org
>> 
>> 
> 
