www-legal-discuss mailing list archives

From Santiago Gala <santiago.g...@gmail.com>
Subject Re: Fair-use data in svn
Date Fri, 05 Nov 2010 08:21:20 GMT
CNN can probably give permission to use a set of six-month-old "regular" news
items for such a purpose. If you contact them through their PR people, you
could 'pay' with words about how important this is, or with a joint release
about them helping research (but talk with press@ before assuming it).
On 05/11/2010 01:08, "Benson Margulies" <bimargulies@gmail.com> wrote:
> On Thu, Nov 4, 2010 at 7:47 PM, Lawrence Rosen <lrosen@rosenlaw.com> wrote:
>> Benson, how about copying materials that are explicitly marked "Creative
>> Commons"? There must be enough of that stuff on the web to collect into a
>> test case.
>
> Here's a concrete example. Let's say that the job at hand is to
> extract useful text from webpages. You need to test this on the news
> sites that people want to work with, like CNN. The inventory of
> 'Commons' pages is not representative.
>
> Another bit of concretude:
>
> Case 1: you have a representative collection of HTML pages, and you
> use them to regress data extraction. Tika has avoided this by
> depending on a non-ASF component (boilerpipe).
>
> Case 2: you have, oh, 250,000 words of news, and you get people to
> annotate them, and use them to train models. Whether there's enough of
> the right stuff out there under CC is an open question.
>
>>
>> /Larry
>>
>>
>>
>>
>>> -----Original Message-----
>>> From: Benson Margulies [mailto:bimargulies@gmail.com]
>>> Sent: Thursday, November 04, 2010 2:56 PM
>>> To: legal-discuss@apache.org
>>> Subject: Re: Fair-use data in svn
>>>
>>> > There is no exception in copyright infringement law that allows you
>>> > to copy other people's copyrighted materials and distribute them on an
>>> > Apache website, no matter how upstanding the goals, without a license.
>>> > Ask permission first.
>>>
>>> It won't be on an apache web site. It will be in a zip file in svn,
>>> read by (for example) a unit test. That seems a relevant distinction
>>> to me, but YAAL, not me.
>>>
>>> >
>>> > If you intend to rely on a fair use defense, don't count on it
>>> > without analyzing the fair use factors carefully. I'll work with you on
>>> > that analysis if you can't find a better alternative for generating
>>> > test data.
>>> >
>>> > If these really are "miscellaneous" web pages, why can't you create a
>>> > test consisting of links to the actual pages? Must you copy the pages
>>> > themselves?
>>>
>>> You can't make a repeatable process that depends on ephemeral content
>>> -- and this content is always ephemeral -- sitting there when you want
>>> it.
>>>
>>>
>>> > /Larry
>>> >
>>> >
>>> >> -----Original Message-----
>>> >> From: Benson Margulies [mailto:bimargulies@gmail.com]
>>> >> Sent: Thursday, November 04, 2010 9:07 AM
>>> >> To: legal-discuss@apache.org
>>> >> Subject: Fair-use data in svn
>>> >>
>>> >> I write code in some areas where 'real world' textual data is fuel.
>>> >> It's test cases. It's training corpora. It cannot be replaced by
>>> >> constructed, test-tube text that could be created under the AL or
>>> >> some other 'class A' license.
>>> >>
>>> >> I'd like to contribute some of that data here at ASF. In some cases,
>>> >> that would require checking in test case data that consists of (for
>>> >> example) miscellaneous web pages grabbed with wget. In other cases, it
>>> >> might consist of larger collections of text derived from such pages.
>>> >>
>>> >> I would like to discover that this is acceptable, perhaps with some
>>> >> caveats and requirements for NOTICE.
>>> >>
>>> >> ---------------------------------------------------------------------
>>> >> To unsubscribe, e-mail: legal-discuss-unsubscribe@apache.org
>>> >> For additional commands, e-mail: legal-discuss-help@apache.org
