www-legal-discuss mailing list archives

From Benson Margulies <bimargul...@gmail.com>
Subject Re: Fair-use data in svn
Date Fri, 05 Nov 2010 12:25:30 GMT
Let me be clear on the regime that this discussion is heading for.

You can collect up a corpus of unencumbered items. You can annotate
them. You can train a model and measure the success of your algorithm.
All good.

What you cannot do is build a model that is any use on real world
data. What the world wants are classifiers (e.g.) that work on actual
CNN news feeds. Training on 'gutenberg' or CC materials won't produce
that. The data is often the hardest part of the problem, far harder
and more costly than the code. Just publishing code that can be used
to train such a thing is very convenient for very large organizations
who can join the LDC (a center at UPenn that acquires and relicenses
corpora) or, more likely, make their own.

Given that my livelihood depends on selling such things, I am perhaps
not heartbroken to discover that the ASF (at least) isn't a viable
home for free competition. On the other hand, perhaps the ASF could
effect a giant change in the landscape here by negotiating some sort
of grant from a variety of web publishers.

The legal principle at work here is very frustrating. I can collect
this stuff. I can use it. I can quietly share it with others via
private communications. But I can't check it into a public SVN, since
that looks like 'publication'. I do wonder whether simply bundling it
into a .tar.gz changes anything. The traditional complaint of content
sources is against people who appropriate their content to essentially
compete with them by republishing it where people can easily read
it. Do they really have a cause for complaint if the data is packaged
so that it isn't trivially readable in a web browser?
