www-legal-discuss mailing list archives

From Benson Margulies <bimargul...@gmail.com>
Subject Re: Fair-use data in svn
Date Sun, 07 Nov 2010 03:13:39 GMT

Before I type anything else, I'd better say, "Thank you, I now
appreciate that 'fair use' has nothing much to do with the practical
matter at hand."

The process of building NLP models has three parts: first, collect a
corpus. Second, annotate it. Third, build a model.

My original query here concerns the ability of the ASF to host the
first part -- in the case where the desired corpus is made up of
copyrighted materials for which no special permissions have been
obtained. What I think I've learned from this discussion is that the
usual ASF practice -- all 'source' materials are in the source tree,
available to anyone -- is essentially a publication that is likely to
infringe on copyright.

So, unless the ASF is willing to sanction an alternative process to
checking everything into the public source tree, ASF projects can't do
this entire process. Not because the models, as per your most recent
message, themselves can infringe, but because the publication of the
source materials would. I did want to double-check my belief that a
model derived from text was not, on its face, a derivative work that
could infringe -- before I bothered anyone any further about this.

So, in my mind, this brings us to the question of how the ASF could
serve as a collection point for copyrighted corpora. The answer might
be, "It can't." Dan Kulp raised what to me is the obvious alternative:
some storage accessible to committers but not the general public.
Since this is the legal-discuss list, it strikes me as sensible for
this discussion to discover those strategies that are *legally*
reasonable (if any), and leave it to, well, the board, to decide if
any of those are tolerable from the standpoint of the Foundation's
goals. So, if I use a spider to grab a large amount of copyrighted
material, how narrowly do I have to control its distribution to avoid
infringement? The SpamAssassin example seems apposite, and I wish that
Daryl would give more details about where the ham is kept and who has
access to it, and what legal determination went into setting up the
whole business.
