From Grant Ingersoll
Subject MAHOUT-94: Making the Taste demo automated
Date Fri, 03 Apr 2009 12:50:40 GMT

I contacted the GroupLens project about the possibility of  
automatically downloading the datasets as part of the Taste demo.   
They have two concerns (see [1]):

1. They currently require people to read their license before
downloading (although I don't think they "enforce" it).
2. They are concerned about load on their servers.

So, the proposal is that we would have our build script for the demo  
present the EULA/README and require that the user answer Y to  
questions asking if they concur, after which we could automatically  
download the data and place it in the appropriate place.
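To make the proposal concrete, here is a rough sketch of the prompt-then-download flow, written in Python purely for illustration (the real thing would live in whatever the demo's build script uses). The names user_accepts and fetch_dataset, the prompt wording, and the skip-if-already-downloaded check are all my own assumptions, not anything GroupLens or the Taste build provides today.

```python
import os
import urllib.request


def user_accepts(eula_text, ask=input, show=print):
    """Show the license text and return True only on an explicit 'Y'."""
    show(eula_text)
    answer = ask("Do you agree to the terms above? [Y/N] ")
    return answer.strip().upper() == "Y"


def fetch_dataset(url, dest_path, eula_text, ask=input, show=print,
                  download=urllib.request.urlretrieve):
    """Download the dataset only after consent (hypothetical helper).

    Returns dest_path on success, or None if the user declines.
    """
    if os.path.exists(dest_path):
        # Assumption: a local copy means the user agreed on an earlier run,
        # and reusing it avoids hitting the remote server again.
        return dest_path
    if not user_accepts(eula_text, ask, show):
        return None
    download(url, dest_path)
    return dest_path
```

Skipping the download when the file is already present is one simple way to keep repeat runs of the demo from adding to the server load mentioned above.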

As for server load, we currently have ~300 people on each of our
mailing lists, so I can't imagine the load would be huge.

So, is this even worth pursuing?  It isn't all that hard for people to  
download right now, but this would make it easier, I suppose.


Hi Grant,

The professors are back and here is the result of our internal
discussion: Using the data is OK, if we can make sure the licensing
issue and server load issue won't be in our way. For licensing, since
our license is not completely compatible with Apache, the solution we
imagine would be showing our readme and EULA for the dataset when
users try to run the demo, and only download and run when they agree.
For server load, do you have any estimate about how many people will
actually be pulling the data daily? The system staff of our department
will be pretty mad if the number goes high, so we do want to get an
estimate beforehand.

