From: Grant Ingersoll
To: mahout-dev@lucene.apache.org
Subject: MAHOUT-94: Making the Taste demo automated
Date: Fri, 3 Apr 2009 08:50:40 -0400

https://issues.apache.org/jira/browse/MAHOUT-94

I contacted the GroupLens project about the possibility of
automatically downloading the datasets as part of the Taste demo. They have two concerns (see [1]):

1. They currently require people to read their license before downloading (although I don't think they "enforce" it).
2. They are concerned about load on their servers.

So, the proposal is that our build script for the demo would present the EULA/README and require the user to answer Y to a prompt asking whether they agree, after which we could automatically download the data and put it in the appropriate place.

As for server load, we currently have ~300 people on each of our mailing lists, so I can't imagine it would be huge.

So, is this even worth pursuing? It isn't all that hard for people to download right now, but this would make it easier, I suppose.

-Grant

[1]
Hi Grant,

The professors are back and here is the result of our internal discussion:

Using the data is OK, if we can make sure the licensing issue and server load issue won't be in our way.

For licensing, since our license is not completely compatible with Apache, the solution we imagine would be showing our readme and EULA for the dataset when users try to run the demo, and only downloading and running when they agree.

For server load, do you have any estimate of how many people will actually be pulling the data daily? The system staff of our department will be pretty mad if the number goes high, so we do want to get an estimate beforehand.

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
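
For illustration, the flow proposed above (show the EULA/README, require a Y answer, then download) could be sketched roughly as follows. This is only a sketch in Python, not the demo's actual build script. The names confirm_license and fetch_dataset are hypothetical, and the URL and path are supplied by the caller. Reusing an already-downloaded file without prompting again is an assumption made here to keep load on the data host low.

```python
import urllib.request
from pathlib import Path


def confirm_license(license_text, ask=input, show=print):
    """Show the license text and return True only if the user answers Y/yes."""
    show(license_text)
    answer = ask("Do you agree to these terms? [y/N] ")
    return answer.strip().lower() in ("y", "yes")


def fetch_dataset(url, dest, license_text, ask=input, show=print,
                  download=urllib.request.urlretrieve):
    """Download url to dest only after the user accepts the license.

    Returns the destination path, or None if the user declined.
    """
    dest = Path(dest)
    if dest.exists():
        # Assumption: a file already on disk is reused, so repeated
        # runs of the demo do not hit the data server again.
        return dest
    if not confirm_license(license_text, ask, show):
        show("License not accepted; skipping download.")
        return None
    dest.parent.mkdir(parents=True, exist_ok=True)
    download(url, str(dest))
    return dest
```

A build step might call something like fetch_dataset(data_url, "data/ratings.zip", readme_text), where data_url and readme_text are placeholders for whatever location and license text the data provider publishes. The ask, show, and download parameters are there only so the prompt and the network call can be swapped out, for example in a non-interactive test.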