lucene-java-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-java Wiki] Update of "OpenRelevance" by PeteSkomoroch
Date Fri, 24 Jul 2009 16:53:09 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-java Wiki" for change notification.

The following page has been changed by PeteSkomoroch:
http://wiki.apache.org/lucene-java/OpenRelevance

The comment on the change is:
Suggesting Amazon S3 or EBS for easy data distribution channel

------------------------------------------------------------------------------
  
  Editing of relevance judgments can be performed through a web application, so the infrastructure needs to provide a servlet container. Search functionality will be also provided by a web application.
  
- Distribution of the corpus is the most demanding aspect of this project. Due to its size (~100GB) it's not practical to offer this corpus as a traditional download. ''(use P2P ? create subsets? distribute on HDD ?)''
+ Distribution of the corpus is the most demanding aspect of this project. Due to its size (~100GB) it's not practical to offer this corpus as a traditional download. ''(use P2P ? create subsets? distribute on HDD ?)''.  Amazon S3 and EBS (via [http://aws.amazon.com/publicdatasets/ Amazon Public Datasets]) are efficient & cheap options for distributing larger datasets.  Uploading to a public S3 bucket is the easiest option, and automatically [http://docs.amazonwebservices.com/AmazonS3/2006-03-01/index.html?S3Torrent.html makes uploaded data available via torrent]. Datasets up to 1 TB [http://www.datawrangling.com/wikipedia-page-traffic-statistics-dataset can also be distributed] via free public EBS volumes.
  
  == Queries ==
  
