lucene-solr-user mailing list archives

From Fergus McMenemie <fer...@twig.me.uk>
Subject Re: Large Data Set Suggestions
Date Wed, 05 Nov 2008 16:21:45 GMT
>Greetings!
> 
>I've been asked to do some indexing performance testing on Solr 1.3
>using large XML document data sets (10M-60M docs) with DIH versus SolrJ.
>
> 
>Does anyone have any suggestions where I might find a good data set this
>size?  
> 
>I saw the wikipedia dump reference in the DIH wiki, but that is only in
>the 7M+ doc range.
> 
>Any suggestions would be greatly appreciated.
> 
>Thanks,
> 
>Steve

How large should each document be?

I quite often do testing using the geonames_dd_dms_date_20081028
dataset from http://earth-info.nga.mil/gns/html/namefiles.htm. It has
6.6M documents. It is actually a CSV file, but it is trivial
to convert to XML.
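
For what it is worth, here is a rough sketch of the sort of conversion I
mean, written in Python. It assumes the input has a header row and that
each column can be posted as a Solr field of the same name. The function
name, the sample columns (id, name, lat) and the comma delimiter are
made up for illustration; check the real file's delimiter and column
names, and make sure the field names match your schema.xml.

```python
import csv
import io
from xml.sax.saxutils import escape, quoteattr


def rows_to_solr_xml(source, delimiter=","):
    """Turn delimited text with a header row into a Solr <add> message.

    Hypothetical helper: each column becomes a <field> named after its header.
    """
    reader = csv.DictReader(source, delimiter=delimiter)
    out = ["<add>"]
    for row in reader:
        out.append("  <doc>")
        for name, value in row.items():
            # Skip empty or missing columns, and overflow cells with no header.
            if name is None or not value:
                continue
            out.append(
                "    <field name=%s>%s</field>" % (quoteattr(name), escape(value))
            )
        out.append("  </doc>")
    out.append("</add>")
    return "\n".join(out)


# Tiny made-up sample; the real dataset has different columns.
sample = io.StringIO("id,name,lat\n1,Foo & Bar,51.5\n2,Baz,\n")
xml_text = rows_to_solr_xml(sample)
print(xml_text)
```

For a file of several million rows you would want to write the output in
batches of a few thousand docs per <add> rather than build one string
in memory.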


-- 

===============================================================
Fergus McMenemie               Email:fergus@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================
