lucene-solr-user mailing list archives

From souravm <SOUR...@infosys.com>
Subject Re: Large Data Set Suggestions
Date Wed, 05 Nov 2008 23:15:17 GMT
Hi Fergus,

Do the 6.6M docs reside on a single box (node) or on multiple boxes? Do you use distributed
search?

Regards,
Sourav

----- Original Message -----
From: Fergus McMenemie <fergus@twig.me.uk>
To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
Sent: Wed Nov 05 08:21:45 2008
Subject: Re: Large Data Set Suggestions

>Greetings!
>
>I've been asked to do some indexing performance testing on Solr 1.3
>using large XML document data sets (10M-60M docs) with DIH versus SolrJ.
>
>
>Does anyone have any suggestions where I might find a good data set this
>size?
>
>I saw the wikipedia dump reference in the DIH wiki, but that is only in
>the 7M+ doc range.
>
>Any suggestions would be greatly appreciated.
>
>Thanks,
>
>Steve

How large should each document be?

I quite often do testing using the geonames_dd_dms_date_20081028
dataset from http://earth-info.nga.mil/gns/html/namefiles.htm. It has
6.6M documents. It is actually a CSV file, but it is trivial
to convert to XML.
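
As a rough illustration of that conversion, here is a minimal Python sketch that turns a delimited file with a header row into Solr's <add><doc><field> update XML. The column names in the sample (id, name, lat) are made up for the example and are not the real columns of the geonames file; the delimiter is also an assumption, so pass whatever the actual file uses (for example "\t" for tab-separated data).

```python
import csv
import io
from xml.sax.saxutils import escape, quoteattr


def rows_to_solr_xml(lines, delimiter=","):
    """Yield pieces of a Solr <add> message built from delimited text.

    The first row is treated as a header and supplies the field names.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    yield "<add>\n"
    for row in reader:
        yield "  <doc>\n"
        for name, value in row.items():
            # Skip empty or missing columns and any unnamed overflow columns.
            if name and value:
                yield f"    <field name={quoteattr(name)}>{escape(value)}</field>\n"
        yield "  </doc>\n"
    yield "</add>\n"


# Hypothetical sample data; the real file has different columns.
sample = io.StringIO("id,name,lat\n1,Foo & Bar,51.5\n2,Baz,\n")
xml = "".join(rows_to_solr_xml(sample))
print(xml)
```

Because the function is a generator, a multi-million-row file can be streamed to an output file piece by piece rather than held in memory. For a data set this size it would probably make sense to split the output into several <add> files instead of one very large one.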


--

===============================================================
Fergus McMenemie               Email:fergus@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================
