lucene-solr-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: [solr-user] Upgrade from 1.2 to 1.3 gives 3x slowdown
Date Thu, 02 Apr 2009 12:31:49 GMT

On Apr 2, 2009, at 4:02 AM, Fergus McMenemie wrote:
> Grant,
>
> Hmmm, the big difference is made by &overwrite=false. But
> can you explain why &overwrite=false makes such a difference?
> I am starting off with an empty index and I have checked the
> content: there are no duplicates in the uniqueKey field.
>
> I guess if &overwrite=false then a few checks can be removed
> from the indexing process, and if I am confident that my content
> contains no duplicates then this is a good speed up.
>
> http://wiki.apache.org/solr/UpdateCSV says that if overwrite
> is true (the default) then overwrite documents based on the
> uniqueKey. However what will solr/lucene do if the uniqueKey
> is not unique and overwrite=false?

overwrite=false means Solr does not issue a delete before each add, so
if you already have a doc with that id, you will end up with two docs
with that id.  Uniqueness of the uniqueKey is enforced by Solr, not by
Lucene.
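
To make the difference concrete, here is a toy model of the two behaviours. It is only a sketch: a plain Python list stands in for the index, and add_doc is a made-up helper, not anything from Solr or Lucene.

```python
# Toy model only: a list stands in for the index; this is not Solr code.
def add_doc(index, doc, overwrite=True, key="id"):
    """Append doc; when overwrite is true, first drop docs sharing its key."""
    if overwrite:
        # The "delete first" step that overwrite=false skips.
        index[:] = [d for d in index if d[key] != doc[key]]
    index.append(doc)


index = []
add_doc(index, {"id": "60524", "name": "Palu Marga"})
add_doc(index, {"id": "60524", "name": "Palu Marga (v2)"}, overwrite=False)
print(len(index))  # 2: both docs share the same id

add_doc(index, {"id": "60524", "name": "Palu Marga (v3)"})
print(len(index))  # 1: the overwrite removed the earlier copies
```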

Even if you can't guarantee uniqueness, you can still use
overwrite=false, with the workaround I suggested in a prior email:
1. Add a new field whose value identifies your data source and is the
same for all records from that source, e.g. type = geonames.txt
2. Before updating, issue a delete by query for that type value, which
will delete all records with that term
3. Do your indexing with overwrite=false
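
As a rough sketch of steps 2 and 3, the snippet below only builds the two requests; it does not send anything. The base URL, the file path, and the field name "type" are assumptions for illustration. It also assumes the source name contains no double quotes, and that remote streaming is enabled so that stream.file can be used.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape

SOLR = "http://localhost:8983/solr"  # assumed base URL, adjust to your setup


def delete_by_type_body(source_type, field="type"):
    """XML body for step 2: delete every doc whose field equals source_type."""
    return '<delete><query>%s:"%s"</query></delete>' % (field, escape(source_type))


def csv_update_url(path, commit=False):
    """URL for step 3: load a tab-separated file without the delete-first check."""
    params = {
        "stream.file": path,      # needs remote streaming enabled in solrconfig.xml
        "separator": "\t",
        "overwrite": "false",
        "commit": "true" if commit else "false",
    }
    return SOLR + "/update/csv?" + urlencode(params)


print(delete_by_type_body("geonames.txt"))
print(csv_update_url("/data/geonames.txt", commit=True))  # hypothetical path
```

The delete body would be POSTed to the /update handler as text/xml before the CSV load, with a single commit at the end covering both the deletes and the new documents.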

I should note, however, that the speed difference you are seeing may
not be as pronounced as it appears.  If I recall correctly, during
ApacheCon I commented on how long your Solr instance takes to shut
down when you exit it.  That time is in fact Solr doing the work that
was put off by not committing earlier, which let all those deletes
pile up.

Thus, while it is likely that your older version is still faster due
to the new fsync stuff in Lucene, it may not be that much faster.  I
think you could see this by actually indexing with commit=true, but
I'm not 100% sure.


>
>
> fergus: perl -nlaF"\t" -e 'print "$F[2]";' geonames.txt | wc -l
> 1000000
> fergus: perl -nlaF"\t" -e 'print "$F[2]";' geonames.txt | sort -u | wc -l
> 1000000
> fergus: /usr/bin/head geonames.txt
> RC	UFI	UNI	LAT	LONG	DMS_LAT	DMS_LONG	MGRS	JOG	FC	DSG	PC	CC1	ADM1	 
> ADM2	POP	ELEV	CC2	NT	LC	SHORT_FORM	GENERIC	SORT_NAME	FULL_NAME	 
> FULL_NAME_ND	MODIFY_DATE
> 1	-1307828	60524	12.466667	-69.9	122800	-695400	19PDP0219578323	 
> ND19-14	T	MT		AA	00					PALUMARGA	Palu Marga	Palu Marga	1995-03-23
> 1	-1307756	-1891720	12.5	-70.016667	123000	-700100	19PCP8952982056	 
> ND19-14	P	PPLX	
>
> PS. do you want me to do some kind of chop through the
> different versions to see where the slow down happened
> or are you happy you have nailed it?	
> -- 
>
> ===============================================================
> Fergus McMenemie               Email:fergus@twig.me.uk
> Techmore Ltd                   Phone:(UK) 07721 376021
>
> Unix/Mac/Intranets             Analyst Programmer
> ===============================================================

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search

