lucene-solr-user mailing list archives

From James Brady <james.colin.br...@gmail.com>
Subject Commit strategies
Date Thu, 07 Feb 2008 03:42:46 GMT
Hi all,
So the Solr tutorial recommends batching operations to improve
performance by avoiding multiple costly commits.

To implement this, I originally had a couple of methods in my Python
app reading from or writing to Solr, with a scheduled task blindly
committing every 15 seconds.
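
Roughly, the shape of it is this (a simplified sketch rather than the
real code -- the actual calls go through our solr.py wrapper, and the
names here are approximate):

    import threading
    import solr                      # our thin httplib-based wrapper (solr.py)

    conn = solr.SolrConnection('http://localhost:8983/solr')  # name/URL approximate

    def add_document(**fields):
        # called from the app's read/write code paths; no commit here, we batch
        conn.add(**fields)

    def commit_every_15s():
        # the scheduled task: blindly commit, then reschedule itself
        conn.commit()
        threading.Timer(15.0, commit_every_15s).start()

    commit_every_15s()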

However, my logs were chock full of errors such as:
   File "/mnt/yelteam/server_dev/YelServer/yel/yel_search.py", line  
73, in __add
     self.conn.add(**params)
   File "/mnt/yelteam/server_dev/YelServer/yel/solr.py", line 159, in  
add
     return self.doUpdateXML(xstr)
   File "/mnt/yelteam/server_dev/YelServer/yel/solr.py", line 106, in  
doUpdateXML
     rsp = self.doPost(self.solrBase+'/update', request,  
self.xmlheaders)
   File "/mnt/yelteam/server_dev/YelServer/yel/solr.py", line 94, in  
doPost
     return self.__errcheck(self.conn.getresponse())
   File "/usr/lib64/python2.4/httplib.py", line 856, in getresponse
     raise ResponseNotReady()
ResponseNotReady

and:
   File "/mnt/yelteam/server_dev/YelServer/yel/solr.py", line 159, in  
add
     return self.doUpdateXML(xstr)
   File "/mnt/yelteam/server_dev/YelServer/yel/solr.py", line 106, in  
doUpdateXML
     rsp = self.doPost(self.solrBase+'/update', request,  
self.xmlheaders)
   File "/mnt/yelteam/server_dev/YelServer/yel/solr.py", line 102, in  
doPost
     return self.__errcheck(self.conn.getresponse())
   File "/usr/lib64/python2.4/httplib.py", line 866, in getresponse
     response.begin()
   File "/usr/lib64/python2.4/httplib.py", line 336, in begin
     version, status, reason = self._read_status()
   File "/usr/lib64/python2.4/httplib.py", line 294, in _read_status
     line = self.fp.readline()
   File "/usr/lib64/python2.4/socket.py", line 317, in readline
     data = recv(1)
error: (104, 'Connection reset by peer')

and a few other variations.
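
As far as I can tell from httplib.py, ResponseNotReady is what
getresponse() raises when the connection's outstanding response has
already been consumed. This single-threaded snippet (hypothetical URL,
assumes a local Solr on port 8983) mimics the interleaving I suspect is
happening between the committing task and an add:

    import httplib

    conn = httplib.HTTPConnection('localhost', 8983)
    conn.request('GET', '/solr/select?q=*:*')  # the "writer" sends its request...
    other = conn.getresponse()                 # ...but the "committer" reads the response
    other.read()
    conn.getresponse()                         # writer's own read: raises ResponseNotReady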

I thought it might be to do with commit operations conflicting with
reads or writes, so I wrote an even dumber queueing system to hold
onto pending reads/writes while a commit went through.
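
The idea was essentially to serialise every Solr call through one lock,
something along these lines (a sketch of the idea rather than the actual
code; the names are made up):

    import threading

    solr_lock = threading.Lock()        # guards the single shared connection

    def guarded(call, *args, **kwargs):
        # hold back adds/searches while a commit is in flight, and vice versa
        solr_lock.acquire()
        try:
            return call(*args, **kwargs)
        finally:
            solr_lock.release()

    # every call site goes through the guard, e.g.:
    #   guarded(conn.add, **params)
    #   guarded(conn.commit)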

However, my logs are still full of those errors :) I doubt that
either Python's httplib library or Solr is buggy, so is it something
to do with the way I'm using the API?

How do people generally approach the deferred commit issue? Do I need
to queue index and search requests myself, or does Solr handle it? My
app indexes about 100 times more often than it searches, but searching
is more time-critical. Does that change anything?

Thanks!
James
