From: "sangraal aiken" <sangraal@gmail.com>
Date: Mon, 31 Jul 2006 16:23:35 -0400
To: solr-user@lucene.apache.org
Subject: Re: Doc add limit

Very interesting... thanks Thom. I haven't given HttpClient a shot yet,
but will be soon.

-S

On 7/31/06, Thom Nelson wrote:
>
> I had a similar problem and was able to fix it in Solr by manually
> buffering the responses to a StringWriter before sending them to Tomcat.
> Essentially, Tomcat's buffer will only hold so much, and at that point
> it blocks (thus it always hangs at a constant number of documents).
> However, a better solution (to be implemented) is to use more
> intelligent code on the client to read the response at the same time
> that it is sending input -- not too difficult to do, though best done
> with two threads (i.e. fire off a thread to read the response before
> you send any data). Seeing as the HttpClient code probably does this
> already, I'll most likely end up using that.
>
> On 7/31/06, sangraal aiken wrote:
> > Those are some great ideas, Chris... I'm going to try some of them
> > out. I'll post the results when I get a chance to do more testing.
> > Thanks.
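[A minimal, self-contained sketch of the two-thread pattern Thom describes: start the reader before writing any data, so the writer can never deadlock on a full buffer. An in-memory pipe with a deliberately small buffer stands in for the servlet container here; the class and variable names are illustrative, not from the thread.]

```java
import java.io.*;
import java.util.concurrent.atomic.AtomicInteger;

public class TwoThreadDemo {
    public static void main(String[] args) throws Exception {
        PipedOutputStream out = new PipedOutputStream();
        // 1 KB buffer: small, like a servlet container's response buffer.
        PipedInputStream in = new PipedInputStream(out, 1024);
        final AtomicInteger received = new AtomicInteger();

        // Fire off the reader BEFORE sending any data.
        Thread reader = new Thread(() -> {
            try {
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    received.addAndGet(n);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        reader.start();

        // 100 KB payload, far larger than the 1 KB pipe buffer.
        // With no concurrent reader this write would block forever;
        // with the reader draining the pipe, it completes.
        byte[] payload = new byte[100_000];
        out.write(payload);
        out.close();
        reader.join();
        System.out.println("received=" + received.get());
    }
}
```

The same shape applies to HttpURLConnection: start a thread on conn.getInputStream() before writing the POST body to conn.getOutputStream().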
> > At this point I can work around the problem by ignoring Solr's
> > response, but this is obviously not ideal. I would feel better knowing
> > what is causing the issue as well.
> >
> > -Sangraal
> >
> > On 7/29/06, Chris Hostetter wrote:
> > >
> > > : Sure, the method that does all the work updating Solr is the
> > > : doUpdate(String s) method in the GanjaUpdate class I'm pasting
> > > : below. It's hanging when I try to read the response... the last
> > > : output I receive in my log is Got Reader...
> > >
> > > I don't have the means to try out this code right now ... but i can't
> > > see any obvious problems with it (there may be somewhere that you are
> > > opening a stream or reader and not closing it, but i didn't see one)
> > > ... i notice you are running this client on the same machine as Solr
> > > (hence the localhost URLs). Did you by any chance try running the
> > > client on a separate machine to see if the number of updates before
> > > it hangs changes?
> > >
> > > My money is still on a filehandle resource limit somewhere ... if you
> > > are running on a system that has "lsof" (on some Unix/Linux
> > > installations you need sudo/su root permissions to run it) you can
> > > use "lsof -p ####" to look up what files/network connections are open
> > > for a given process. You can try running that on both the client pid
> > > and the Solr server pid once it's hung -- you'll probably see a lot
> > > of jar files in use for both, but if you see more than a few XML
> > > files open by the client, or more than one TCP connection open by
> > > either the client or the server, there's your culprit.
> > >
> > > I'm not sure what Windows equivalent of lsof may exist.
> > >
> > > Wait ... i just had another thought....
> > >
> > > You are using InputStreamReader to deal with the InputStreams of your
> > > remote XML files -- but you aren't specifying a charset, so it's
> > > using your system default, which may be different from the charset of
> > > the original XML files you are pulling from the URL -- which (i
> > > *think*) means that your InputStreamReader may in some cases fail to
> > > read all of the bytes of the stream, which might leave some dangling
> > > filehandles (i'm just guessing on that part ... i'm not actually sure
> > > what happens in that case).
> > >
> > > What if you simplify your code (for the purposes of testing) and just
> > > put the post-transform version of ganja-full.xml in a big ass String
> > > variable in your java app and just call
> > > GanjaUpdate.doUpdate(bigAssString) over and over again ... does that
> > > cause the same problem?
> > >
> > >
> > > : ----------
> > > :
> > > : package com.iceninetech.solr.update;
> > > :
> > > : import com.iceninetech.xml.XMLTransformer;
> > > :
> > > : import java.io.*;
> > > : import java.net.HttpURLConnection;
> > > : import java.net.URL;
> > > : import java.util.logging.Logger;
> > > :
> > > : public class GanjaUpdate {
> > > :
> > > :   private String updateSite = "";
> > > :   private String XSL_URL = "http://localhost:8080/xsl/ganja.xsl";
> > > :
> > > :   private static final File xmlStorageDir = new File("/source/solr/xml-dls/");
> > > :
> > > :   final Logger log = Logger.getLogger(GanjaUpdate.class.getName());
> > > :
> > > :   public GanjaUpdate(String siteName) {
> > > :     this.updateSite = siteName;
> > > :     log.info("GanjaUpdate is primed and ready to update " + siteName);
> > > :   }
> > > :
> > > :   public void update() {
> > > :     StringWriter sw = new StringWriter();
> > > :
> > > :     try {
> > > :       // transform gawkerInput XML to SOLR update XML
> > > :       XMLTransformer transform = new XMLTransformer();
> > > :       log.info("About to transform ganjaInput XML to Solr Update XML");
> > > :       transform.transform(getXML(), sw, getXSL());
> > > :       log.info("Completed ganjaInput/SolrUpdate XML transform");
> > > :
> > > :       // Write transformed XML to Disk.
> > > :       File transformedXML = new File(xmlStorageDir, updateSite + ".sml");
> > > :       FileWriter fw = new FileWriter(transformedXML);
> > > :       fw.write(sw.toString());
> > > :       fw.close();
> > > :
> > > :       // post to Solr
> > > :       log.info("About to update Solr for site " + updateSite);
> > > :       String result = this.doUpdate(sw.toString());
> > > :       log.info("Solr says: " + result);
> > > :       sw.close();
> > > :     } catch (Exception e) {
> > > :       e.printStackTrace();
> > > :     }
> > > :   }
> > > :
> > > :   public File getXML() {
> > > :     String XML_URL = "http://localhost:8080/" + updateSite + "/ganja-full.xml";
> > > :
> > > :     // check for file
> > > :     File localXML = new File(xmlStorageDir, updateSite + ".xml");
> > > :
> > > :     try {
> > > :       if (localXML.createNewFile() && localXML.canWrite()) {
> > > :         // open connection
> > > :         log.info("Downloading: " + XML_URL);
> > > :         URL url = new URL(XML_URL);
> > > :         HttpURLConnection conn = (HttpURLConnection) url.openConnection();
> > > :         conn.setRequestMethod("GET");
> > > :
> > > :         // Read response to File
> > > :         log.info("Storing XML to File " + localXML.getCanonicalPath());
> > > :         FileOutputStream fos = new FileOutputStream(new File(xmlStorageDir, updateSite + ".xml"));
> > > :
> > > :         BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
> > > :         String line;
> > > :         while ((line = rd.readLine()) != null) {
> > > :           line = line + '\n'; // add break after each line. It preserves formatting.
> > > :           fos.write(line.getBytes("UTF8"));
> > > :         }
> > > :
> > > :         // close connections
> > > :         rd.close();
> > > :         fos.close();
> > > :         conn.disconnect();
> > > :         log.info("Got the XML... File saved.");
> > > :       }
> > > :     } catch (Exception e) {
> > > :       e.printStackTrace();
> > > :     }
> > > :
> > > :     return localXML;
> > > :   }
> > > :
> > > :   public File getXSL() {
> > > :     StringBuffer retVal = new StringBuffer();
> > > :
> > > :     // check for file
> > > :     File localXSL = new File(xmlStorageDir, "ganja.xsl");
> > > :
> > > :     try {
> > > :       if (localXSL.createNewFile() && localXSL.canWrite()) {
> > > :         // open connection
> > > :         log.info("Downloading: " + XSL_URL);
> > > :         URL url = new URL(XSL_URL);
> > > :         HttpURLConnection conn = (HttpURLConnection) url.openConnection();
> > > :         conn.setRequestMethod("GET");
> > > :         // Read response
> > > :         BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
> > > :         String line;
> > > :         while ((line = rd.readLine()) != null) {
> > > :           line = line + '\n';
> > > :           retVal.append(line);
> > > :         }
> > > :         // close connections
> > > :         rd.close();
> > > :         conn.disconnect();
> > > :
> > > :         log.info("Got the XSLT.");
> > > :
> > > :         // output file
> > > :         log.info("Storing XSL to File " + localXSL.getCanonicalPath());
> > > :         FileOutputStream fos = new FileOutputStream(new File(xmlStorageDir, "ganja.xsl"));
> > > :         fos.write(retVal.toString().getBytes());
> > > :         fos.close();
> > > :         log.info("File saved.");
> > > :       }
> > > :     } catch (Exception e) {
> > > :       e.printStackTrace();
> > > :     }
> > > :     return localXSL;
> > > :   }
> > > :
> > > :   private String doUpdate(String sw) {
> > > :     StringBuffer updateResult = new StringBuffer();
> > > :     try {
> > > :       // open connection
> > > :       log.info("Connecting to and preparing to post to SolrUpdate servlet.");
> > > :       URL url = new URL("http://localhost:8080/update");
> > > :       HttpURLConnection conn = (HttpURLConnection) url.openConnection();
> > > :       conn.setRequestMethod("POST");
> > > :       conn.setRequestProperty("Content-Type", "application/octet-stream");
> > > :       conn.setDoOutput(true);
> > > :       conn.setDoInput(true);
> > > :       conn.setUseCaches(false);
> > > :
> > > :       // Write to server
> > > :       log.info("About to post to SolrUpdate servlet.");
> > > :       DataOutputStream output = new DataOutputStream(conn.getOutputStream());
> > > :       output.writeBytes(sw);
> > > :       output.flush();
> > > :       output.close();
> > > :       log.info("Finished posting to SolrUpdate servlet.");
> > > :
> > > :       // Read response
> > > :       log.info("Ready to read response.");
> > > :       BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
> > > :       log.info("Got reader....");
> > > :       String line;
> > > :       while ((line = rd.readLine()) != null) {
> > > :         log.info("Writing to result...");
> > > :         updateResult.append(line);
> > > :       }
> > > :       rd.close();
> > > :
> > > :       // close connections
> > > :       conn.disconnect();
> > > :
> > > :       log.info("Done updating Solr for site " + updateSite);
> > > :     } catch (Exception e) {
> > > :       e.printStackTrace();
> > > :     }
> > > :
> > > :     return updateResult.toString();
> > > :   }
> > > : }
> > > :
> > > :
> > > : On 7/28/06, Chris Hostetter wrote:
> > > : >
> > > : > : I'm sure... it seems like solr is having trouble writing to a
> > > : > : tomcat response that's been inactive for a bit. It's only 30
> > > : > : seconds though, so I'm not entirely sure why that would happen.
> > > : >
> > > : > but didn't you say you don't have this problem when you use curl
> > > : > -- just your java client code?
> > > : >
> > > : > Did you try Yonik's python test client? or the java client in Jira?
> > > : >
> > > : > looking over the java client code you sent, it's not clear if you
> > > : > are reading the response back, or closing the connections ... can
> > > : > you post a more complete sample app that exhibits the problem for
> > > : > you?
> > > : >
> > > : >
> > > : > -Hoss
> > >
> > >
> > > -Hoss
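[Hoss's charset point above can be demonstrated in isolation: an InputStreamReader built without an explicit charset decodes with the platform default, which may not match the bytes the server actually sent. A minimal standalone sketch, not from the original thread; it uses an in-memory byte array in place of the HTTP stream:]

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    public static void main(String[] args) throws Exception {
        String original = "caf\u00e9"; // non-ASCII content
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);

        // Risky (the pattern in GanjaUpdate): decodes with whatever the
        // platform default charset happens to be.
        Reader implicit = new InputStreamReader(new ByteArrayInputStream(wire));
        implicit.close();

        // Safe: the decoder explicitly matches the encoding on the wire.
        Reader explicit = new InputStreamReader(new ByteArrayInputStream(wire),
                                                StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = explicit.read()) != -1) {
            sb.append((char) c);
        }
        explicit.close();
        System.out.println(sb.toString().equals(original)); // prints true
    }
}
```

(StandardCharsets is Java 7+; on the Java 5 era JVMs in this thread the equivalent is passing the charset name "UTF-8" to the InputStreamReader constructor.)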