lucene-solr-user mailing list archives

From "sangraal aiken" <sangr...@gmail.com>
Subject Re: Doc add limit
Date Mon, 31 Jul 2006 20:23:35 GMT
Very interesting... thanks Thom. I haven't given HttpClient a shot yet, but
will be soon.

-S

On 7/31/06, Thom Nelson <thomnelson@gmail.com> wrote:
>
> I had a similar problem and was able to fix it in Solr by manually
> buffering the responses to a StringWriter before sending it to Tomcat.
> Essentially, Tomcat's buffer will only hold so much and at that point
> it blocks (thus it always hangs at a constant number of documents).
> However, a better solution (to be implemented) is to use more
> intelligent code on the client to read the response at the same time
> that it is sending input -- not too difficult to do, though best to do
> with two threads (i.e. fire off a thread to read the response before
> you send any data).  Seeing as the HttpClient code probably does this
> already, I'll most likely end up using that.
>
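Thom's two-thread idea can be sketched roughly as follows. This is only an illustration working on plain streams, not the code from Solr or HttpClient: the class name `ConcurrentExchange` and the method `exchange` are hypothetical, and closing the output stream is assumed to be how the sender marks the end of its request.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper for illustration; not part of Solr or HttpClient.
class ConcurrentExchange {

    /**
     * Writes payload to toServer while a second thread drains fromServer,
     * so the sender cannot stall because the other side's response buffer
     * filled up. Returns the raw response bytes.
     */
    static byte[] exchange(OutputStream toServer, InputStream fromServer, byte[] payload)
            throws IOException, InterruptedException {
        ByteArrayOutputStream response = new ByteArrayOutputStream();
        AtomicReference<IOException> readError = new AtomicReference<>();

        // Start the reader BEFORE sending any data.
        Thread reader = new Thread(() -> {
            byte[] buf = new byte[8192];
            try {
                int n;
                while ((n = fromServer.read(buf)) != -1) {
                    response.write(buf, 0, n);
                }
            } catch (IOException e) {
                readError.set(e); // reported to the caller after join()
            }
        });
        reader.start();

        try {
            toServer.write(payload);
            toServer.flush();
        } finally {
            // Assumption: closing the stream signals end of input.
            // For a raw java.net.Socket you would call socket.shutdownOutput()
            // instead, because closing the socket's output stream closes the socket.
            toServer.close();
        }

        reader.join(); // returns once the other side has closed its stream
        if (readError.get() != null) {
            throw readError.get();
        }
        return response.toByteArray();
    }
}
```

If the response were only read after all the input had been written, a peer that answers while it reads could fill its output buffer and block, which matches the "hangs at a constant number of documents" symptom described above.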
> On 7/31/06, sangraal aiken <sangraal@gmail.com> wrote:
> > Those are some great ideas Chris... I'm going to try some of them out.
> > I'll post the results when I get a chance to do more testing. Thanks.
> >
> > At this point I can work around the problem by ignoring Solr's response
> > but this is obviously not ideal. I would feel better knowing what is
> > causing the issue as well.
> >
> > -Sangraal
> >
> >
> >
> > On 7/29/06, Chris Hostetter <hossman_lucene@fucit.org> wrote:
> > >
> > >
> > > > : Sure, the method that does all the work updating Solr is the
> > > > : doUpdate(String s) method in the GanjaUpdate class I'm pasting below.
> > > > : It's hanging when I try to read the response... the last output I
> > > > : receive in my log is Got Reader...
> > >
> > > > I don't have the means to try out this code right now ... but i can't see
> > > > any obvious problems with it (there may be somewhere that you are opening
> > > > a stream or reader and not closing it, but i didn't see one) ... i notice
> > > > you are running this client on the same machine as Solr (hence the
> > > > localhost URLs) did you by any chance try running the client on a separate
> > > > machine to see if the number of updates before it hangs changes?
> > >
> > > > my money is still on a filehandle resource limit somewhere ... if you are
> > > > running on a system that has "lsof" (on some Unix/Linux installations you
> > > > need sudo/su root permissions to run it) you can use "lsof -p ####" to
> > > > look up what files/network connections are open for a given process.  You
> > > > can try running that on both the client pid and the Solr server pid once
> > > > it's hung -- You'll probably see a lot of Jar files in use for both, but
> > > > if you see more than a few XML files open by the client, or more than
> > > > 1 TCP connection open by either the client or the server, there's your
> > > > culprit.
> > >
> > > > I'm not sure what Windows equivalent of lsof may exist.
> > >
> > > Wait ... i just had another thought....
> > >
> > > > You are using InputStreamReader to deal with the InputStreams of your
> > > > remote XML files -- but you aren't specifying a charset, so it's using
> > > > your system default which may be different from the charset of the
> > > > original XML files you are pulling from the URL -- which (i *think*)
> > > > means that your InputStreamReader may in some cases fail to read all of
> > > > the bytes of the stream, which might leave some dangling filehandles (i'm
> > > > just guessing on that part ... i'm not actually sure what happens in that
> > > > case).
> > >
> > > > What if you simplify your code (for the purposes of testing) and just put
> > > > the post-transform version of ganja-full.xml in a big ass String variable in
> > > > your java app and just call GanjaUpdate.doUpdate(bigAssString) over and
> > > > over again ... does that cause the same problem?
> > >
> > >
> > > :
> > > : ----------
> > > :
> > > : package com.iceninetech.solr.update;
> > > :
> > > : import com.iceninetech.xml.XMLTransformer;
> > > :
> > > : import java.io.*;
> > > : import java.net.HttpURLConnection;
> > > : import java.net.URL;
> > > : import java.util.logging.Logger;
> > > :
> > > : public class GanjaUpdate {
> > > :
> > > :   private String updateSite = "";
> > > :   private String XSL_URL = "http://localhost:8080/xsl/ganja.xsl";
> > > :
> > > > :   private static final File xmlStorageDir = new File("/source/solr/xml-dls/");
> > > :
> > > :   final Logger log = Logger.getLogger(GanjaUpdate.class.getName());
> > > :
> > > :   public GanjaUpdate(String siteName) {
> > > :     this.updateSite = siteName;
> > > > :     log.info("GanjaUpdate is primed and ready to update " + siteName);
> > > :   }
> > > :
> > > :   public void update() {
> > > :     StringWriter sw = new StringWriter();
> > > :
> > > :     try {
> > > :       // transform gawkerInput XML to SOLR update XML
> > > :       XMLTransformer transform = new XMLTransformer();
> > > > :       log.info("About to transform ganjaInput XML to Solr Update XML");
> > > :       transform.transform(getXML(), sw, getXSL());
> > > :       log.info("Completed ganjaInput/SolrUpdate XML transform");
> > > :
> > > :       // Write transformed XML to Disk.
> > > > :       File transformedXML = new File(xmlStorageDir, updateSite+".sml");
> > > :       FileWriter fw = new FileWriter(transformedXML);
> > > :       fw.write(sw.toString());
> > > :       fw.close();
> > > :
> > > :       // post to Solr
> > > :       log.info("About to update Solr for site " + updateSite);
> > > :       String result = this.doUpdate(sw.toString());
> > > :       log.info("Solr says: " + result);
> > > :       sw.close();
> > > :     } catch (Exception e) {
> > > :       e.printStackTrace();
> > > :     }
> > > :   }
> > > :
> > > :   public File getXML() {
> > > > :     String XML_URL = "http://localhost:8080/" + updateSite + "/ganja-full.xml";
> > > :
> > > :     // check for file
> > > :     File localXML = new File(xmlStorageDir, updateSite + ".xml");
> > > :
> > > :     try {
> > > :       if (localXML.createNewFile() && localXML.canWrite()) {
> > > :         // open connection
> > > :         log.info("Downloading: " + XML_URL);
> > > :         URL url = new URL(XML_URL);
> > > > :         HttpURLConnection conn = (HttpURLConnection) url.openConnection();
> > > :         conn.setRequestMethod("GET");
> > > :
> > > :         // Read response to File
> > > > :         log.info("Storing XML to File" + localXML.getCanonicalPath());
> > > > :         FileOutputStream fos = new FileOutputStream(new File(xmlStorageDir, updateSite + ".xml"));
> > > > :
> > > > :         BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
> > > :         String line;
> > > :         while ((line = rd.readLine()) != null) {
> > > > :           line = line + '\n'; // add break after each line. It preserves formatting.
> > > :           fos.write(line.getBytes("UTF8"));
> > > :         }
> > > :
> > > :         // close connections
> > > :         rd.close();
> > > :         fos.close();
> > > :         conn.disconnect();
> > > :         log.info("Got the XML... File saved.");
> > > :       }
> > > :     } catch (Exception e) {
> > > :       e.printStackTrace();
> > > :     }
> > > :
> > > :     return localXML;
> > > :   }
> > > :
> > > :   public File getXSL() {
> > > :     StringBuffer retVal = new StringBuffer();
> > > :
> > > :     // check for file
> > > :     File localXSL = new File(xmlStorageDir, "ganja.xsl");
> > > :
> > > :     try {
> > > :       if (localXSL.createNewFile() && localXSL.canWrite()) {
> > > :         // open connection
> > > :         log.info("Downloading: " + XSL_URL);
> > > :         URL url = new URL(XSL_URL);
> > > > :         HttpURLConnection conn = (HttpURLConnection) url.openConnection();
> > > > :         conn.setRequestMethod("GET");
> > > > :         // Read response
> > > > :         BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
> > > :         String line;
> > > :         while ((line = rd.readLine()) != null) {
> > > :           line = line + '\n';
> > > :           retVal.append(line);
> > > :         }
> > > :         // close connections
> > > :         rd.close();
> > > :         conn.disconnect();
> > > :
> > > :         log.info("Got the XSLT.");
> > > :
> > > :         // output file
> > > > :         log.info("Storing XSL to File" + localXSL.getCanonicalPath());
> > > > :         FileOutputStream fos = new FileOutputStream(new File(xmlStorageDir, "ganja.xsl"));
> > > :         fos.write(retVal.toString().getBytes());
> > > :         fos.close();
> > > :         log.info("File saved.");
> > > :       }
> > > :     } catch (Exception e) {
> > > :       e.printStackTrace();
> > > :     }
> > > :     return localXSL;
> > > :   }
> > > :
> > > :   private String doUpdate(String sw) {
> > > :     StringBuffer updateResult = new StringBuffer();
> > > :     try {
> > > :       // open connection
> > > > :       log.info("Connecting to and preparing to post to SolrUpdate servlet.");
> > > > :       URL url = new URL("http://localhost:8080/update");
> > > > :       HttpURLConnection conn = (HttpURLConnection) url.openConnection();
> > > > :       conn.setRequestMethod("POST");
> > > > :       conn.setRequestProperty("Content-Type", "application/octet-stream");
> > > :       conn.setDoOutput(true);
> > > :       conn.setDoInput(true);
> > > :       conn.setUseCaches(false);
> > > :
> > > :       // Write to server
> > > :       log.info("About to post to SolrUpdate servlet.");
> > > > :       DataOutputStream output = new DataOutputStream(conn.getOutputStream());
> > > :       output.writeBytes(sw);
> > > :       output.flush();
> > > :       output.close();
> > > :       log.info("Finished posting to SolrUpdate servlet.");
> > > :
> > > :       // Read response
> > > :       log.info("Ready to read response.");
> > > > :       BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
> > > :       log.info("Got reader....");
> > > :       String line;
> > > :       while ((line = rd.readLine()) != null) {
> > > :         log.info("Writing to result...");
> > > :         updateResult.append(line);
> > > :       }
> > > :       rd.close();
> > > :
> > > :       // close connections
> > > :       conn.disconnect();
> > > :
> > > :       log.info("Done updating Solr for site" + updateSite);
> > > :     } catch (Exception e) {
> > > :       e.printStackTrace();
> > > :     }
> > > :
> > > :     return updateResult.toString();
> > > :   }
> > > : }
> > > :
> > > :
> > > : On 7/28/06, Chris Hostetter <hossman_lucene@fucit.org> wrote:
> > > : >
> > > : >
> > > > : > : I'm sure... it seems like solr is having trouble writing to a tomcat
> > > > : > : response that's been inactive for a bit. It's only 30 seconds though,
> > > > : > : so I'm not entirely sure why that would happen.
> > > : >
> > > > : > but didn't you say you don't have this problem when you use curl --
> > > > : > just your java client code?
> > > > : >
> > > > : > Did you try Yonik's python test client? or the java client in Jira?
> > > > : >
> > > > : > looking over the java client code you sent, it's not clear if you are
> > > > : > reading the response back, or closing the connections ... can you post
> > > > : > a more complete sample app that exhibits the problem for you?
> > > : >
> > > : >
> > > : >
> > > : > -Hoss
> > > : >
> > > : >
> > > :
> > >
> > >
> > >
> > > -Hoss
> > >
> > >
> >
> >
>
