nutch-dev mailing list archives

From "Fuad Efendi" <f...@efendi.ca>
Subject RE: java.net.URL synchronization
Date Wed, 09 Dec 2009 22:39:33 GMT
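I checked java.net.URL; yes, Nutch and Bixo implicitly go through a
synchronized Hashtable whenever they construct a URL, because the
constructor looks up the stream handler for the protocol: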
      
  
    public URL(String protocol, String host, int port, String file,
               URLStreamHandler handler) throws MalformedURLException {

        ...
        if (handler == null &&
            (handler = getURLStreamHandler(protocol)) == null) {
            throw new MalformedURLException("unknown protocol: " + protocol);
        }

        ...
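
A small way to see that handler lookup from the outside (only a sketch; the class name, method names and the made-up scheme are mine, not anything from Nutch or Bixo): java.net.URL refuses a protocol it cannot resolve a handler for, while java.net.URI only parses the string and never goes through getURLStreamHandler.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

public class UrlHandlerLookupDemo {

    /** Returns true if java.net.URL rejects the string (e.g. no handler for its protocol). */
    @SuppressWarnings("deprecation") // URL constructors are deprecated in recent JDKs
    public static boolean urlRejects(String spec) {
        try {
            new URL(spec); // the constructor resolves a URLStreamHandler for the protocol
            return false;
        } catch (MalformedURLException e) {
            return true;
        }
    }

    /** Parses with java.net.URI, which performs no stream handler lookup. */
    public static String uriHost(String spec) {
        return URI.create(spec).getHost();
    }

    public static void main(String[] args) {
        // "nosuchscheme" is a hypothetical protocol with no registered handler.
        String spec = "nosuchscheme://example.com/page";
        System.out.println("URL rejected: " + urlRejects(spec));
        System.out.println("URI host: " + uriHost(spec));
    }
}
```

If URL construction ever did show up as a contention point, parsing with URI in the hot path would be one option to measure, assuming the code only needs the parsed components and not a connection.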


However, I don't think it hurts, because both architectures (at least Bixo)
use a single thread per JVM to process, for instance, Outlinks. Only the
"Fetch" part is multithreaded, and it doesn't use the URL class.


I'm not sure about Nutch and how its fetch list is generated... if that is
multithreaded, then a RegexUrlNormalizer "shared" between threads is an even
bigger problem...
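
For what it's worth, here is a rough sketch of how a regex-based normalizer can be shared between threads without locking. This is not the Nutch normalizer; the class name and the single rule are hypothetical. It relies only on the fact that java.util.regex.Pattern is immutable and thread-safe, while Matcher is not and so is created per call.

```java
import java.util.regex.Pattern;

public class SharedRuleNormalizer {

    // Compiled once; Pattern instances are immutable and safe to share across threads.
    // Hypothetical rule: collapse runs of slashes, except the "//" right after the scheme's colon.
    private static final Pattern DUPLICATE_SLASHES = Pattern.compile("(?<!:)/{2,}");

    /** Each call creates its own Matcher, so no mutable state is shared between threads. */
    public static String normalize(String url) {
        return DUPLICATE_SLASHES.matcher(url).replaceAll("/");
    }

    public static void main(String[] args) {
        System.out.println(normalize("http://example.com//a///b"));
    }
}
```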


Fuad Efendi
+1 416-993-2060
http://www.tokenizer.ca/
Data Mining, Vertical Search


> -----Original Message-----
> From: Otis Gospodnetic [mailto:ogjunk-nutch@yahoo.com]
> Sent: December-09-09 5:12 PM
> To: nutch-dev@lucene.apache.org
> Subject: java.net.URL synchronization
> 
> Hello,
> 
> Has anyone seen this:
> http://www.supermind.org/blog/580/java-net-url-synchronization-bottleneck
> ?
> 
> Is this something that needs to be addressed in Nutch (and thus in Bixo,
> and thus in the common crawler project)?
> 
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch



