hadoop-user mailing list archives

From "Chris Folsom" <jcfol...@pureperfect.com>
Subject RE: Is it safe to have static methods in Hadoop Framework
Date Thu, 25 Jul 2013 19:15:22 GMT

In Java, the keyword "static" means that a single copy of the member
exists per class, and a class is unique only within the class loader
that loaded it. Two different class loaders will therefore hold
different values for the same static variable, even within the same JVM
running on the same host.
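
To illustrate the class loader point, here is a rough sketch. The names
SharedState, IsolatingLoader and ClassLoaderDemo are made up for this
example, and it assumes the compiled .class file is readable from the
classpath and a Java 9+ runtime (for readAllBytes). It loads a second
copy of the same class through a loader that does not delegate for that
one class, then changes the static field in the copy only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.lang.reflect.Field;

// Hypothetical holder of static state, for illustration only.
final class SharedState {
    static int value = 0;
}

// Hypothetical loader that defines its own copy of one class
// instead of delegating to its parent for that class.
final class IsolatingLoader extends ClassLoader {
    private final Class<?> original;

    IsolatingLoader(Class<?> original) {
        super(original.getClassLoader());
        this.original = original;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        if (!name.equals(original.getName())) {
            return super.loadClass(name, resolve); // normal delegation
        }
        synchronized (getClassLoadingLock(name)) {
            Class<?> copy = findLoadedClass(name);
            if (copy == null) {
                String resource = name.replace('.', '/') + ".class";
                try (InputStream in =
                         original.getClassLoader().getResourceAsStream(resource)) {
                    if (in == null) {
                        throw new ClassNotFoundException(name);
                    }
                    byte[] bytes = in.readAllBytes();
                    copy = defineClass(name, bytes, 0, bytes.length);
                } catch (IOException e) {
                    throw new ClassNotFoundException(name, e);
                }
            }
            if (resolve) {
                resolveClass(copy);
            }
            return copy;
        }
    }
}

final class ClassLoaderDemo {
    // Returns { value seen by the original class, value seen by the copy }.
    static int[] run() {
        try {
            SharedState.value = 1;
            Class<?> copy = new IsolatingLoader(SharedState.class)
                    .loadClass(SharedState.class.getName());
            Field field = copy.getDeclaredField("value");
            field.setAccessible(true);
            field.setInt(null, 42); // only the copy's static changes
            return new int[] { SharedState.value, field.getInt(null) };
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] values = run();
        System.out.println("original=" + values[0] + " copy=" + values[1]);
    }
}
```

The two Class objects have the same name but are distinct, so each one
carries its own static field.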

Synchronization in Java is based on locks. When the synchronized keyword
is applied to a static method, the lock is the Class object for that
class. The same rule applies across class loaders as above: each loader
has its own Class object and therefore its own lock.
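
A small sketch of the locking rule, using made-up names (HitCounter and
HitCounterDemo are not part of Hadoop). It shows that a static
synchronized method holds the monitor of the Class object, and that a
synchronized block on HitCounter.class is the same lock written out
explicitly.

```java
// Hypothetical class for illustration; not part of Hadoop.
final class HitCounter {
    private static int hits = 0; // shared state, one copy per class loader

    // Holds the monitor of HitCounter.class for the duration of the call.
    static synchronized int increment() {
        return ++hits;
    }

    // True when the calling thread holds the class-level monitor here.
    static synchronized boolean holdsClassLock() {
        return Thread.holdsLock(HitCounter.class);
    }

    // Same lock as a static synchronized method, spelled out explicitly.
    static int current() {
        synchronized (HitCounter.class) {
            return hits;
        }
    }

    static void reset() {
        synchronized (HitCounter.class) {
            hits = 0;
        }
    }
}

final class HitCounterDemo {
    // Starts `threads` threads that each call increment() `perThread` times.
    static int runConcurrently(int threads, int perThread) {
        HitCounter.reset();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    HitCounter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return HitCounter.current();
    }

    public static void main(String[] args) {
        System.out.println(HitCounter.holdsClassLock());
        System.out.println(runConcurrently(4, 10_000));
    }
}
```

Because every increment takes the same class-level lock, no updates are
lost, but the threads also queue up behind one another, which is the
performance cost the original question worries about.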

The only time you need to synchronize something is when it holds shared
state that must be updated atomically. Synchronization cannot coordinate
parallel processes unless they first have a shared data structure, and
static only guarantees that the data is shared within the same class
loader (again, see above).

A static method is fine if there is no shared state (i.e. if it's just a
function that takes parameters and returns a value). If you need to
share state, I would look at writing to HDFS or using an ACID-compliant
data store with transaction semantics (e.g. a relational database).
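
As a sketch of the "no shared state" case: the Parser below is a guess at
what such a class might look like (the parse method and the
comma-separated input format are assumptions, not taken from the
original question). The static method reads only its parameter and local
variables, so any number of threads can call it without synchronization.

```java
import java.util.stream.IntStream;

// Hypothetical stateless parser; the input format is assumed.
final class Parser {
    private Parser() {} // no instances needed

    // Pure function: uses only its parameter and locals, no static fields.
    static String[] parse(String line) {
        return line.trim().split("\\s*,\\s*");
    }
}

final class ParserDemo {
    // Parses `lines` generated records in parallel and counts the fields.
    static int totalFields(int lines) {
        return IntStream.range(0, lines)
                .parallel()
                .map(i -> Parser.parse("id" + i + " , name" + i + ", value" + i).length)
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(totalFields(1_000));
    }
}
```

Each generated line has three fields, so the total is three times the
number of lines regardless of how the work is split across threads.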

You might want to check out this:


I would try to avoid shared state unless it's absolutely necessary.

-------- Original Message --------
Subject: Is it safe to have static methods in Hadoop Framework
From: Huy Pham <phamvh@yahoo-inc.com>
Date: Thu, July 25, 2013 2:46 pm
To: "user@hadoop.apache.org" <user@hadoop.apache.org>, 
"user@pig.apache.org" <user@pig.apache.org>

 Hi All,
   I am writing a class (called Parser) with a couple of static
functions because I don't want millions of instances of this class to be
created during the run.
   However, I realized that Hadoop will eventually produce parallel
jobs, and if all jobs will call static functions of this Parser class,
would that be safe? 
   In other words, will all Hadoop jobs share the same Parser class, or
will each of them have its own Parser? In the former case, if I make the
methods synchronized, the jobs would need to wait until the locks on the
functions are released, which would hurt performance. In the latter
case, that would not cause any problem.
Can someone provide some insights?
