pig-dev mailing list archives

From "Felix Gao (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-1879) AvroStorage with wildcard causes extreme slow initialization
Date Wed, 02 Mar 2011 23:38:36 GMT
AvroStorage with wildcard causes extreme slow initialization
------------------------------------------------------------

                 Key: PIG-1879
                 URL: https://issues.apache.org/jira/browse/PIG-1879
             Project: Pig
          Issue Type: Bug
          Components: tools
    Affects Versions: 0.7.0
            Reporter: Felix Gao


I am using piggybank with avro_storage.patch from PIG-1748.  When I load the data using
LOAD '/user/test/logs/avro/*/*' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
the sys log on the mapper shows:
2011-03-02 12:52:35,556 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics
with processName=MAP, sessionId=
2011-03-02 12:52:37,333 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100

However, when I use load '/user/test/logs/avro/{2011-03-01/*/*-23-00*,2011-03-01/*/*2011-03-01-00-00*,2011-03-01/*/*2011-03-01-01-00*,2011-03-01/*/*2011-03-01-02-00*,2011-03-01/*/*2011-03-01-03-00*,2011-03-01/*/*2011-03-01-04-00*,2011-02-28/*/*2011-02-28-05-00*,2011-02-28/*/*2011-02-28-06-00*,2011-02-28/*/*2011-02-28-07-00*,2011-02-28/*/*2011-02-28-08-00*,2011-02-28/*/*2011-02-28-09-00*,2011-02-28/*/*2011-02-28-1*-00*,2011-02-28/*/*2011-02-28-20-00*,2011-02-28/*/*2011-02-28-21-00*,2011-02-28/*/*2011-02-28-22-00*}'
the sys log on the mapper shows:
2011-03-02 12:03:33,091 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics
with processName=MAP, sessionId=
2011-03-02 12:06:05,254 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100

Notice that the initialization stage took about 2 minutes and 30 seconds here, versus
under 2 seconds with the plain wildcard.  If I cut the number of file patterns in the
glob in half, the mappers will be twice as fast.
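
For reference, the two initialization times can be read straight off the log
timestamps quoted above. The following is only a small sketch for doing that
arithmetic; the helper name elapsed_seconds is made up for illustration and is
not part of Pig or Hadoop. It assumes each log line starts with a timestamp in
the "YYYY-MM-DD HH:MM:SS,mmm" form shown in the mapper logs.

```python
from datetime import datetime

# Timestamp layout used at the start of the mapper log lines quoted above.
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
LOG_TS_LENGTH = len("2011-03-02 12:52:35,556")


def elapsed_seconds(start_line: str, end_line: str) -> float:
    """Return the seconds between the leading timestamps of two log lines.

    Hypothetical helper for illustration only.
    """
    start = datetime.strptime(start_line[:LOG_TS_LENGTH], LOG_TS_FORMAT)
    end = datetime.strptime(end_line[:LOG_TS_LENGTH], LOG_TS_FORMAT)
    return (end - start).total_seconds()


# Log lines from the plain wildcard load ('/user/test/logs/avro/*/*').
wildcard = elapsed_seconds(
    "2011-03-02 12:52:35,556 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: "
    "Initializing JVM Metrics with processName=MAP, sessionId=",
    "2011-03-02 12:52:37,333 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100",
)

# Log lines from the load that uses the long {...} list of patterns.
brace_glob = elapsed_seconds(
    "2011-03-02 12:03:33,091 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: "
    "Initializing JVM Metrics with processName=MAP, sessionId=",
    "2011-03-02 12:06:05,254 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100",
)

print(f"wildcard:   {wildcard:.3f} s")    # 1.777 s
print(f"brace glob: {brace_glob:.3f} s")  # 152.163 s
```

This only measures the gap between the two log lines in each run; it says
nothing about where inside AvroStorage the time is spent.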
 

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
