Date: Wed, 13 Oct 2010 14:26:14 -0700
From: Konstantin Boudnik
To: common-user@hadoop.apache.org
Subject: Re: load a serialized object in hadoop

You should have no space here: "-D HADOOP_CLIENT_OPTS".
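Also note that HADOOP_CLIENT_OPTS is an environment variable read by the
bin/hadoop launcher script, not something your program sees as an argument,
so the form Luke suggested below (setting it in front of the command) is the
way to go. As a sketch, reusing the jar and class names from your own command
line, that should look like:

    HADOOP_CLIENT_OPTS=-Xmx4000m bin/hadoop jar Test.jar OOloadtest

or, if you would rather export it first:

    export HADOOP_CLIENT_OPTS=-Xmx4000m
    bin/hadoop jar Test.jar OOloadtest

On Wed, Oct 13, 2010 at 04:21 PM, Shi Yu wrote:
> Hi, thanks for the advice. I tried with your settings:
>
>   $ bin/hadoop jar Test.jar OOloadtest -D HADOOP_CLIENT_OPTS=-Xmx4000m
>
> Still no effect. Or is this a system variable? Should I export it?
> How do I configure it?
>
> Shi
>
>   java -Xms3G -Xmx3G -classpath .:WordCount.jar:hadoop-0.19.2-core.jar:lib/log4j-1.2.15.jar:lib/commons-collections-3.2.1.jar:lib/stanford-postagger-2010-05-26.jar OOloadtest
>
> On 2010-10-13 15:28, Luke Lu wrote:
>> On Wed, Oct 13, 2010 at 12:27 PM, Shi Yu wrote:
>>> I haven't implemented anything in map/reduce yet for this issue. I just
>>> try to invoke the same Java class using the bin/hadoop command. The thing
>>> is that a very simple program can be executed with plain java, but not
>>> with the bin/hadoop command.
>>
>> If you are just using the "bin/hadoop jar your.jar" command, your code
>> runs in a local client JVM and mapred.child.java.opts has no effect. You
>> should run it with:
>>
>>   HADOOP_CLIENT_OPTS=-Xmx1000m bin/hadoop jar your.jar
>>
>>> I think if I couldn't get through this first stage, even if I had a
>>> map/reduce program it would also fail. I am using Hadoop 0.19.2. Thanks.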
>>>
>>> Best Regards,
>>>
>>> Shi
>>>
>>> On 2010-10-13 14:15, Luke Lu wrote:
>>>> Can you post your mapper/reducer implementation? Or are you using
>>>> Hadoop Streaming, in which case mapred.child.java.opts doesn't apply to
>>>> the JVM you care about? BTW, which Hadoop version are you using?
>>>>
>>>> On Wed, Oct 13, 2010 at 11:45 AM, Shi Yu wrote:
>>>>> Here is my code. There is no Map/Reduce in it. I can run this code
>>>>> with "java -Xmx1000m"; however, with "bin/hadoop -D
>>>>> mapred.child.java.opts=-Xmx3000M" it fails with a heap space error.
>>>>> I have tried other programs in Hadoop with the same settings, so the
>>>>> memory is available on my machines.
>>>>>
>>>>>   public static void main(String[] args) {
>>>>>     try {
>>>>>       String myFile = "xxx.dat";
>>>>>       FileInputStream fin = new FileInputStream(myFile);
>>>>>       ois = new ObjectInputStream(fin);
>>>>>       margintagMap = ois.readObject();
>>>>>       ois.close();
>>>>>       fin.close();
>>>>>     } catch (Exception e) {
>>>>>       //
>>>>>     }
>>>>>   }
>>>>>
>>>>> On 2010-10-13 13:30, Luke Lu wrote:
>>>>>> On Wed, Oct 13, 2010 at 8:04 AM, Shi Yu wrote:
>>>>>>> As a follow-up to my own question: I think invoking the JVM through
>>>>>>> Hadoop requires much more memory than an ordinary JVM.
>>>>>>
>>>>>> That's simply not true. The default mapreduce task Xmx is 200M, which
>>>>>> is much smaller than the standard JVM default of 512M, and most users
>>>>>> don't need to increase it. Please post the code reading the object
>>>>>> (in HDFS?) in your tasks.
>>>>>>
>>>>>>> I found that instead of serializing the object, maybe I could create
>>>>>>> a MapFile as an index to permit lookups by key in Hadoop. I have also
>>>>>>> compared the performance of MongoDB and Memcache. I will let you know
>>>>>>> the result after I try the MapFile approach.
>>>>>>>
>>>>>>> Shi
>>>>>>>
>>>>>>> On 2010-10-12 21:59, M. C. Srivas wrote:
>>>>>>>> On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I want to load a serialized HashMap object in Hadoop. The file of
>>>>>>>>> the stored object is 200M. I can read that object efficiently in
>>>>>>>>> Java by setting -Xmx to 1000M. However, in Hadoop I can never load
>>>>>>>>> it into memory. The code is very simple (it just reads the
>>>>>>>>> ObjectInputStream) and there is no map/reduce implemented yet. I
>>>>>>>>> set mapred.child.java.opts=-Xmx3000M and still get
>>>>>>>>> "java.lang.OutOfMemoryError: Java heap space". Could anyone explain
>>>>>>>>> a little bit how memory is allocated to the JVM in Hadoop? Why does
>>>>>>>>> Hadoop take up so much memory? If a program requires 1G of memory
>>>>>>>>> on a single node, how much memory does it (generally) require in
>>>>>>>>> Hadoop?
>>>>>>>>
>>>>>>>> The JVM reserves swap space in advance, at the time of launching
>>>>>>>> the process.
>>>>>>>> If your swap is too low (or you do not have any swap configured),
>>>>>>>> you will hit this.
>>>>>>>>
>>>>>>>> Or you are on a 32-bit machine, in which case 3G is not possible
>>>>>>>> in the JVM.
>>>>>>>>
>>>>>>>> -Srivas.
>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> Shi
>>>>>>>>>
>>>>>>>>> --
>>>>>
>>>>> --
>>>>> Postdoctoral Scholar
>>>>> Institute for Genomics and Systems Biology
>>>>> Department of Medicine, the University of Chicago
>>>>> Knapp Center for Biomedical Discovery
>>>>> 900 E. 57th St. Room 10148
>>>>> Chicago, IL 60637, US
>>>>> Tel: 773-702-6799
>>>
>>> --
>>> Postdoctoral Scholar
>>> Institute for Genomics and Systems Biology
>>> Department of Medicine, the University of Chicago
>>> Knapp Center for Biomedical Discovery
>>> 900 E. 57th St. Room 10148
>>> Chicago, IL 60637, US
>>> Tel: 773-702-6799
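P.S. Once you do wrap this in a map/reduce job, mapred.child.java.opts is the
knob that sizes the task JVMs on the worker nodes; the client JVM launched by
bin/hadoop is still governed by HADOOP_CLIENT_OPTS. A minimal sketch against
the 0.19 JobConf API (the class name, job name, and paths below are
placeholders, not taken from your code):

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.FileOutputFormat;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.lib.IdentityMapper;
  import org.apache.hadoop.mapred.lib.IdentityReducer;

  public class OOLoadJob {
    public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf(OOLoadJob.class);
      conf.setJobName("oo-load-test");
      // Heap for every *task* JVM spawned on the worker nodes;
      // this setting does nothing for the local client JVM.
      conf.set("mapred.child.java.opts", "-Xmx1000m");
      conf.setMapperClass(IdentityMapper.class);
      conf.setReducerClass(IdentityReducer.class);
      FileInputFormat.setInputPaths(conf, new Path(args[0]));
      FileOutputFormat.setOutputPath(conf, new Path(args[1]));
      JobClient.runJob(conf);
    }
  }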