Subject: Re: [VOTE] The 1st hbase-0.96.0 release candidate is available for download
From: Devaraj Das
To: dev@hbase.apache.org
Date: Tue, 3 Sep 2013 18:30:48 -0700

Well, from the test code it seems the problem is that the reducer got
unexpected data and was trying to construct the log message for the user.
So the job had already failed in reality.

On Tue, Sep 3, 2013 at 6:17 PM, Devaraj Das wrote:
> Elliott, what are the heap sizes of the M/R tasks in your setup? I was
> running the job like this (without chaosmonkey, to start with):
>
> hbase org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList Loop 5 12
> 2500000 IntegrationTestBigLinkedList 10
>
> Even the above test failed, with one reduce task failing with OOM in the
> verify step. The heap size was set to 3G.
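The stack trace below blames the message construction, not the verification
logic itself. A minimal sketch of that failure mode, in case it helps; the
class and method names here are illustrative, not the actual
IntegrationTestBigLinkedList code:

    import java.util.Arrays;
    import java.util.Iterator;
    import java.util.List;

    // Sketch: the diagnostic message, not the verification, exhausts the
    // heap once enough unexpected rows arrive -- matching the
    // Arrays.copyOf/expandCapacity/append frames in the trace below.
    public class UnboundedLogMessage {

        // Grows without bound -- one huge String per reducer key.
        static String describe(Iterator<String> unexpectedRefs) {
            StringBuilder sb = new StringBuilder("Unexpected references: ");
            while (unexpectedRefs.hasNext()) {
                sb.append(unexpectedRefs.next()).append(',');
            }
            return sb.toString();
        }

        // Safer variant: cap the message and summarize the overflow.
        static String describeCapped(Iterator<String> unexpectedRefs, int maxChars) {
            StringBuilder sb = new StringBuilder("Unexpected references: ");
            long skipped = 0;
            while (unexpectedRefs.hasNext()) {
                String ref = unexpectedRefs.next();
                if (sb.length() < maxChars) {
                    sb.append(ref).append(',');
                } else {
                    skipped++;           // count the rest, don't append
                }
            }
            if (skipped > 0) sb.append(" ... and ").append(skipped).append(" more");
            return sb.toString();
        }

        public static void main(String[] args) {
            List<String> refs = Arrays.asList("a1", "b2", "c3");
            System.out.println(describeCapped(refs.iterator(), 64));
        }
    }

Capping the message would keep the task alive long enough to report the real
problem: the unexpected references themselves.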
>
> 2013-09-04 01:11:56,054 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
>         at java.util.Arrays.copyOf(Arrays.java:2882)
>         at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
>         at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
>         at java.lang.StringBuilder.append(StringBuilder.java:119)
>         at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Verify$VerifyReducer.reduce(IntegrationTestBigLinkedList.java:576)
>         at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Verify$VerifyReducer.reduce(IntegrationTestBigLinkedList.java:547)
>         at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
>         at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:645)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:405)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
>
> On Tue, Sep 3, 2013 at 12:56 PM, Elliott Clark wrote:
>
>> Can someone take a look at running IntegrationTestBigLinkedList for >5
>> iterations with the slowDeterministic chaos monkey on a distributed
>> cluster? I'm pretty concerned about HBASE-9338.
>>
>> On Tue, Sep 3, 2013 at 6:57 AM, Jean-Marc Spaggiari wrote:
>> > There was a typo in my log4j.properties :(
>> >
>> > So it's working fine.
>> >
>> > The only INFO logs I still see are these:
>> > 2013-09-03 09:53:07,313 INFO [M:0;t430s:45176] mortbay.log: Logging to
>> > org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
>> > org.mortbay.log.Slf4jLog
>> > 2013-09-03 09:53:07,350 INFO [M:0;t430s:45176] mortbay.log: jetty-6.1.26
>> > But there are only very few of them.
>> >
>> > Performance-wise, here are the numbers (higher is better; rows per
>> > second, except for scans, where it's rows per minute). As you will see,
>> > 0.96 is slower only for RandomSeekScanTest (way slower) and
>> > RandomScanWithRange10, and faster for everything else. I ran the tests
>> > with the default settings. So I think we should look at
>> > RandomSeekScanTest; except for this one, everything else is pretty
>> > good.
>> >
>> > Also, I have been able to reproduce this exception:
>> > 2013-09-03 09:55:36,718 WARN [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181]
>> > server.NIOServerCnxn: caught end of stream exception
>> > EndOfStreamException: Unable to read additional data from client
>> > sessionid 0x140e4191edb0009, likely client has closed socket
>> >         at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
>> >         at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>> >         at java.lang.Thread.run(Thread.java:662)
>> >
>> > Just had to run PE and kill it in the middle.
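That warning is what the ZooKeeper server logs whenever a client drops its
TCP connection without a clean close, which is exactly what killing PE
mid-run does. A minimal reproduction sketch, assuming a ZooKeeper server
listening on localhost:2181 and the org.apache.zookeeper client library on
the classpath:

    import org.apache.zookeeper.ZooKeeper;

    // Connect, then terminate without calling zk.close(). The server can
    // no longer read from the socket and logs "caught end of stream
    // exception ... likely client has closed socket" -- the same
    // (harmless) warning quoted above.
    public class AbruptZkClient {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> { });
            System.out.println("session 0x" + Long.toHexString(zk.getSessionId()));
            Runtime.getRuntime().halt(1); // simulate "kill -9" on the PE client
        }
    }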
>> >
>> > JM
>> >
>> >
>> >                               0.96.0RC0    0.94.11   0.96/0.94
>> > FilteredScanTest                  10.28      10.17     101.12%
>> > RandomReadTest                   966.01     810.58     119.18%
>> > RandomSeekScanTest                98.50     255.71      38.52%
>> > RandomWriteTest                39251.17   25682.11     152.83%
>> > RandomScanWithRange10Test      25844.88   28715.29      90.00%
>> > RandomScanWithRange100Test     20029.48   18022.39     111.14%
>> > RandomScanWithRange1000Test     2692.16    2346.85     114.71%
>> > SequentialReadTest              3002.18    2875.83     104.39%
>> > SequentialWriteTest            38995.50   26693.23     146.09%
>> > (all tests are org.apache.hadoop.hbase.PerformanceEvaluation$<name>)
>> >
>> > 2013/9/3 Stack
>> >
>> >> On Mon, Sep 2, 2013 at 10:51 AM, Jean-Marc Spaggiari <
>> >> jean-marc@spaggiari.org> wrote:
>> >>
>> >> > I have created:
>> >> > HBASE-9412
>> >> > HBASE-9413
>> >> > HBASE-9414
>> >> >
>> >> > I have not been able yet to reproduce the ZK error. I'm trying.
>> >>
>> >> Is it when you have a shell connection and then kill it?
>> >>
>> >> > Last, I tried, with no success, to set the log level to WARN to
>> >> > remove all DEBUG and INFO logs. Setting it to WARN removes the
>> >> > DEBUG lines, but I keep getting the INFO ones. It seems that
>> >> > something is setting the log level somewhere else, or it's not
>> >> > being read.
>> >> >
>> >> > Here is my log4j.properties file. I removed all the custom log
>> >> > levels and set WARN for org.apache. And it's still showing INFO...
>> >>
>> >> You did it by editing log4j and restarting, or in the UI? I think
>> >> the UI log level setting is broke.... (new issue!)
>> >>
>> >> Thanks for trying it out JMS.
>> >>
>> >> So everything is slower in 0.96?
>> >> St.Ack
>> >>
>> >> > JM
>> >> >
>> >> > # Define some default values that can be overridden by system properties
>> >> > hbase.root.logger=WARN,console
>> >> > hbase.security.logger=WARN,console
>> >> > hbase.log.dir=.
>> >> > hbase.log.file=hbase.log
>> >> >
>> >> > # Define the root logger to the system property "hbase.root.logger".
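>> >> > # (With hbase.root.logger=WARN,console as defined above, the next
>> >> > # line expands to log4j.rootLogger=WARN,console: the first token is
>> >> > # the level threshold, the remaining tokens are appender names.
>> >> > # Loggers without their own log4j.logger.* entry inherit this level
>> >> > # from their nearest configured ancestor.)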
>> >> > log4j.rootLogger=${hbase.root.logger}
>> >> >
>> >> > # Logging Threshold
>> >> > log4j.threshold=ALL
>> >> >
>> >> > #
>> >> > # Daily Rolling File Appender
>> >> > #
>> >> > log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
>> >> > log4j.appender.DRFA.File=${hbase.log.dir}/${hbase.log.file}
>> >> >
>> >> > # Rollover at midnight
>> >> > log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
>> >> >
>> >> > # 30-day backup
>> >> > #log4j.appender.DRFA.MaxBackupIndex=30
>> >> > log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
>> >> >
>> >> > # Pattern format: Date LogLevel LoggerName LogMessage
>> >> > log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c{2}: %m%n
>> >> >
>> >> > # Rolling File Appender properties
>> >> > hbase.log.maxfilesize=256MB
>> >> > hbase.log.maxbackupindex=20
>> >> >
>> >> > # Rolling File Appender
>> >> > log4j.appender.RFA=org.apache.log4j.RollingFileAppender
>> >> > log4j.appender.RFA.File=${hbase.log.dir}/${hbase.log.file}
>> >> > log4j.appender.RFA.MaxFileSize=${hbase.log.maxfilesize}
>> >> > log4j.appender.RFA.MaxBackupIndex=${hbase.log.maxbackupindex}
>> >> > log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
>> >> > log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c{2}: %m%n
>> >> >
>> >> > #
>> >> > # Security audit appender
>> >> > #
>> >> > hbase.security.log.file=SecurityAuth.audit
>> >> > hbase.security.log.maxfilesize=256MB
>> >> > hbase.security.log.maxbackupindex=20
>> >> > log4j.appender.RFAS=org.apache.log4j.RollingFileAppender
>> >> > log4j.appender.RFAS.File=${hbase.log.dir}/${hbase.security.log.file}
>> >> > log4j.appender.RFAS.MaxFileSize=${hbase.security.log.maxfilesize}
>> >> > log4j.appender.RFAS.MaxBackupIndex=${hbase.security.log.maxbackupindex}
>> >> > log4j.appender.RFAS.layout=org.apache.log4j.PatternLayout
>> >> > log4j.appender.RFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
>> >> > log4j.category.SecurityLogger=${hbase.security.logger}
>> >> > log4j.additivity.SecurityLogger=false
>> >> > #log4j.logger.SecurityLogger.org.apache.hadoop.hbase.security.access.AccessController=TRACE
>> >> >
>> >> > #
>> >> > # Null Appender
>> >> > #
>> >> > log4j.appender.NullAppender=org.apache.log4j.varia.NullAppender
>> >> >
>> >> > #
>> >> > # console
>> >> > # Add "console" to rootLogger above if you want to use this
>> >> > #
>> >> > log4j.appender.console=org.apache.log4j.ConsoleAppender
>> >> > log4j.appender.console.target=System.err
>> >> > log4j.appender.console.layout=org.apache.log4j.PatternLayout
>> >> > log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c{2}: %m%n
>> >> >
>> >> > # Custom Logging levels
>> >> >
>> >> > log4j.logger.ore.apache=WARN
>> >> >
>> >> > 2013/9/2 Jean-Marc Spaggiari :
>> >> > > Hi St.Ack,
>> >> > >
>> >> > > I will open the related JIRAs in a few minutes.
>> >> > >
>> >> > > Regarding performance, RandomSeekScanTest is way slower, and only
>> >> > > RandomScanWithRange100Test was faster. Others were similar. For
>> >> > > RandomScanWithRange100Test I suspect I don't have the right
>> >> > > numbers for 0.94, so 0.94.11 tests are running right now on the
>> >> > > same server with the same configuration. I will start to have
>> >> > > numbers by end of day or else tomorrow morning, but it will most
>> >> > > probably take about 24h to get all of them.
>> >> > >
>> >> > > JM
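One thing worth flagging in the log4j.properties quoted above: the custom
level line reads log4j.logger.ore.apache=WARN -- "ore" rather than "org" --
which is evidently the typo JM mentions finding upthread. The WARN override
attaches to a logger name nothing ever uses. A quick way to check where a
logger's threshold actually comes from (log4j 1.2 API; a sketch, assuming
the properties file above is the one loaded):

    import org.apache.log4j.Logger;

    // With the "ore.apache" typo in place, the logger you meant to
    // configure has no assigned level of its own; getEffectiveLevel()
    // shows the level it inherits from its nearest configured ancestor.
    public class EffectiveLevelCheck {
        public static void main(String[] args) {
            Logger hbase = Logger.getLogger("org.apache.hadoop.hbase");
            System.out.println("assigned  = " + hbase.getLevel());          // null: no direct entry
            System.out.println("effective = " + hbase.getEffectiveLevel()); // inherited
        }
    }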
>> >> > >
>> >> > > 2013/9/2 Stack :
>> >> > >> On Mon, Sep 2, 2013 at 9:41 AM, Jean-Marc Spaggiari <
>> >> > >> jean-marc@spaggiari.org> wrote:
>> >> > >>
>> >> > >>> Got it.
>> >> > >>>
>> >> > >>> I can't run the integration tests for now because I'm lacking
>> >> > >>> some servers :( I need to complete some HBase-on-RAID tests
>> >> > >>> before I can get those new servers.
>> >> > >>>
>> >> > >>> First thing: start-hbase.cmd has the execute flag set. I don't
>> >> > >>> think that's required, and it will help with tab completion if
>> >> > >>> we can unset it.
>> >> > >>>
>> >> > >>> For 0.96.0RC0, here are my results:
>> >> > >>> First, I get two .out files each time I start the server,
>> >> > >>> instead of the usual one, both with the same timestamp:
>> >> > >>>
>> >> > >>> -rw-r--r-- 1 jmspaggiari jmspaggiari 0 Aug 31 15:38 hbase-jmspaggiari-master-t430s.out
>> >> > >>> -rw-r--r-- 1 jmspaggiari jmspaggiari 0 Aug 31 15:38 hbase-jmspaggiari-master-t430s.out.1
>> >> > >>>
>> >> > >>> In the UI, we say "The .META. table holds references to all User
>> >> > >>> Table regions" but the table name is "hbase:meta", not ".META.".
>> >> > >>>
>> >> > >>> In the logs, I found this exception that I did not have before:
>> >> > >>> 2013-08-31 18:45:05,490 WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181]
>> >> > >>> server.NIOServerCnxn: caught end of stream exception
>> >> > >>> EndOfStreamException: Unable to read additional data from client
>> >> > >>> sessionid 0x140d68bb9d50004, likely client has closed socket
>> >> > >>>         at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
>> >> > >>>         at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>> >> > >>>         at java.lang.Thread.run(Thread.java:662)
>> >> > >>>
>> >> > >>> I ran PE over 48h. I don't have another 0.96 baseline to compare
>> >> > >>> with, so I compared with 0.94.
>> >> > >>> org.apache.hadoop.hbase.PerformanceEvaluation$RandomSeekScanTest
>> >> > >>> is about 3 times slower with 0.96.
>> >> > >>> org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange100Test
>> >> > >>> seems to be 2 times faster.
>> >> > >>>
>> >> > >>> Writes are also faster, but I changed my hard drive since I ran
>> >> > >>> 0.94, so I'm currently re-running 0.94 and will provide another,
>> >> > >>> more accurate comparison soon.
>> >> > >>>
>> >> > >>> Ran LoadTestTool: Failed to write keys: 0
>> >> > >>>
>> >> > >>> I have not been able to run IntegrationTestLoadAndVerify or
>> >> > >>> IntegrationTestBigLinkedList because of the lack of servers.
>> >> > >>>
>> >> > >>> So overall it seems to be working fine, but I have not been able
>> >> > >>> to test this release as deeply as I usually test the 0.94
>> >> > >>> releases.
>> >> > >>
>> >> > >> Thank you for taking it for a spin JMS -- you the man. If you
>> >> > >> don't open issues for the teething stuff, I will.
>> >> > >>
>> >> > >> So 0.96 is slower doing RandomSeekScanTest but faster on your
>> >> > >> other tests? I can take a look too.
>> >> > >>
>> >> > >> St.Ack
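On JM's UI-naming point above: with the namespaces work in 0.96, the catalog
table formerly called .META. is addressed as the "meta" table in the
reserved "hbase" namespace, and the client API exposes it as a constant. A
small sketch, assuming the 0.96 hbase-common/hbase-client jars are on the
classpath:

    import org.apache.hadoop.hbase.TableName;

    // In 0.96 table names are namespace-qualified; the table the UI text
    // still calls ".META." prints as hbase:meta.
    public class MetaTableNameDemo {
        public static void main(String[] args) {
            TableName meta = TableName.META_TABLE_NAME;
            System.out.println(meta);                        // hbase:meta
            System.out.println(meta.getNamespaceAsString()); // hbase
            System.out.println(meta.getQualifierAsString()); // meta
        }
    }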