hbase-dev mailing list archives

From Devaraj Das <d...@hortonworks.com>
Subject Re: [VOTE] The 1st hbase-0.96.0 release candidate is available for download
Date Wed, 04 Sep 2013 01:30:48 GMT
Well, from the test code it seems the problem is that the reducer got
unexpected data and was trying to construct the log message for the user
when it ran out of memory. So the job had already failed in reality.
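The pattern is easy to demonstrate in isolation: an error-report path that concatenates every bad row into a single StringBuilder is unbounded once the data is already corrupt, so the verify step dies assembling the message rather than doing real work. A capped variant avoids that; the cap and names below are illustrative, not the actual IntegrationTestBigLinkedList code:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch (hypothetical names/cap): bound the error message so a huge set of
// bad rows produces a truncated report instead of an OutOfMemoryError.
public class BoundedMessageDemo {
    static final int MAX_MSG_CHARS = 10 * 1024; // illustrative cap

    static String describeBadRows(Iterable<String> badRows) {
        StringBuilder sb = new StringBuilder();
        for (String row : badRows) {
            if (sb.length() > MAX_MSG_CHARS) {
                sb.append("...(truncated)");
                break; // stop appending; the report is already useful
            }
            sb.append(row).append(',');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 200_000; i++) rows.add("row-" + i);
        String msg = describeBadRows(rows);
        // bounded regardless of how many bad rows there were
        System.out.println(msg.length() <= MAX_MSG_CHARS + 32);
    }
}
```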


On Tue, Sep 3, 2013 at 6:17 PM, Devaraj Das <ddas@hortonworks.com> wrote:

> Elliott, what are the heap sizes of the M/R tasks in your setup? I was
> running the job like this (without chaosmonkey to start with):
>
> hbase org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList Loop 5 12
> 2500000 IntegrationTestBigLinkedList 10
>
> Even the above test failed, with one reduce task hitting an OOM in the
> verify step. The heap size was set to 3G.
>
> 2013-09-04 01:11:56,054 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
> 	at java.util.Arrays.copyOf(Arrays.java:2882)
> 	at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
> 	at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
> 	at java.lang.StringBuilder.append(StringBuilder.java:119)
> 	at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Verify$VerifyReducer.reduce(IntegrationTestBigLinkedList.java:576)
> 	at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Verify$VerifyReducer.reduce(IntegrationTestBigLinkedList.java:547)
> 	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
> 	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:645)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:405)
> 	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
> 	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
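For anyone retrying this with a bigger reduce heap, the standard Hadoop 2.x knobs are mapreduce.reduce.memory.mb and mapreduce.reduce.java.opts; a sketch, assuming the test passes generic -D options through via ToolRunner:

```shell
# Sketch: rerun the verify-heavy loop with a 4 GB reduce container and
# ~3.5 GB JVM heap. Property names are standard Hadoop 2.x; the -D
# pass-through assumes the tool uses GenericOptionsParser/ToolRunner.
hbase org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList \
  -Dmapreduce.reduce.memory.mb=4096 \
  -Dmapreduce.reduce.java.opts=-Xmx3584m \
  Loop 5 12 2500000 IntegrationTestBigLinkedList 10
```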
>
>
>
>
>
> On Tue, Sep 3, 2013 at 12:56 PM, Elliott Clark <eclark@apache.org> wrote:
>
>> Can someone take a look at running Test Big Linked for > 5 iterations
>> with slowDeterministic chaos monkey on a distributed cluster?  I'm
>> pretty concerned about HBASE-9338.
>>
>> On Tue, Sep 3, 2013 at 6:57 AM, Jean-Marc Spaggiari
>> <jean-marc@spaggiari.org> wrote:
>> > There was a typo in my log4j.properties :(
>> >
>> > So it's working fine.
>> >
>> > The only INFO logs I still see are these ones:
>> > 2013-09-03 09:53:07,313 INFO  [M:0;t430s:45176] mortbay.log: Logging to
>> > org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
>> > org.mortbay.log.Slf4jLog
>> > 2013-09-03 09:53:07,350 INFO  [M:0;t430s:45176] mortbay.log: jetty-6.1.26
>> > But there are only a few of them.
>> >
>> > Performance-wise, here are the numbers (the higher, the better; rows per
>> > second, except for scans where it's rows/min). As you will see, 0.96 is
>> > slower only for RandomSeekScanTest (way slower) and RandomScanWithRange10
>> > but is faster for everything else. I ran the tests with the default
>> > settings. So I think we should look at RandomSeekScanTest but, except for
>> > this one, everything else is pretty good.
>> >
>> > Also, I have been able to reproduce this exception:
>> > 2013-09-03 09:55:36,718 WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181]
>> > server.NIOServerCnxn: caught end of stream exception
>> > EndOfStreamException: Unable to read additional data from client sessionid
>> > 0x140e4191edb0009, likely client has closed socket
>> >     at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
>> >     at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>> >     at java.lang.Thread.run(Thread.java:662)
>> >
>> > Just had to run PE and kill it in the middle.
>> >
>> > JM
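(Aside on the EndOfStreamException above: it is just the ZooKeeper server's view of a client socket that went away without a clean disconnect, which is exactly what killing PE mid-run produces. A ZooKeeper-free sketch of the same mechanism, using plain sockets:)

```java
import java.net.ServerSocket;
import java.net.Socket;

// Illustration only (plain sockets, not ZooKeeper): when a client vanishes
// without a protocol-level goodbye, the server's next read() returns -1
// (end of stream). ZooKeeper wraps that -1 in EndOfStreamException and
// logs the WARN seen above; it is expected after killing a client.
public class EndOfStreamDemo {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("localhost", server.getLocalPort());
             Socket accepted = server.accept()) {
            client.close(); // stands in for killing the PE client process
            int b = accepted.getInputStream().read();
            System.out.println(b == -1 ? "end of stream" : "got data");
        }
    }
}
```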
>> >
>> >
>> >
>> > Test (all org.apache.hadoop.hbase.PerformanceEvaluation$...)
>> >                             0.96.0RC0   0.94.11   0.96/0.94
>> > FilteredScanTest                10.28     10.17     101.12%
>> > RandomReadTest                 966.01    810.58     119.18%
>> > RandomSeekScanTest              98.50    255.71      38.52%
>> > RandomWriteTest              39251.17  25682.11     152.83%
>> > RandomScanWithRange10Test    25844.88  28715.29      90.00%
>> > RandomScanWithRange100Test   20029.48  18022.39     111.14%
>> > RandomScanWithRange1000Test   2692.16   2346.85     114.71%
>> > SequentialReadTest            3002.18   2875.83     104.39%
>> > SequentialWriteTest          38995.50  26693.23     146.09%
>> >
>> >
>> > 2013/9/3 Stack <stack@duboce.net>
>> >
>> >> On Mon, Sep 2, 2013 at 10:51 AM, Jean-Marc Spaggiari <
>> >> jean-marc@spaggiari.org> wrote:
>> >>
>> >> > I have created:
>> >> >  HBASE-9412
>> >> >  HBASE-9413
>> >> >  HBASE-9414
>> >> >
>> >> > I have not been able yet to reproduce the ZK error. I'm trying.
>> >> >
>> >> >
>> >> Is it when you have a shell connection and then kill it?
>> >>
>> >>
>> >>
>> >> > Last, I tried, with no success, to set the log level to WARN to remove
>> >> > all DEBUG and INFO logs. Setting it to WARN removes the DEBUG lines,
>> >> > but I keep getting the INFO ones. It seems something is setting the
>> >> > log level somewhere else, or it's not being read.
>> >> >
>> >> > Here is my log4j.properties file. I removed all the custom log levels
>> >> > to set up WARN for org.apache. And it's still showing INFO...
>> >> >
>> >> >
>> >>
>> >> You did it by editing log4j.properties and restarting, or in the UI? I
>> >> think the UI log level setting is broken... (new issue!)
>> >>
>> >> Thanks for trying it out JMS,
>> >>
>> >> So everything is slower in 0.96?
>> >> St.Ack
>> >>
>> >>
>> >>
>> >> > JM
>> >> >
>> >> >
>> >> > # Define some default values that can be overridden by system
>> properties
>> >> > hbase.root.logger=WARN,console
>> >> > hbase.security.logger=WARN,console
>> >> > hbase.log.dir=.
>> >> > hbase.log.file=hbase.log
>> >> >
>> >> > # Define the root logger to the system property "hbase.root.logger".
>> >> > log4j.rootLogger=${hbase.root.logger}
>> >> >
>> >> > # Logging Threshold
>> >> > log4j.threshold=ALL
>> >> >
>> >> > #
>> >> > # Daily Rolling File Appender
>> >> > #
>> >> > log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
>> >> > log4j.appender.DRFA.File=${hbase.log.dir}/${hbase.log.file}
>> >> >
>> >> > # Rollover at midnight
>> >> > log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
>> >> >
>> >> > # 30-day backup
>> >> > #log4j.appender.DRFA.MaxBackupIndex=30
>> >> > log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
>> >> >
>> >> > # Pattern format: Date LogLevel LoggerName LogMessage
>> >> > log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c{2}: %m%n
>> >> >
>> >> > # Rolling File Appender properties
>> >> > hbase.log.maxfilesize=256MB
>> >> > hbase.log.maxbackupindex=20
>> >> >
>> >> > # Rolling File Appender
>> >> > log4j.appender.RFA=org.apache.log4j.RollingFileAppender
>> >> > log4j.appender.RFA.File=${hbase.log.dir}/${hbase.log.file}
>> >> >
>> >> > log4j.appender.RFA.MaxFileSize=${hbase.log.maxfilesize}
>> >> > log4j.appender.RFA.MaxBackupIndex=${hbase.log.maxbackupindex}
>> >> >
>> >> > log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
>> >> > log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c{2}: %m%n
>> >> >
>> >> > #
>> >> > # Security audit appender
>> >> > #
>> >> > hbase.security.log.file=SecurityAuth.audit
>> >> > hbase.security.log.maxfilesize=256MB
>> >> > hbase.security.log.maxbackupindex=20
>> >> > log4j.appender.RFAS=org.apache.log4j.RollingFileAppender
>> >> > log4j.appender.RFAS.File=${hbase.log.dir}/${hbase.security.log.file}
>> >> > log4j.appender.RFAS.MaxFileSize=${hbase.security.log.maxfilesize}
>> >> > log4j.appender.RFAS.MaxBackupIndex=${hbase.security.log.maxbackupindex}
>> >> > log4j.appender.RFAS.layout=org.apache.log4j.PatternLayout
>> >> > log4j.appender.RFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
>> >> > log4j.category.SecurityLogger=${hbase.security.logger}
>> >> > log4j.additivity.SecurityLogger=false
>> >> >
>> >> >
>> >> > #log4j.logger.SecurityLogger.org.apache.hadoop.hbase.security.access.AccessController=TRACE
>> >> >
>> >> > #
>> >> > # Null Appender
>> >> > #
>> >> > log4j.appender.NullAppender=org.apache.log4j.varia.NullAppender
>> >> >
>> >> > #
>> >> > # console
>> >> > # Add "console" to rootlogger above if you want to use this
>> >> > #
>> >> > log4j.appender.console=org.apache.log4j.ConsoleAppender
>> >> > log4j.appender.console.target=System.err
>> >> > log4j.appender.console.layout=org.apache.log4j.PatternLayout
>> >> > log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c{2}: %m%n
>> >> >
>> >> > # Custom Logging levels
>> >> >
>> >> > log4j.logger.ore.apache=WARN
>> >> >
>> >> > 2013/9/2 Jean-Marc Spaggiari <jean-marc@spaggiari.org>:
>> >> > > Hi St.Ack,
>> >> > >
>> >> > > I will open the related JIRAs in a few minutes.
>> >> > >
>> >> > > Regarding performance, RandomSeekScanTest is way slower, and only
>> >> > > RandomScanWithRange100Test was faster. Others were similar. For
>> >> > > RandomScanWithRange100Test I suspect I don't have the right number
>> >> > > for 0.94, so 0.94.11 tests are running right now on the same server
>> >> > > with the same configuration. I will start to have numbers by the end
>> >> > > of the day, or else tomorrow morning, but it will most probably take
>> >> > > about 24h to get all of them.
>> >> > >
>> >> > > JM
>> >> > >
>> >> > > 2013/9/2 Stack <stack@duboce.net>:
>> >> > >> On Mon, Sep 2, 2013 at 9:41 AM, Jean-Marc Spaggiari <
>> >> > jean-marc@spaggiari.org
>> >> > >>> wrote:
>> >> > >>
>> >> > >>> Got it.
>> >> > >>>
>> >> > >>> I can't run the integration tests for now because I'm lacking some
>> >> > >>> servers :( I need to complete some HBase-on-RAID tests before I
>> >> > >>> can get those new servers.
>> >> > >>>
>> >> > >>> First thing is, start-hbase.cmd has the execute flag set. I don't
>> >> > >>> think it's required, and it would help with tab completion if we
>> >> > >>> could un-set it.
>> >> > >>>
>> >> > >>> For 0.96.0RC0 here are my results:
>> >> > >>> First, I get 2 .out files each time I start the server, instead of
>> >> > >>> the usual one, both with the same timestamp.
>> >> > >>>
>> >> > >>> -rw-r--r-- 1 jmspaggiari jmspaggiari     0 Aug 31 15:38
>> >> > >>> hbase-jmspaggiari-master-t430s.out
>> >> > >>> -rw-r--r-- 1 jmspaggiari jmspaggiari     0 Aug 31 15:38
>> >> > >>> hbase-jmspaggiari-master-t430s.out.1
>> >> > >>>
>> >> > >>>
>> >> > >>> In the UI, we say "The .META. table holds references to all User
>> >> > >>> Table regions", but the table name is now "hbase:meta", not ".META.".
>> >> > >>>
>> >> > >>> In the logs, I found this exception that I did not have before:
>> >> > >>> 2013-08-31 18:45:05,490 WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181]
>> >> > >>> server.NIOServerCnxn: caught end of stream exception
>> >> > >>> EndOfStreamException: Unable to read additional data from client
>> >> > >>> sessionid 0x140d68bb9d50004, likely client has closed socket
>> >> > >>>     at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
>> >> > >>>     at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>> >> > >>>     at java.lang.Thread.run(Thread.java:662)
>> >> > >>>
>> >> > >>>
>> >> > >>> I ran PE for over 48h. I don't have another 0.96 baseline to
>> >> > >>> compare with, so I compared with 0.94.
>> >> > >>> org.apache.hadoop.hbase.PerformanceEvaluation$RandomSeekScanTest
>> >> > >>> is about 3 times slower with 0.96.
>> >> > >>> org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange100Test
>> >> > >>> seems to be 2 times faster.
>> >> > >>>
>> >> > >>> Writes are also faster, but I changed my hard drive since I ran
>> >> > >>> 0.94, so I'm currently re-running 0.94 and will provide a more
>> >> > >>> accurate comparison soon.
>> >> > >>>
>> >> > >>> Ran LoadTestTool: Failed to write keys: 0
>> >> > >>>
>> >> > >>> I have not been able to run IntegrationTestLoadAndVerify or
>> >> > >>> IntegrationTestBigLinkedList because of the lack of servers.
>> >> > >>>
>> >> > >>> So overall, it seems to be working fine, but I have not been able
>> >> > >>> to test this release as deeply as I usually test the 0.94 releases.
>> >> > >>>
>> >> > >>
>> >> > >> Thank you for taking it for a spin JMS -- you the man.  If you
>> >> > >> don't open issues for the teething stuff, I will.
>> >> > >>
>> >> > >> So 0.96 is slower doing RandomSeekScanTest but faster on your other
>> >> > >> tests?  I can take a look too.
>> >> > >>
>> >> > >> St.Ack
>> >> >
>> >>
>>
>
>

