hbase-dev mailing list archives

From Elliott Clark <ecl...@apache.org>
Subject Re: [VOTE] The 1st hbase-0.96.0 release candidate is available for download
Date Wed, 04 Sep 2013 03:31:13 GMT
Yeah, I got that for a while as well.  Though once online schema change
was disabled I could complete the reduce step with 2.5G of heap (still
failing :-/).

On Tue, Sep 3, 2013 at 6:30 PM, Devaraj Das <ddas@hortonworks.com> wrote:
> Well, from the test code it seems the problem is that the reducer got
> unexpected data and was trying to construct the log message for the
> user. So the job had already failed in reality.
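>
> For reference, the failure pattern is roughly this (a sketch on my side,
> not the exact test source; the method and variable names are illustrative,
> and it assumes org.apache.hadoop.hbase.util.Bytes and
> org.apache.hadoop.io.BytesWritable):
>
>   void reportBrokenKey(Iterable<BytesWritable> values) {
>     StringBuilder refsSb = new StringBuilder();
>     for (BytesWritable ref : values) {      // every ref found for the key
>       if (refsSb.length() > 0) refsSb.append(",");
>       // Unbounded append: with enough unexpected refs, the diagnostic
>       // string itself blows the reducer heap before the real failure
>       // is ever reported.
>       refsSb.append(Bytes.toStringBinary(ref.getBytes()));
>     }
>     LOG.error("Unexpected references for key: " + refsSb);
>   }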
>
>
> On Tue, Sep 3, 2013 at 6:17 PM, Devaraj Das <ddas@hortonworks.com> wrote:
>
>> Elliott, what are the heap sizes of the M/R tasks in your setup? I was
>> running the job like this (without chaosmonkey, to start with):
>>
>> hbase org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList Loop 5 12 2500000 IntegrationTestBigLinkedList 10
>>
>> Even the above test failed, with one reduce task hitting the OOM in the
>> verify step. The heap size was set to 3G.
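>>
>> (For reference, "3G" here is the task JVM -Xmx. Assuming the Hadoop 2
>> property names, since the task runs under YarnChild, that would be set
>> along these lines; the container size is an illustrative value:
>>
>>   mapreduce.reduce.java.opts=-Xmx3g
>>   mapreduce.reduce.memory.mb=4096
>> )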
>>
>> 2013-09-04 01:11:56,054 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
>>       at java.util.Arrays.copyOf(Arrays.java:2882)
>>       at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
>>       at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
>>       at java.lang.StringBuilder.append(StringBuilder.java:119)
>>       at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Verify$VerifyReducer.reduce(IntegrationTestBigLinkedList.java:576)
>>       at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Verify$VerifyReducer.reduce(IntegrationTestBigLinkedList.java:547)
>>       at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
>>       at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:645)
>>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:405)
>>       at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
>>       at java.security.AccessController.doPrivileged(Native Method)
>>       at javax.security.auth.Subject.doAs(Subject.java:396)
>>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
>>       at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
>>
>>
>>
>>
>>
>> On Tue, Sep 3, 2013 at 12:56 PM, Elliott Clark <eclark@apache.org> wrote:
>>
>>> Can someone take a look at running Test Big Linked for > 5 iterations
>>> with the slowDeterministic chaos monkey on a distributed cluster?  I'm
>>> pretty concerned about HBASE-9338.
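>>>
>>> (Something along these lines; the -m monkey option is an assumption on
>>> my part, so check it exists in this build:
>>>
>>> hbase org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList \
>>>     -m slowDeterministic Loop 5 12 2500000 IntegrationTestBigLinkedList 10
>>> )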
>>>
>>> On Tue, Sep 3, 2013 at 6:57 AM, Jean-Marc Spaggiari
>>> <jean-marc@spaggiari.org> wrote:
>>> > There was a typo in my log4j.properties :(
>>> >
>>> > So it's working fine.
>>> >
>>> > The only INFO logs I still see are these ones:
>>> > 2013-09-03 09:53:07,313 INFO  [M:0;t430s:45176] mortbay.log: Logging to
>>> > org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
>>> > org.mortbay.log.Slf4jLog
>>> > 2013-09-03 09:53:07,350 INFO  [M:0;t430s:45176] mortbay.log: jetty-6.1.26
>>> > But there are only very few of them.
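>>> >
>>> > (If we wanted those gone too, I would guess a line like
>>> >     log4j.logger.org.mortbay=WARN
>>> > in log4j.properties would do it, since jetty logs through the
>>> > org.mortbay.log logger via slf4j. Untested on my side.)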
>>> >
>>> > Performance-wise, here are the numbers (the higher, the better; rows per
>>> > second, except for scans where it's rows/min). As you will see, 0.96 is
>>> > slower only for RandomSeekScanTest (way slower) and RandomScanWithRange10
>>> > but is faster for everything else. I ran the tests with the default
>>> > settings. So I think we should look at RandomSeekScanTest, but except for
>>> > this one, everything else is pretty good.
>>> >
>>> > Also, I have been able to reproduce this exception:
>>> > 2013-09-03 09:55:36,718 WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181]
>>> > server.NIOServerCnxn: caught end of stream exception
>>> > EndOfStreamException: Unable to read additional data from client
>>> > sessionid 0x140e4191edb0009, likely client has closed socket
>>> >     at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
>>> >     at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>>> >     at java.lang.Thread.run(Thread.java:662)
>>> >
>>> > Just had to run PE and kill it in the middle.
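>>> >
>>> > (Repro sketch, assuming the usual PE invocation: start something like
>>> >     hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
>>> > and Ctrl-C it mid-run; ZooKeeper then logs the exception above when the
>>> > client socket goes away.)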
>>> >
>>> > JM
>>> >
>>> >
>>> >
>>> > Test (org.apache.hadoop.hbase.PerformanceEvaluation$...)  0.96.0RC0   0.94.11  0.96 vs 0.94
>>> > FilteredScanTest                                              10.28     10.17       101.12%
>>> > RandomReadTest                                               966.01    810.58       119.18%
>>> > RandomSeekScanTest                                            98.50    255.71        38.52%
>>> > RandomWriteTest                                            39251.17  25682.11       152.83%
>>> > RandomScanWithRange10Test                                  25844.88  28715.29        90.00%
>>> > RandomScanWithRange100Test                                 20029.48  18022.39       111.14%
>>> > RandomScanWithRange1000Test                                 2692.16   2346.85       114.71%
>>> > SequentialReadTest                                          3002.18   2875.83       104.39%
>>> > SequentialWriteTest                                        38995.50  26693.23       146.09%
>>> >
>>> >
>>> > 2013/9/3 Stack <stack@duboce.net>
>>> >
>>> >> On Mon, Sep 2, 2013 at 10:51 AM, Jean-Marc Spaggiari <
>>> >> jean-marc@spaggiari.org> wrote:
>>> >>
>>> >> > I have created:
>>> >> >  HBASE-9412
>>> >> >  HBASE-9413
>>> >> >  HBASE-9414
>>> >> >
>>> >> > I have not yet been able to reproduce the ZK error. I'm trying.
>>> >> >
>>> >> >
>>> >> Is it when you have a shell connection and then kill it?
>>> >>
>>> >>
>>> >>
>>> >> > Last, I tried, with no success, to set the log level to WARN to remove
>>> >> > all DEBUG and INFO logs. Setting it to WARN removes the DEBUG lines,
>>> >> > but I keep getting the INFO ones. It seems something is setting the
>>> >> > log level somewhere else, or it's not being read.
>>> >> >
>>> >> > Here is my log4j.properties file. I removed all the custom log levels
>>> >> > to set up WARN for org.apache. And it's still showing INFO...
>>> >> >
>>> >> >
>>> >>
>>> >> You did it by editing log4j and restarting, or in the UI?  I think
>>> >> the UI log level setting is broken... (new issue!)
>>> >>
>>> >> Thanks for trying it out JMS,
>>> >>
>>> >> So everything is slower in 0.96?
>>> >> St.Ack
>>> >>
>>> >>
>>> >>
>>> >> > JM
>>> >> >
>>> >> >
>>> >> > # Define some default values that can be overridden by system properties
>>> >> > hbase.root.logger=WARN,console
>>> >> > hbase.security.logger=WARN,console
>>> >> > hbase.log.dir=.
>>> >> > hbase.log.file=hbase.log
>>> >> >
>>> >> > # Define the root logger to the system property "hbase.root.logger".
>>> >> > log4j.rootLogger=${hbase.root.logger}
>>> >> >
>>> >> > # Logging Threshold
>>> >> > log4j.threshold=ALL
>>> >> >
>>> >> > #
>>> >> > # Daily Rolling File Appender
>>> >> > #
>>> >> > log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
>>> >> > log4j.appender.DRFA.File=${hbase.log.dir}/${hbase.log.file}
>>> >> >
>>> >> > # Rollover at midnight
>>> >> > log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
>>> >> >
>>> >> > # 30-day backup
>>> >> > #log4j.appender.DRFA.MaxBackupIndex=30
>>> >> > log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
>>> >> >
>>> >> > # Pattern format: Date LogLevel LoggerName LogMessage
>>> >> > log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c{2}: %m%n
>>> >> >
>>> >> > # Rolling File Appender properties
>>> >> > hbase.log.maxfilesize=256MB
>>> >> > hbase.log.maxbackupindex=20
>>> >> >
>>> >> > # Rolling File Appender
>>> >> > log4j.appender.RFA=org.apache.log4j.RollingFileAppender
>>> >> > log4j.appender.RFA.File=${hbase.log.dir}/${hbase.log.file}
>>> >> >
>>> >> > log4j.appender.RFA.MaxFileSize=${hbase.log.maxfilesize}
>>> >> > log4j.appender.RFA.MaxBackupIndex=${hbase.log.maxbackupindex}
>>> >> >
>>> >> > log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
>>> >> > log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c{2}: %m%n
>>> >> >
>>> >> > #
>>> >> > # Security audit appender
>>> >> > #
>>> >> > hbase.security.log.file=SecurityAuth.audit
>>> >> > hbase.security.log.maxfilesize=256MB
>>> >> > hbase.security.log.maxbackupindex=20
>>> >> > log4j.appender.RFAS=org.apache.log4j.RollingFileAppender
>>> >> > log4j.appender.RFAS.File=${hbase.log.dir}/${hbase.security.log.file}
>>> >> > log4j.appender.RFAS.MaxFileSize=${hbase.security.log.maxfilesize}
>>> >> > log4j.appender.RFAS.MaxBackupIndex=${hbase.security.log.maxbackupindex}
>>> >> > log4j.appender.RFAS.layout=org.apache.log4j.PatternLayout
>>> >> > log4j.appender.RFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
>>> >> > log4j.category.SecurityLogger=${hbase.security.logger}
>>> >> > log4j.additivity.SecurityLogger=false
>>> >> >
>>> >> >
>>> >> > #log4j.logger.SecurityLogger.org.apache.hadoop.hbase.security.access.AccessController=TRACE
>>> >> >
>>> >> > #
>>> >> > # Null Appender
>>> >> > #
>>> >> > log4j.appender.NullAppender=org.apache.log4j.varia.NullAppender
>>> >> >
>>> >> > #
>>> >> > # console
>>> >> > # Add "console" to rootlogger above if you want to use this
>>> >> > #
>>> >> > log4j.appender.console=org.apache.log4j.ConsoleAppender
>>> >> > log4j.appender.console.target=System.err
>>> >> > log4j.appender.console.layout=org.apache.log4j.PatternLayout
>>> >> > log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c{2}: %m%n
>>> >> >
>>> >> > # Custom Logging levels
>>> >> >
>>> >> > log4j.logger.ore.apache=WARN
>>> >> >
>>> >> > 2013/9/2 Jean-Marc Spaggiari <jean-marc@spaggiari.org>:
>>> >> > > Hi St.Ack,
>>> >> > >
>>> >> > > I will open the relate JIRAs in few minutes.
>>> >> > >
>>> >> > > Regarding performance, RandomSeekScanTest is way slower, and only
>>> >> > > RandomScanWithRange100Test was faster. Others were similar. For
>>> >> > > RandomScanWithRange100Test I suspect that I don't have the right
>>> >> > > number for 0.94, so 0.94.11 tests are running right now on the same
>>> >> > > server with the same configuration. I will start to have numbers by
>>> >> > > end of day, else tomorrow morning, but it will most probably take
>>> >> > > about 24h to get all of them.
>>> >> > >
>>> >> > > JM
>>> >> > >
>>> >> > > 2013/9/2 Stack <stack@duboce.net>:
>>> >> > >> On Mon, Sep 2, 2013 at 9:41 AM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:
>>> >> > >>
>>> >> > >>> Got it.
>>> >> > >>>
>>> >> > >>> I can't run the integration tests for now because I'm lacking some
>>> >> > >>> servers :( Need to complete some HBase on RAID tests before I can
>>> >> > >>> get those new servers.
>>> >> > >>>
>>> >> > >>> First thing is, start-hbase.cmd has the execute flag set. I don't
>>> >> > >>> think it's required, and it will help with the tab-completion
>>> >> > >>> feature if we can un-set it.
>>> >> > >>>
>>> >> > >>> For 0.96.0RC0 here are my results:
>>> >> > >>> First, I get 2 .out files each time I start the server, instead of
>>> >> > >>> the usual one... both with the same timestamp.
>>> >> > >>>
>>> >> > >>> -rw-r--r-- 1 jmspaggiari jmspaggiari     0 Aug 31 15:38 hbase-jmspaggiari-master-t430s.out
>>> >> > >>> -rw-r--r-- 1 jmspaggiari jmspaggiari     0 Aug 31 15:38 hbase-jmspaggiari-master-t430s.out.1
>>> >> > >>>
>>> >> > >>>
>>> >> > >>> In the UI, we say "The .META. table holds references to all User
>>> >> > >>> Table regions" but the table name is "hbase:meta" and not ".META."
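>>> >> > >>>
>>> >> > >>> (In the 0.96 shell the table is addressed by the new name, e.g.
>>> >> > >>> scan 'hbase:meta', so presumably it is just the UI string that
>>> >> > >>> needs updating.)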
>>> >> > >>>
>>> >> > >>> In the logs, I found this exception that I did not have before:
>>> >> > >>> 2013-08-31 18:45:05,490 WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181]
>>> >> > >>> server.NIOServerCnxn: caught end of stream exception
>>> >> > >>> EndOfStreamException: Unable to read additional data from client
>>> >> > >>> sessionid 0x140d68bb9d50004, likely client has closed socket
>>> >> > >>>     at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
>>> >> > >>>     at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>>> >> > >>>     at java.lang.Thread.run(Thread.java:662)
>>> >> > >>>
>>> >> > >>>
>>> >> > >>> I ran PE over 48h. I don't have another 0.96 baseline to compare
>>> >> > >>> with, so I compared with 0.94.
>>> >> > >>> org.apache.hadoop.hbase.PerformanceEvaluation$RandomSeekScanTest is
>>> >> > >>> about 3 times slower with 0.96.
>>> >> > >>>
>>> >> > >>> org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange100Test
>>> >> > >>> seems to be 2 times faster.
>>> >> > >>>
>>> >> > >>> Writes are also faster, but I changed my hard drive since I ran
>>> >> > >>> 0.94. So I'm currently re-running 0.94 and will provide another,
>>> >> > >>> more accurate comparison soon.
>>> >> > >>>
>>> >> > >>> Ran LoadTestTool: Failed to write keys: 0
>>> >> > >>>
>>> >> > >>> I have not been able to run IntegrationTestLoadAndVerify nor
>>> >> > >>> IntegrationTestBigLinkedList because of the lack of servers.
>>> >> > >>>
>>> >> > >>> So overall, it seems to be working fine, but I have not been able
>>> >> > >>> to test this release as deeply as I usually test the 0.94 releases.
>>> >> > >>>
>>> >> > >>
>>> >> > >> Thank you for taking it for a spin JMS -- you the man.  If you
>>> >> > >> don't open issues for the teething stuff, I will.
>>> >> > >>
>>> >> > >> So 0.96 is slower doing RandomSeekScanTest but faster on your
>>> >> > >> other tests?  I can take a look too.
>>> >> > >>
>>> >> > >> St.Ack
>>> >> >
>>> >>
>>>
>>
>>
>
