hbase-dev mailing list archives

From lars hofhansl <la...@apache.org>
Subject Re: [VOTE] The 1st hbase 0.94.15 release candidate is available for download
Date Thu, 19 Dec 2013 20:39:52 GMT
Thanks again, JM.


Yep, both IntegrationTestLoadAndVerify and IntegrationTestBigLinkedList pass for me in a local
install every time I run them (many times by now). JDK 1.6.0_34-b04.

One thing I found is that they do not clean up their data, so they fill up the disk. Once the disk
is full the tests simply time out for me, but they could fail in more "interesting ways" too
when that happens... Maybe that's what you see?
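
A minimal sketch of a pre-flight check for the disk-full situation described above. It is not part of the HBase test suite; the class name `DiskSpaceGuard`, the default directory, and the 5 GiB threshold are all assumptions chosen for illustration.

```java
import java.io.File;

public class DiskSpaceGuard {
    /** Returns true when the volume holding dir has at least minFreeBytes usable. */
    static boolean hasRoom(File dir, long minFreeBytes) {
        // File.getUsableSpace() returns 0 if the path does not name a partition.
        return dir.getUsableSpace() >= minFreeBytes;
    }

    public static void main(String[] args) {
        // Assumption: the data directory is passed as the first argument;
        // the JVM temp directory is only a placeholder default.
        File dataDir = new File(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        long minFree = 5L * 1024 * 1024 * 1024; // 5 GiB, an arbitrary threshold
        if (!hasRoom(dataDir, minFree)) {
            System.err.println("Low disk space under " + dataDir + "; clean up old test data first");
            System.exit(1);
        }
        System.out.println("Enough free space under " + dataDir);
    }
}
```

Running something like this before each iteration of a test loop would turn a silent timeout into an explicit message.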


In any case nothing new with this release, right? Need to double-check the tests.

-- Lars



----- Original Message -----
From: Jean-Marc Spaggiari <jean-marc@spaggiari.org>
To: lars hofhansl <larsh@apache.org>; dev <dev@hbase.apache.org>
Cc: 
Sent: Thursday, December 19, 2013 10:45 AM
Subject: Re: [VOTE] The 1st hbase 0.94.15 release candidate is available for download

For the versions, the issue is the alter command I used. Sorry about that.
Forget it.

For IntegrationTestLoadAndVerify I have already reported the issue with
0.94.10 on July 23rd.

Just retried with 0.94.14 and 0.94.13 and it failed on both too. By failed I
mean they give me REFERENCES_CHECKED=9855773 instead of 10000000. Are
you getting 10000000?
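
To make the size of the gap concrete, here is a small sketch that parses a counter line and computes the shortfall. The class and method names are hypothetical, and the expected value of 10000000 is taken from the original test report quoted further down; adjust it to whatever the run was configured to write.

```java
public class CounterCheck {
    /** Extracts the numeric value from a line such as "REFERENCES_CHECKED=9855773". */
    static long parseCounter(String line, String name) {
        String prefix = name + "=";
        int i = line.indexOf(prefix);
        if (i < 0) {
            throw new IllegalArgumentException("counter " + name + " not found");
        }
        int start = i + prefix.length();
        int end = start;
        // Consume the run of digits that follows "NAME=".
        while (end < line.length() && Character.isDigit(line.charAt(end))) {
            end++;
        }
        return Long.parseLong(line.substring(start, end));
    }

    /** Number of references that were expected but not counted. */
    static long shortfall(long expected, long actual) {
        return expected - actual;
    }

    public static void main(String[] args) {
        long actual = parseCounter("REFERENCES_CHECKED=9855773", "REFERENCES_CHECKED");
        long expected = 10000000L; // assumption: expected count from the quoted report
        System.out.println("missing references: " + shortfall(expected, actual)); // prints 144227
    }
}
```

With these numbers the run is short by 144227 references, roughly 1.4% of the expected total.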

The single node is a local install on my laptop. No other HBase instances
configured, using the local file system. The 7 node cluster is using
Hadoop 1.0.4.

In local mode I'm running with jdk 1.6.0_45. On the 7 nodes I'm running
1.7.0_5

What's strange with the AbstractMethodError issue is that IntegrationTestsDriver is
not the only one using ToolRunner, but it is the only one to fail. Strange.
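
One way to investigate a suspected classpath mix-up is to print where the JVM actually loaded each class from. The sketch below uses only standard JDK calls; the class name `WhichJar` is hypothetical, and it assumes it is run with the same classpath that `bin/hbase` uses.

```java
import java.security.CodeSource;

public class WhichJar {
    /** Returns the jar or directory a class was loaded from, as a string. */
    static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        // Classes from the bootstrap loader (e.g. java.lang.String) have no code source.
        return src == null ? "(bootstrap or unknown)" : String.valueOf(src.getLocation());
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass fully qualified class names as arguments, for example
        // org.apache.hadoop.hbase.util.AbstractHBaseTool and
        // org.apache.hadoop.hbase.IntegrationTestsDriver.
        for (String name : args) {
            System.out.println(name + " -> " + locationOf(Class.forName(name)));
        }
    }
}
```

If the base class and the driver resolve to jars from different HBase versions, that would be consistent with an AbstractMethodError, which typically appears when a class was compiled against a different version of its superclass than the one found at runtime.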

JM



2013/12/19 lars hofhansl <larsh@apache.org>

> The single node cluster was just a local install, right? I.e. using the
> local file system, rather than HDFS...?
> On the 7 node cluster, which version of HDFS did you use? If not 1.0.4 I
> assume you recompiled HBase :)
>
> I definitely do not see the AbstractMethodError issue. That very much looks
> like a classpath setup issue.
>
> Ran IntegrationTestLoadAndVerify and IntegrationTestBigLinkedList in a
> loop in local mode. Didn't fail once.
>
> Let's chat offline and figure out if/where your setup is different from
> mine.
>
> -- Lars
>
> ________________________________
> From: lars hofhansl <larsh@apache.org>
> To: Jean-Marc Spaggiari <jean-marc@spaggiari.org>; dev <
> dev@hbase.apache.org>
> Sent: Thursday, December 19, 2013 8:53 AM
> Subject: Re: [VOTE] The 1st hbase 0.94.15 release candidate is available
> for download
>
>
> Thanks JM.
>
>
> You did a "raw" scan below. It'll return to you exactly what is there, so
> you'll see the 3 versions before you compact, that is by design.
> java.lang.AbstractMethodError looks like an issue local to your install.
> I'll check.
>
>
> IntegrationTestLoadAndVerify is interesting. Did that pass reliably in
> older releases of 0.94 (0.94.14 or 0.94.13)?
>
> -- Lars
>
>
> ________________________________
>
> From: Jean-Marc Spaggiari <jean-marc@spaggiari.org>
> To: dev <dev@hbase.apache.org>; lars hofhansl <larsh@apache.org>
> Sent: Thursday, December 19, 2013 7:01 AM
> Subject: Re: [VOTE] The 1st hbase 0.94.15 release candidate is available
> for download
>
>
>
> tl;dr: see arrow below.
>
>
>
> Downloaded and checked signatures for both vanilla and secured. Passed.
> Randomly checked documentation and CHANGES.txt. Passed.
>
>
> On a single node cluster:
> Ran the tests. All passed.
> Ran IntegrationTestLoadAndVerify. Got  REFERENCES_CHECKED=9855424,
> expected 10000000? Failed?
> Ran IntegrationTestBigLinkedList. Passed.
> Ran HBCK after those tests and got many errors about _original-evil-name
> and clone tables.
> Cleared everything, restarted HBase. Re-ran IntegrationTestBigLinkedList,
> HBCK ok. Re-ran IntegrationTestLoadAndVerify, failed again:
> 13/12/18 21:24:24 ERROR test.IntegrationTestBigLinkedList$Verify: Expected referenced count does not match with actual referenced count. expected referenced=3000000 ,actual=9000000
> Exception in thread "main" java.lang.RuntimeException: Verify.verify failed
>     at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Loop.runVerify(IntegrationTestBigLinkedList.java:724)
>     at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Loop.run(IntegrationTestBigLinkedList.java:757)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList.run(IntegrationTestBigLinkedList.java:1069)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList.main(IntegrationTestBigLinkedList.java:1073)
>
> But now HBCK is clean. Figured that the HBCK issue is because of some leftovers
> from org.apache.hadoop.hbase.regionserver.TestStoreFile, which writes to
> the same directory as the default standalone HBase.
>
> From the shell, created a table with 15 regions, put, compact, scan, etc. Table
> definition is VERSIONS => 2. However, scan 't1', {RAW => true, VERSIONS =>
> 10} still returns 3 versions even after flush/compact/major_compact:
> hbase(main):034:0> scan 't1', {RAW => true, VERSIONS => 10}
> ROW                   COLUMN+CELL
>  rowkey               column=f1:c1, timestamp=1387421969489, value=value
>  rowkey               column=f1:c1, timestamp=1387421969337, value=value
>  rowkey               column=f1:c1, timestamp=1387421969162, value=value
> 1 row(s) in 0.0570 seconds
>
> I would have expected only 2 to be returned.
>
>
>
> Stopped HBase, checked the log, everything is fine.
>
>
> Now on a 7-node cluster:
>
> Deployed jars and did rolling restart on a 0.94.14 cluster. Passed.
>
> Configured default balancer, merged a 60 region table to a single region,
> restarted the cluster, all fine.
>
> Ran major_compact on the table to get it split into 60 regions, ran the balancer; all
> fine except that the balancer needs to be run twice to get correct balancing.
>
> Some "No serialized HRegionInfo in keyvalues" in the logs not related to
> the tables I'm "playing" with.
>
> Restored customized balancer, restarted, rebalanced, all fine.
> Ran IntegrationTestLoadAndVerify. Got  REFERENCES_CHECKED=9855645,
> expected 10000000? Failed?
>
> Ran IntegrationTestBigLinkedList. Passed.
>
>
> Last, I tried to run IntegrationTestsDriver but it failed. I need to look
> at that.
>
> hbase@node3:~/hbase-0.94.3$ bin/hbase
> org.apache.hadoop.hbase.IntegrationTestsDriver
> Exception in thread "main" java.lang.AbstractMethodError: org.apache.hadoop.hbase.util.AbstractHBaseTool.doWork()V
>     at org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:103)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>     at org.apache.hadoop.hbase.IntegrationTestsDriver.main(IntegrationTestsDriver.java:47)
>
>
>
>
> =====> tl;dr:
>
> - Small issue with the balancer when 60 regions are assigned to a single server.
> It needs to be run twice to get them correctly balanced;
>
> - Leftover in the wrong place from
> org.apache.hadoop.hbase.regionserver.TestStoreFile;
> - Table with VERSIONS => 2 returns 3 versions instead of 2;
> - IntegrationTestsDriver not running.
>
>
> I don't think there is anything here to stop the release but there are
> still a few things that need to be looked at.
>
>
> JM
>
>
>
>
> 2013/12/18 lars hofhansl <larsh@apache.org>
>
> The 1st 0.94.15 RC is available for download at
> http://people.apache.org/~larsh/hbase-0.94.15-rc0/
> >Signed with my code signing key: C7CFE328
> >
> >HBase 0.94.15 is a bug fix release along with some performance
> improvements:
> >    [HBASE-7886] - [replication] hlog zk node will not be deleted if
> client roll hlog
> >    [HBASE-9485] - TableOutputCommitter should implement recovery if we
> don't want jobs to start from 0 on RM restart
> >    [HBASE-9995] - Not stopping ReplicationSink when using custom
> implementation for the ReplicationSink
> >    [HBASE-10014] - HRegion#doMiniBatchMutation rollbacks the memstore
> even if there is nothing to rollback.
> >    [HBASE-10015] - Replace intrinsic locking with explicit locks in
> StoreScanner
> >    [HBASE-10026] - HBaseAdmin#createTable could fail if region splits
> too fast
> >    [HBASE-10046] - Unmonitored HBase service could accumulate Status
> objects and OOM
> >    [HBASE-10057] - TestRestoreFlushSnapshotFromClient and
> TestRestoreSnapshotFromClient fail to finish occasionally
> >    [HBASE-10061] - TableMapReduceUtil.findOrCreateJar calls
> updateMap(null, ) resulting in thrown NPE
> >    [HBASE-10064] - AggregateClient.validateParameters can throw NPE
> >    [HBASE-10089] - Metrics intern table names cause eventual permgen OOM
> in 0.94
> >    [HBASE-10111] - Verify that a snapshot is not corrupted before
> restoring it
> >    [HBASE-10112] - Hbase rest query params for maxVersions and maxValues
> are not parsed
> >    [HBASE-10117] - Avoid synchronization in
> HRegionScannerImpl.isFilterDone
> >    [HBASE-10120] - start-hbase.sh doesn't respect --config in
> non-distributed mode
> >    [HBASE-10179] - HRegionServer underreports readRequestCounts by 1
> under certain conditions
> >    [HBASE-10181] - HBaseObjectWritable.readObject catches
> DoNotRetryIOException and wraps it back in a regular IOException
> >    [HBASE-9931] - Optional setBatch for CopyTable to copy large rows in
> batches
> >    [HBASE-10001] - Add a coprocessor to help testing the performances
> without taking into account the i/o
> >    [HBASE-10007] - PerformanceEvaluation: Add sampling and latency
> collection to randomRead test
> >    [HBASE-10010] - eliminate the put latency spike on the new log file
> beginning
> >    [HBASE-10048] - Add hlog number metric in regionserver
> >    [HBASE-10049] - Small improvments in region_mover.rb
> >    [HBASE-10093] - Unregister ReplicationSource metric bean when the
> replication source thread is terminated
> >    [HBASE-9047] - Tool to handle finishing replication when the cluster
> is offline
> >    [HBASE-10119] - Allow HBase coprocessors to clean up when they fail
> >    [HBASE-9927] - ReplicationLogCleaner#stop() calls
> HConnectionManager#deleteConnection() unnecessarily
> >    [HBASE-9986] - Incorporate HTTPS support for HBase (0.94 port)
> >    [HBASE-10058] - Test for HBASE-9915 (avoid reading index blocks)
> >    [HBASE-10189] - Intermittent TestReplicationSyncUpTool failure
> >
> >The list of changes is also available here:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310753&version=12325559
> >
> >Here're the jenkins runs for this RC:
> https://builds.apache.org/job/HBase-0.94.15/2/ and
> https://builds.apache.org/job/HBase-0.94.15-security/1/
> >
> >Please try out the RC, check out the doc, take it for a spin, etc, and
> vote +1/-1 by EOD December 27th on whether we should release this as
> 0.94.15. (9 days because of the holidays)
> >
> >Thanks.
> >
> >-- Lars
> >
>

