hbase-dev mailing list archives

From "Jonathan Gray (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-867) If millions of columns in a column family, hbase scanner won't come up
Date Wed, 17 Jun 2009 20:00:07 GMT

    [ https://issues.apache.org/jira/browse/HBASE-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720856#action_12720856
] 

Jonathan Gray commented on HBASE-867:
-------------------------------------

I am running tests for this issue on a 5+1 node cluster.  Each node is 2core/2gb and hosts
two HDFS and two HBase instances (the 0.19 cluster is still up, but it's idle).

Using a newer version of the HBench tool I posted in HBASE-1501, I'm able to run a number
of different tests with high numbers of columns.

My test inserts 10 rows, each with 2M columns.  I do it in 200 rounds; in each round I insert
10k columns into each of the 10 rows.

Qualifiers are incremented binary longs (1 -> 2M), so 8 bytes.  Values are randomized binary
data of fixed length.  By varying the size of the value (have tried between 8 and 32 bytes
per value), I can get different behavior.  
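As an illustration of the layout described above, here is a minimal sketch of how the qualifiers and rounds line up. It is not the HBench code: the class and method names are hypothetical, and it assumes the "binary long" qualifiers are 8-byte big-endian values (the layout `ByteBuffer.putLong` produces by default).

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of HBench: models the test's column layout.
final class WideRowWorkload {
    static final int ROWS = 10;                  // rows in the test table
    static final int ROUNDS = 200;               // insert rounds
    static final int COLUMNS_PER_ROUND = 10_000; // columns added per row per round

    // Encode a qualifier as an 8-byte big-endian long (assumed encoding).
    static byte[] qualifier(long n) {
        return ByteBuffer.allocate(Long.BYTES).putLong(n).array();
    }

    // First qualifier written in a zero-based round (qualifiers start at 1).
    static long firstQualifier(int round) {
        return (long) round * COLUMNS_PER_ROUND + 1;
    }

    // Last qualifier written in a zero-based round.
    static long lastQualifier(int round) {
        return (long) (round + 1) * COLUMNS_PER_ROUND;
    }

    // Total columns per row once all rounds have run.
    static long totalColumnsPerRow() {
        return (long) ROUNDS * COLUMNS_PER_ROUND;
    }

    public static void main(String[] args) {
        System.out.println("columns per row: " + totalColumnsPerRow());
        System.out.println("last round covers qualifiers "
                + firstQualifier(ROUNDS - 1) + " .. " + lastQualifier(ROUNDS - 1));
        System.out.println("qualifier length in bytes: " + qualifier(1L).length);
    }
}
```

With these constants, 200 rounds of 10k columns give the 2M columns per row mentioned above, and every qualifier occupies exactly 8 bytes.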

With not much memory to give the RS, I run into OOME problems when serializing the Result.
I'm going to rerun the tests at higher value sizes and get some clean logs to look at, making
sure I have block caching disabled so it doesn't hog heap.
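For a rough sense of scale, the sketch below computes a lower bound on the bytes a single row contributes to a Result, counting only qualifier and value bytes. It is a back-of-the-envelope estimate, not a measurement: the class name is hypothetical, it assumes a full-row scan returns all 2M columns of a row in one Result, and it ignores row key, family, timestamp and any per-cell framing, all of which add to the real figure.

```java
// Hypothetical estimate, not taken from the test logs.
final class ResultPayloadEstimate {
    static final long COLUMNS_PER_ROW = 2_000_000L; // from the test description
    static final int QUALIFIER_BYTES = 8;           // binary long qualifier

    // Lower bound: qualifier + value bytes only, no key or framing overhead.
    static long minPayloadBytes(int valueBytes) {
        return COLUMNS_PER_ROW * (QUALIFIER_BYTES + valueBytes);
    }

    public static void main(String[] args) {
        // The two ends of the value-size range tried in the tests.
        for (int valueBytes : new int[] {8, 32}) {
            System.out.printf("value=%dB -> at least %d bytes per row%n",
                    valueBytes, minPayloadBytes(valueBytes));
        }
    }
}
```

Under these assumptions one row carries at least 32,000,000 bytes with 8-byte values and at least 80,000,000 bytes with 32-byte values, before any serialization copies are made.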

However, with 8-byte values I'm able to import without a problem (it causes several splits; in
the end we have 5 regions for the 10 rows).  In addition to the import test, I'm also scanning
these 10 rows in two ways: a full scan (everything in the family) and a skip scan (I'm asking
for two specific columns, qualifier=1 and qualifier=1888888, so near the beginning and end of
each row).

{noformat}
Inserted 10 rows each with 2000000 total columns in 344566ms (34456.6ms/row)

Skip Scanner open
Row [row0] Scanned, Contains 2 Columns (10155 ms)
Row [row1] Scanned, Contains 2 Columns (9978 ms)
Row [row2] Scanned, Contains 2 Columns (10675 ms)
Row [row3] Scanned, Contains 2 Columns (9608 ms)
Row [row4] Scanned, Contains 2 Columns (11703 ms)
Row [row5] Scanned, Contains 2 Columns (12103 ms)
Row [row6] Scanned, Contains 2 Columns (6828 ms)
Row [row7] Scanned, Contains 2 Columns (6603 ms)
Row [row8] Scanned, Contains 2 Columns (6331 ms)
Row [row9] Scanned, Contains 2 Columns (6553 ms)
Scanned 10 rows in 90551ms (9055.1ms/row)

Full Scanner open
Row [row0] Scanned, Contains 2000000 Columns (14374 ms)
Row [row1] Scanned, Contains 2000000 Columns (14879 ms)
Row [row2] Scanned, Contains 2000000 Columns (14053 ms)
Row [row3] Scanned, Contains 2000000 Columns (14263 ms)
Row [row4] Scanned, Contains 2000000 Columns (8811 ms)
Row [row5] Scanned, Contains 2000000 Columns (10327 ms)
Row [row6] Scanned, Contains 2000000 Columns (9757 ms)
Row [row7] Scanned, Contains 2000000 Columns (9343 ms)
Row [row8] Scanned, Contains 2000000 Columns (9526 ms)
Row [row9] Scanned, Contains 2000000 Columns (10004 ms)
Scanned 10 rows in 115342ms (11534.2ms/row)
{noformat}

Repeated runs improve performance, and the order in which the two types of scan are run makes
a difference.  Block cache is off, so we're seeing the effect of the Linux file cache.

> If millions of columns in a column family, hbase scanner won't come up
> ----------------------------------------------------------------------
>
>                 Key: HBASE-867
>                 URL: https://issues.apache.org/jira/browse/HBASE-867
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.20.0
>
>
> Our Daniel has uploaded a table that has a column family with millions of columns in
> it.  He can get items from the table promptly specifying row and column.  Scanning is another
> matter.  Thread dumping I see we're stuck in the scanner constructor nexting through cells.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

