hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-867) If millions of columns in a column family, hbase scanner won't come up
Date Thu, 04 Sep 2008 16:29:45 GMT

    [ https://issues.apache.org/jira/browse/HBASE-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628374#action_12628374

stack commented on HBASE-867:

To be clear, if a row has thousands of columns or more -- i.e. a canonical usage -- hbase does not work.
 Here is some of the problem code from StoreFileScanner#next:

          while ((keys[i] != null)
              && (Bytes.compareTo(keys[i].getRow(), viableRow.getRow()) == 0)) {

            // If we are doing a wild card match or there are multiple matchers
            // per column, we need to scan all the older versions of this row
            // to pick up the rest of the family members
                && !isMultipleMatchScanner()
                && (keys[i].getTimestamp() != viableRow.getTimestamp())) {

            if (columnMatch(i)) {              
              // We only want the first result for any specific family member
              if(!results.containsKey(keys[i].getColumn())) {
                    new Cell(vals[i], keys[i].getTimestamp()));
                insertedItem = true;
            } else {
              // Content is sorted.  If column no longer matches, break.

            if (!getNext(i)) {

          // Advance the current scanner beyond the chosen row, to
          // a valid timestamp, so we're ready next time.
          while ((keys[i] != null) &&
              ((Bytes.compareTo(keys[i].getRow(), viableRow.getRow()) <= 0)
                  || (keys[i].getTimestamp() > this.timestamp)
                  || (! columnMatch(i)))) {

The while loops find the next row by getting cells until the row no longer matches.  If there
are many columns per row, that can take forever (as it's doing in Daniel's case).  We need a
file format with an index that says where the next row is.  An option would then say whether
to get to the next row by nexting or by asking the index instead.
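
A minimal sketch of the index idea, for illustration only. It is not HBase code and does not describe any existing file format: the class RowIndexSketch, its fields, and both method names are hypothetical. It models a store file as a sorted array with one row key per cell, and keeps a second array holding the offset of the first cell of each row. Getting to the next row by nexting touches every cell of the current row, while asking the index is a binary search over row starts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of "nexting" versus asking a row index. */
class RowIndexSketch {
  /** Row key of each cell, in sorted order (stand-in for a store file). */
  final String[] cellRows;
  /** Offset of the first cell of every row (the assumed row index). */
  final int[] rowStarts;

  RowIndexSketch(String[] cellRows) {
    this.cellRows = cellRows;
    // Build the index in one pass: record each offset where the row changes.
    List<Integer> starts = new ArrayList<>();
    for (int i = 0; i < cellRows.length; i++) {
      if (i == 0 || !cellRows[i].equals(cellRows[i - 1])) {
        starts.add(i);
      }
    }
    rowStarts = new int[starts.size()];
    for (int i = 0; i < rowStarts.length; i++) {
      rowStarts[i] = starts.get(i);
    }
  }

  /** Nexting: step cell by cell until the row key changes. */
  int nextRowByNexting(int pos) {
    String row = cellRows[pos];
    int i = pos;
    while (i < cellRows.length && cellRows[i].equals(row)) {
      i++;
    }
    return i; // cellRows.length means there is no next row
  }

  /** Index: binary search for the first row start greater than pos. */
  int nextRowByIndex(int pos) {
    int idx = Arrays.binarySearch(rowStarts, pos);
    // Found: the next row start is the following entry.
    // Not found: binarySearch returns -(insertionPoint) - 1.
    int next = idx >= 0 ? idx + 1 : -idx - 1;
    return next < rowStarts.length ? rowStarts[next] : cellRows.length;
  }
}
```

Both methods return the same offset. The difference is cost: nexting is proportional to the number of cells left in the current row, while the index lookup is logarithmic in the number of rows. The sketch ignores timestamps and column matching, which the real scanner also has to handle.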

> If millions of columns in a column family, hbase scanner won't come up
> ----------------------------------------------------------------------
>                 Key: HBASE-867
>                 URL: https://issues.apache.org/jira/browse/HBASE-867
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
> Our Daniel has uploaded a table that has a column family with millions of columns in
> it.  He can get items from the table promptly specifying row and column.  Scanning is another
> matter.  Thread dumping I see we're stuck in the scanner constructor nexting through cells.

