From: "stack (JIRA)"
To: hbase-dev@hadoop.apache.org
Date: Thu, 4 Sep 2008 09:29:45 -0700 (PDT)
Subject: [jira] Commented: (HBASE-867) If millions of columns in a column family, hbase scanner won't come up
Message-ID: <1346450245.1220545785931.JavaMail.jira@brutus>
In-Reply-To: <1921396865.1220486866078.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/HBASE-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628374#action_12628374 ]

stack commented on HBASE-867:
-----------------------------

To be clear, if a row has thousands of columns or more (i.e. a canonical usage), hbase does not work.  Here is some of the problem code from StoreFileScanner#next:

{code}
...
while ((keys[i] != null)
    && (Bytes.compareTo(keys[i].getRow(), viableRow.getRow()) == 0)) {
  // If we are doing a wild card match or there are multiple matchers
  // per column, we need to scan all the older versions of this row
  // to pick up the rest of the family members
  if (!isWildcardScanner()
      && !isMultipleMatchScanner()
      && (keys[i].getTimestamp() != viableRow.getTimestamp())) {
    break;
  }
  if (columnMatch(i)) {
    // We only want the first result for any specific family member
    if (!results.containsKey(keys[i].getColumn())) {
      results.put(keys[i].getColumn(),
          new Cell(vals[i], keys[i].getTimestamp()));
      insertedItem = true;
    }
  } else {
    // Content is sorted. If column no longer matches, break.
    break;
  }
  if (!getNext(i)) {
    closeSubScanner(i);
  }
}
// Advance the current scanner beyond the chosen row, to
// a valid timestamp, so we're ready next time.
while ((keys[i] != null)
    && ((Bytes.compareTo(keys[i].getRow(), viableRow.getRow()) <= 0)
        || (keys[i].getTimestamp() > this.timestamp)
        || (!columnMatch(i)))) {
  getNext(i);
}
..
{code}

The while loops find the next row by getting cells until the row no longer matches.  With many columns per row, that can take forever (as it is doing in Daniel's case).

We need a file format with an index that says where the next row is.  An option would say whether to get to the next row by nexting or by asking the index instead.

> If millions of columns in a column family, hbase scanner won't come up
> ----------------------------------------------------------------------
>
>                 Key: HBASE-867
>                 URL: https://issues.apache.org/jira/browse/HBASE-867
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>
> Our Daniel has uploaded a table that has a column family with millions of columns in it.  He can get items from the table promptly specifying row and column.
> Scanning is another matter.  Thread dumping I see we're stuck in the scanner constructor nexting through cells.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
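
As an illustration of the index idea raised in the comment, here is a minimal sketch that compares finding the next row by nexting with finding it through a row-start index. It is not HBase code: RowIndexSketch, Key, nextRowByNexting, buildRowStarts and nextRowByIndex are all hypothetical names, an in-memory sorted list stands in for a store file, and rows and columns are plain strings. Timestamps and values are left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not HBase code: two ways to locate the next row in sorted cells. */
final class RowIndexSketch {

    /** A cell key reduced to row and column; timestamps and values are omitted. */
    static final class Key {
        final String row;
        final String column;

        Key(String row, String column) {
            this.row = row;
            this.column = column;
        }
    }

    /** Finds the next row by "nexting": one step per cell left in the current row. */
    static int nextRowByNexting(List<Key> keys, int pos) {
        String current = keys.get(pos).row;
        int i = pos;
        while (i < keys.size() && keys.get(i).row.equals(current)) {
            i++;
        }
        return i; // keys.size() means there is no further row
    }

    /** Builds the index: the position of the first cell of every row, in sorted order. */
    static int[] buildRowStarts(List<Key> keys) {
        List<Integer> starts = new ArrayList<>();
        for (int i = 0; i < keys.size(); i++) {
            if (i == 0 || !keys.get(i).row.equals(keys.get(i - 1).row)) {
                starts.add(i);
            }
        }
        int[] out = new int[starts.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = starts.get(i);
        }
        return out;
    }

    /** Finds the next row by binary search for the first row start greater than pos. */
    static int nextRowByIndex(int[] rowStarts, int pos, int totalCells) {
        int lo = 0;
        int hi = rowStarts.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (rowStarts[mid] <= pos) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo < rowStarts.length ? rowStarts[lo] : totalCells;
    }

    public static void main(String[] args) {
        // One wide row followed by a narrow one.
        List<Key> keys = new ArrayList<>();
        for (int c = 0; c < 100_000; c++) {
            keys.add(new Key("row1", "fam:" + c));
        }
        keys.add(new Key("row2", "fam:0"));

        int[] rowStarts = buildRowStarts(keys);
        System.out.println(nextRowByNexting(keys, 0));                  // walks every cell of row1
        System.out.println(nextRowByIndex(rowStarts, 0, keys.size())); // a few index comparisons
    }
}
```

Both calls return the same position. The difference is cost: nexting grows with the number of columns in the row, while the index lookup grows with the logarithm of the number of rows. The index here is built by a full pass over the cells, so a real file format would presumably write it once when the file is created.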