pig-dev mailing list archives

From "Paul Mazak (JIRA)" <j...@apache.org>
Subject [jira] [Created] (PIG-4663) HBaseStorage could allow a row columns limit to avoid memory or scan timeout issues
Date Wed, 26 Aug 2015 20:12:46 GMT
Paul Mazak created PIG-4663:
-------------------------------

             Summary: HBaseStorage could allow a row columns limit to avoid memory or scan timeout issues
                 Key: PIG-4663
                 URL: https://issues.apache.org/jira/browse/PIG-4663
             Project: Pig
          Issue Type: Improvement
            Reporter: Paul Mazak


The HBase client Scan API provides setMaxResultsPerColumnFamily, which caps the number of values
returned per column family for each row, so a scan does not have to consume every column of a
row at once.  Without such a limit, a single row with several thousand columns will likely cause
Pig to fail with an OutOfMemoryError or a ScannerTimeoutException.

The suggestion is to add a '-maxResultsPerColumnFamily' option to HBaseStorage.  It would be
passed in the optString parameter of the constructor, and its value would be set on the HBase Scan.
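
A minimal sketch of how the proposed option might behave, for illustration only. It is not
HBaseStorage's code: the class MaxResultsOptionSketch, its methods, and the NO_LIMIT sentinel
are hypothetical, a plain token scan stands in for the real option parsing, and a list
truncation stands in for what the HBase Scan would do once the limit is set on it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of the proposed option; not part of Pig or HBase. */
final class MaxResultsOptionSketch {

    /** Sentinel used by this sketch when the option is absent (an assumption). */
    static final int NO_LIMIT = -1;

    /** Extracts the value of -maxResultsPerColumnFamily from an optString, if present. */
    static int parseMaxResultsPerColumnFamily(String optString) {
        String[] tokens = optString.trim().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals("-maxResultsPerColumnFamily")) {
                if (i + 1 >= tokens.length) {
                    throw new IllegalArgumentException(
                            "-maxResultsPerColumnFamily requires a value");
                }
                // Integer.parseInt rejects non-numeric values with NumberFormatException.
                int limit = Integer.parseInt(tokens[i + 1]);
                if (limit <= 0) {
                    throw new IllegalArgumentException(
                            "-maxResultsPerColumnFamily must be positive");
                }
                return limit;
            }
        }
        return NO_LIMIT;
    }

    /** Stand-in for the effect of the limit on one column family of one row. */
    static List<String> capColumns(List<String> columns, int limit) {
        if (limit == NO_LIMIT || columns.size() <= limit) {
            return new ArrayList<>(columns);
        }
        return new ArrayList<>(columns.subList(0, limit));
    }

    public static void main(String[] args) {
        int limit = parseMaxResultsPerColumnFamily(
                "-loadKey true -maxResultsPerColumnFamily 2");
        // With the real HBase client, this is where the loader would call
        // scan.setMaxResultsPerColumnFamily(limit) on its Scan instance.
        List<String> row = Arrays.asList("cf:a", "cf:b", "cf:c");
        System.out.println(capColumns(row, limit));
    }
}
```

Treating a missing option as "no limit" keeps existing scripts unchanged, which seems the
least surprising default for an opt-in setting.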



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
