hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-6768) HBase Rest server crashes if client tries to retrieve data size > 5 MB
Date Fri, 04 Jan 2013 01:27:11 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13543499#comment-13543499 ]

Andrew Purtell edited comment on HBASE-6768 at 1/4/13 1:25 AM:
---------------------------------------------------------------

bq. With many concurrent curl commands, the REST server will very likely OOM. In this case, I don't think there is much we can do, right?

Yeah, REST has limits. To avoid the cost of HTTP overhead in the scanning and multiget cases,
REST is designed to build the response -- issuing multiple RPCs to the HBase cluster to do
so -- and then send the response back to the client all in one HTTP transaction. If a client
request produces a really big response, it has to fit in heap on the REST gateway. REST
scanners do their own batching to handle large datasets in chunks. For a given row request,
however, we can't divide it up, or request byte sub-ranges of values from the RegionServers.
First the Result (a row) is assembled from the Get or Scan results inside the HBase client
library. Then REST builds a model from the Result, which ends up copying all of the data,
because REST came before KV, so its own representation (Cell) predates it. Then the model is
sent out by Jersey/Jetty. The Result becomes a candidate for GC as soon as the model is
finished; the model becomes a candidate for GC as soon as Jersey finishes the request.
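
The scanner batching mentioned above can be sketched generically (illustrative Python, not the gateway's actual code; row representation and batch size are made up for the example): the point is that each round trip carries a bounded batch, so no single response has to hold the whole result set in heap.

{code}
def scan_in_batches(rows, batch):
    """Yield bounded batches of rows, the way a REST scanner does with its
    batch parameter: each HTTP round trip returns at most `batch` rows."""
    for i in range(0, len(rows), batch):
        yield rows[i:i + batch]

# A 10-row result fetched 3 rows at a time takes 4 round trips,
# instead of one response holding all 10 rows at once.
batches = list(scan_in_batches(list(range(10)), 3))
{code}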

Edit: So if the REST URL is a request for an entire row, the data in the row must fit in heap
(times ~2). If the REST URL is a request for a large cell (100s of MB), likewise. Multiply by
the number of concurrent connections expected. As with HBase in general, storing large values
should be avoided. Put big blobs in HDFS. As far as I know the Thrift gateway operates
similarly. For large rows or large cells, direct cluster access via the Java API is the best
option.
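
A back-of-the-envelope sizing for the gateway heap, following the ~2x copy factor and concurrency multiplier above (illustrative Python; the 2x factor is the estimate from this comment, not a measured constant):

{code}
def rest_gateway_heap_estimate(response_bytes, copy_factor=2, concurrent=1):
    """Rough heap needed to serve `concurrent` simultaneous requests of
    `response_bytes` each, given that the Result and the REST model
    together hold about `copy_factor` copies of the data."""
    return response_bytes * copy_factor * concurrent

# Serving 100 MB cells to 8 concurrent clients needs on the order of 1.6 GB.
need = rest_gateway_heap_estimate(100 * 1024**2, concurrent=8)
{code}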

There are probably some clever things we can do to reduce copying, especially if we also consider
changing the client library at the same time, but to date this hasn't been urgent enough to
try.
                
> HBase Rest server crashes if client tries to retrieve data size > 5 MB
> ----------------------------------------------------------------------
>
>                 Key: HBASE-6768
>                 URL: https://issues.apache.org/jira/browse/HBASE-6768
>             Project: HBase
>          Issue Type: Bug
>          Components: REST
>    Affects Versions: 0.90.5
>            Reporter: Mubarak Seyed
>            Assignee: Jimmy Xiang
>              Labels: noob
>
> I have a CF with one qualifier whose data size is > 5 MB. When I try to read the raw binary
data as an octet-stream using curl, the REST server crashes and curl throws an exception:
> {code}
> curl -v -H "Accept: application/octet-stream" http://abcdefgh-hbase003.test1.test.com:9090/table1/row_key1/cf:qualifer1 > /tmp/out
> * About to connect() to abcdefgh-hbase003.test1.test.com port 9090
> *   Trying xx.xx.xx.xxx... connected
> * Connected to abcdefgh-hbase003.test1.test.com (xx.xxx.xx.xxx) port 9090
> > GET /table1/row_key1/cf:qualifer1 HTTP/1.1
> > User-Agent: curl/7.15.5 (x86_64-redhat-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8b zlib/1.2.3 libidn/0.6.5
> > Host: abcdefgh-hbase003.test1.test.com:9090
> > Accept: application/octet-stream
> > 
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
>   0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--     0< HTTP/1.1 200 OK
> < Content-Length: 5129836
> < X-Timestamp: 1347338813129
> < Content-Type: application/octet-stream
>   0 5009k    0 16272    0     0   7460      0  0:11:27  0:00:02  0:11:25 13872transfer closed with 1148524 bytes remaining to read
>  77 5009k   77 3888k    0     0  1765k      0  0:00:02  0:00:02 --:--:-- 3253k* Closing connection #0
> curl: (18) transfer closed with 1148524 bytes remaining to read
> {code}
> Couldn't find the exception in the REST server log, and there is no core dump either. This
issue is consistently reproducible. I even tried with the HBase REST client (HRemoteTable) and
could recreate this issue if the data size is > 10 MB (even with the MIME_PROTOBUF accept header).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
