hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10191) Move large arena storage off heap
Date Thu, 20 Feb 2014 01:37:19 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906456#comment-13906456 ]

Andrew Purtell commented on HBASE-10191:

I'm looking at Netty 4's netty-buffer module (http://netty.io/4.0/api/io/netty/buffer/package-summary.html),
which has some nice properties, including composite buffers, arena allocation, dynamic buffer
resizing, and reference counting, not to mention development and testing by another community. I also
like it because you can plug in your own allocators and specialize the abstract ByteBuf base
type. More on this later.

When I get closer to seeing exactly what needs to be done, I will post a design doc. Current
thinking follows. Below, the term 'buffer' currently means a Netty ByteBuf or derived class
backed by an off-heap allocated direct buffer.


When coming in from RPC, cells are laid out by codecs into cellblocks in buffers, and the cellblocks/buffers
are handed to the memstore. Netty's allocation arenas replace the MemstoreLAB. The memstore
data structure evolves into an index over cellblocks.
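
To make the arena idea concrete, here is a minimal sketch of a bump-pointer arena over one off-heap slab, using only java.nio. It is not Netty's allocator and not the MemstoreLAB; the class name DirectArena and its methods are hypothetical, and a real arena would also handle freeing and size classes.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical bump-pointer arena over a single off-heap slab (illustration only). */
final class DirectArena {
    private final ByteBuffer slab;                           // off-heap backing memory
    private final AtomicInteger next = new AtomicInteger();  // next free offset in the slab

    DirectArena(int capacityBytes) {
        this.slab = ByteBuffer.allocateDirect(capacityBytes);
    }

    /** Returns a view of size bytes carved from the slab, or null if it does not fit. */
    ByteBuffer allocate(int size) {
        while (true) {
            int start = next.get();
            if (size < 0 || size > slab.capacity() - start) {
                return null; // assumption: the caller falls back to a fresh arena
            }
            // Claim [start, start + size) atomically so concurrent writers never overlap.
            if (next.compareAndSet(start, start + size)) {
                ByteBuffer view = slab.duplicate(); // shares memory, independent position/limit
                view.position(start);
                view.limit(start + size);
                return view.slice();
            }
        }
    }

    /** Bytes handed out so far. */
    int used() {
        return next.get();
    }

    public static void main(String[] args) {
        DirectArena arena = new DirectArena(64);
        ByteBuffer a = arena.allocate(16);
        ByteBuffer b = arena.allocate(48);
        System.out.println(a.isDirect() + " " + a.capacity() + " " + b.capacity());
        System.out.println("full: " + (arena.allocate(1) == null));
    }
}
```

Each returned slice is a window onto the same slab, so a cellblock written through it never touches the Java heap beyond the small ByteBuffer wrapper object.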

Per [~mcorgan]'s comment above, we should think about how the memstore index can be built
with fewer object allocations than the number of cells in the memstore, yet stay in the ballpark
for efficiency of concurrent access. A tall order. ConcurrentSkipListMap (CSLM) wouldn't be the right choice, as it
allocates at least one list entry per key, but we could punt and use it initially and build
a replacement data structure as a follow-on task.
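
As a rough illustration of an index that allocates no object per cell, the sketch below keeps keys in an off-heap cellblock and indexes them with a sorted int[] of offsets. Everything here is an assumption for illustration: the class name CellBlockIndex, the [int length][key bytes] layout, and the unsigned byte-wise ordering. It is single-threaded, so it shows only the allocation side of the problem, not the concurrent-access side.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sorted-offset index over an off-heap cellblock (single-threaded sketch). */
final class CellBlockIndex {
    private final ByteBuffer block;       // assumed layout: [int keyLen][key bytes] repeated
    private int[] offsets = new int[16];  // one int per cell, kept sorted by key
    private int count;

    CellBlockIndex(int blockBytes) {
        block = ByteBuffer.allocateDirect(blockBytes);
    }

    /** Appends the key to the cellblock and inserts its offset in sorted position. */
    void add(byte[] key) {
        int offset = block.position();
        block.putInt(key.length).put(key); // throws BufferOverflowException when the block is full
        int pos = search(key);
        if (pos < 0) {
            pos = -(pos + 1);
        }
        if (count == offsets.length) {
            offsets = Arrays.copyOf(offsets, count * 2);
        }
        System.arraycopy(offsets, pos, offsets, pos + 1, count - pos);
        offsets[pos] = offset;
        count++;
    }

    /** Returns the rank of the key in sorted order, or -(insertionPoint + 1) if absent. */
    int search(byte[] key) {
        int lo = 0;
        int hi = count - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int c = compare(offsets[mid], key);
            if (c < 0) {
                lo = mid + 1;
            } else if (c > 0) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -(lo + 1);
    }

    /** Compares the stored key at offset with the probe using absolute reads, no copying. */
    private int compare(int offset, byte[] probe) {
        int len = block.getInt(offset);
        int n = Math.min(len, probe.length);
        for (int i = 0; i < n; i++) {
            int d = (block.get(offset + 4 + i) & 0xff) - (probe[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return len - probe.length;
    }

    public static void main(String[] args) {
        CellBlockIndex index = new CellBlockIndex(1024);
        for (String k : new String[] {"row3", "row1", "row2"}) {
            index.add(k.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(index.search("row2".getBytes(StandardCharsets.UTF_8)));
    }
}
```

The trade-off is visible in add(): insertion shifts the offset array, which is O(n) and not safe under concurrent writers, whereas CSLM gives lock-free inserts at the cost of an entry object per key.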


We feed down buffers to HDFS to fill with file block data. We pick which pool to get a buffer
from for a read depending on family caching strategy. Pools could be backed by arenas that
match up with LRU policy strata, with a common pool/arena for noncaching reads. (Or for noncaching
reads, can we optionally use a new API for getting buffers up from HDFS, perhaps backed by
the pinned shared RAM cache, since we know we will be referring to the contents only briefly?)
It will be important to get reference counting right, as we will be servicing scans while attempting
to evict. Relatedly, eviction of a block may not immediately return a buffer to a pool if there
is more than one block in the buffer.
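
The interaction between eviction, in-flight scans, and multi-block buffers can be sketched with a simple reference count. The retain()/release() naming loosely follows Netty's ReferenceCounted interface, but the class below is a hypothetical plain-Java stand-in, and the pool is just a deque of direct buffers.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical reference-counted buffer that returns to its pool only at count zero. */
final class RefCountedBuffer {
    private final ByteBuffer memory;
    private final ConcurrentLinkedDeque<ByteBuffer> pool;
    private final AtomicInteger refCnt = new AtomicInteger(1); // the creator holds the first reference

    RefCountedBuffer(ByteBuffer memory, ConcurrentLinkedDeque<ByteBuffer> pool) {
        this.memory = memory;
        this.pool = pool;
    }

    /** Takes one more reference, e.g. for a second cached block or an active scanner. */
    void retain() {
        while (true) {
            int c = refCnt.get();
            if (c <= 0) {
                throw new IllegalStateException("buffer already returned to pool");
            }
            if (refCnt.compareAndSet(c, c + 1)) {
                return;
            }
        }
    }

    /** Drops one reference; returns true if this call returned the memory to the pool. */
    boolean release() {
        while (true) {
            int c = refCnt.get();
            if (c <= 0) {
                throw new IllegalStateException("released more often than retained");
            }
            if (refCnt.compareAndSet(c, c - 1)) {
                if (c == 1) {
                    memory.clear();
                    pool.offer(memory);
                    return true;
                }
                return false;
            }
        }
    }

    public static void main(String[] args) {
        ConcurrentLinkedDeque<ByteBuffer> pool = new ConcurrentLinkedDeque<>();
        RefCountedBuffer buf = new RefCountedBuffer(ByteBuffer.allocateDirect(128), pool);
        buf.retain();                        // a second cached block shares the buffer
        buf.retain();                        // a scanner starts reading the first block
        System.out.println(buf.release());   // evict first block: buffer still in use
        System.out.println(buf.release());   // evict second block: scanner still holds it
        System.out.println(buf.release());   // scanner finishes: buffer goes back to the pool
        System.out.println(pool.size());
    }
}
```

In this sketch both evictions complete without freeing anything; the memory is recycled only when the scanner drops the last reference, which is the delayed-return behavior described above.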

We maintain new metrics on the number of buffers allocated, stats on arenas, stats on wastage
and internal fragmentation of the buffers, etc., and use these to guide optimizations and refinements.
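
One possible shape for such metrics, as a sketch only: count allocations and track requested versus reserved bytes, then derive wastage and an internal fragmentation ratio. The class and method names are hypothetical and do not correspond to existing HBase metrics.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical allocation metrics: wastage = reserved bytes minus requested bytes. */
final class BufferMetrics {
    private final LongAdder buffersAllocated = new LongAdder();
    private final LongAdder bytesRequested = new LongAdder();
    private final LongAdder bytesReserved = new LongAdder();

    /** Records one allocation: the caller asked for requested bytes, the arena reserved reserved bytes. */
    void recordAllocation(long requested, long reserved) {
        buffersAllocated.increment();
        bytesRequested.add(requested);
        bytesReserved.add(reserved);
    }

    long buffersAllocated() {
        return buffersAllocated.sum();
    }

    /** Bytes reserved but never asked for, e.g. due to rounding up to a size class. */
    long wastedBytes() {
        return bytesReserved.sum() - bytesRequested.sum();
    }

    /** Fraction of reserved bytes that is wasted, in [0, 1]; 0 when nothing is reserved. */
    double internalFragmentation() {
        long reserved = bytesReserved.sum();
        return reserved == 0 ? 0.0 : (double) wastedBytes() / reserved;
    }

    public static void main(String[] args) {
        BufferMetrics metrics = new BufferMetrics();
        metrics.recordAllocation(3000, 4096); // rounded up to an assumed 4 KB size class
        metrics.recordAllocation(1000, 1024);
        System.out.println(metrics.buffersAllocated() + " buffers, "
            + metrics.wastedBytes() + " wasted bytes, fragmentation "
            + metrics.internalFragmentation());
    }
}
```

LongAdder is used here because these counters would be updated on hot allocation paths and read only occasionally.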

> Move large arena storage off heap
> ---------------------------------
>                 Key: HBASE-10191
>                 URL: https://issues.apache.org/jira/browse/HBASE-10191
>             Project: HBase
>          Issue Type: Umbrella
>            Reporter: Andrew Purtell
> Even with the improved G1 GC in Java 7, Java processes that want to address large regions
of memory while also providing low high-percentile latencies continue to be challenged. Fundamentally,
a Java server process that has high data throughput and also tight latency SLAs will be stymied
by the fact that the JVM does not provide a fully concurrent collector. There is simply not
enough throughput to copy data during GC under safepoint (all application threads suspended)
within available time bounds. This is increasingly an issue for HBase users operating under
dual pressures: 1. tight response SLAs, 2. the increasing amount of RAM available in "commodity"
server configurations, because GC load is roughly proportional to heap size.
> We can address this using parallel strategies. We should talk with the Java platform
developer community about the possibility of a fully concurrent collector appearing in OpenJDK
somehow. Setting aside the question of whether this is too little too late, if one becomes available
the benefit will be immediate, though subject to qualification for production, and transparent
in terms of code changes. However, in the meantime we need an answer for Java versions already
in production. This requires we move the large arena allocations off heap, those being the
blockcache and memstore. On other JIRAs recently there has been related discussion about combining
the blockcache and memstore (HBASE-9399) and on flushing memstore into blockcache (HBASE-5311),
which is related work. We should build off heap allocation for memstore and blockcache, perhaps
a unified pool for both, and plumb through zero copy direct access to these allocations (via
direct buffers) through the read and write I/O paths. This may require the construction of
classes that provide object views over data contained within direct buffers. This is something
else we could talk with the Java platform developer community about: it could be possible
to provide language-level object views over off-heap memory, where on-heap objects could hold references
to objects backed by off-heap memory but not vice versa, maybe facilitated by new intrinsics
in Unsafe. Again, we need an answer for today as well. We should investigate what existing libraries
may be available in this regard. Key will be avoiding marshalling/unmarshalling costs. At
most we should be copying primitives out of the direct buffers to register or stack locations
until finally copying data to construct protobuf Messages. A related issue there is HBASE-9794,
which proposes scatter-gather access to KeyValues when constructing RPC messages. We should
see how far we can get with that and also zero copy construction of protobuf Messages backed
by direct buffer allocations. Some amount of native code may be required.

This message was sent by Atlassian JIRA
