Subject: Re: Does HBase RegionServer benefit from OS Page Cache
From: Enis Söztutar <enis.soz@gmail.com>
Date: Thu, 21 Mar 2013 13:28:41 -0700
To: hbase-user <user@hbase.apache.org>

I think the page cache is not totally useless, but as long as you can control
the GC, you should prefer the block cache. Some of the reasons off the top of
my head:

- On a cache hit in the OS page cache, you still have to go through the
  DataNode layer (an RPC if short-circuit reads are disabled), cross into the
  kernel, and copy the data out via the libc read() call. On a hit in the
  block cache, only the HBase process is involved: no process switch and no
  kernel crossing.

- The read access path is optimized per HFile block, and FS page boundaries
  and HFile block boundaries are not aligned at all.

- There is very little control over what the page cache keeps or evicts based
  on expected access patterns. With the block cache, we can mark META region
  blocks, selected column families, and HFile index blocks as always cached or
  cached with high priority. Also, for full table scans, we can explicitly
  disable block caching so the scan does not thrash the current working set.
  With the OS page cache, you do not have this control.
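For instance, the per-request and per-family control looks roughly like the
sketch below. This is only a sketch against the 0.94-era Java client API, and
the table and family names are made up; adjust for your version.

  import java.io.IOException;

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;

  public class BlockCacheControl {
    public static void main(String[] args) throws IOException {
      // "mytable" is a placeholder table name.
      HTable table = new HTable(HBaseConfiguration.create(), "mytable");
      try {
        // Full table scan: tell the RegionServer not to cache the blocks it
        // reads, so the scan does not evict the current working set from the
        // block cache.
        Scan scan = new Scan();
        scan.setCacheBlocks(false);
        ResultScanner scanner = table.getScanner(scan);
        try {
          for (Result r : scanner) {
            // process r ...
          }
        } finally {
          scanner.close();
        }
      } finally {
        table.close();
      }

      // A hot column family can be flagged in-memory so its blocks live in
      // the high-priority segment of the LRU block cache. Apply the
      // descriptor with HBaseAdmin, or via the shell, e.g.
      //   alter 'mytable', {NAME => 'hot_cf', IN_MEMORY => 'true'}
      HColumnDescriptor hotFamily = new HColumnDescriptor("hot_cf");
      hotFamily.setInMemory(true);
    }
  }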
Enis


On Wed, Mar 20, 2013 at 10:30 AM, Jean-Daniel Cryans wrote:

> First, MSLAB has been enabled by default since 0.92.0, as it was deemed
> stable enough. So unless you are on 0.90, you are already using it.
>
> Also, I'm not sure why you are referencing the HLog in your first paragraph
> in the context of reading from disk, because the HLogs are rarely read
> (only on recovery). Maybe you meant HFile?
>
> In any case, your email covers most arguments except for one: checksumming.
> Retrieving a block from HDFS, even when using short-circuit reads to go
> directly to the OS instead of passing through the DN, will take quite a bit
> more time than reading directly from the block cache. This is why, even if
> you disable block caching on a family, the index and root blocks will still
> be block cached, as reading those very hot blocks from disk would take way
> too long.
>
> Regarding your main question (how does the OS buffer help?), I don't have a
> good answer. It kind of depends on the amount of RAM you have and what your
> workload is like. As a data point, I've been successful running with 24GB
> of heap (50% dedicated to the block cache) with a workload consisting
> mainly of small writes, short scans, and a typical random read distribution
> for a website. I can't remember the last time I saw a full GC, and it's
> been running for more than a year like this.
>
> Hope this somehow helps,
>
> J-D
>
> On Wed, Mar 20, 2013 at 12:34 AM, Pankaj Gupta wrote:
> > Given that HBase has its own cache (block cache and bloom filters) and
> > that all the table data is stored in HDFS, I'm wondering if HBase
> > benefits from the OS page cache at all. In the setup I'm using, the HBase
> > RegionServers run on the same boxes as the HDFS DataNodes. In such a
> > scenario, if the underlying HLog files live on the same machine, then
> > having a healthy memory surplus may mean that the DataNode can serve the
> > underlying file from the page cache and thus improve HBase performance.
> > Is this really the case? (I guess the page cache should also help when
> > the HLog file lives on a different machine, but in that case network I/O
> > will probably drown out the speedup gained from not hitting the disk.)
> >
> > I'm asking because, if the page cache were useful, then not dedicating
> > all of a machine's memory to the region server may not be that bad. The
> > reason one would not want to give the region server all the memory is the
> > long garbage collection pauses that a large heap may induce. I understand
> > that work has been done to fix the long pauses of the mostly-concurrent
> > garbage collector, caused by memory fragmentation in the old generation,
> > by using a slab allocator for the memstore, but that feature is marked
> > experimental and we're not ready to take risks yet. So if the page cache
> > were useful in any way on RegionServers, we could go with less memory for
> > the RegionServer process, with the understanding that the free memory on
> > the machine is not completely going to waste. Hence my curiosity about
> > the utility of the OS page cache to HBase performance.
> >
> > Thanks in advance,
> > Pankaj
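The two settings J-D mentions correspond to ordinary configuration
properties, and a client can read back what it picked up from hbase-site.xml.
A rough sketch of doing that is below; the property names are from the
0.92/0.94 line and the defaults passed in are only fallbacks, so verify both
against your release (short-circuit reads are configured on the HDFS side and
are not shown).

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;

  public class CacheSettingsCheck {
    public static void main(String[] args) {
      Configuration conf = HBaseConfiguration.create();

      // MSLAB: on by default from 0.92.0 onward, per J-D's note.
      boolean mslabEnabled =
          conf.getBoolean("hbase.hregion.memstore.mslab.enabled", true);

      // Fraction of the RegionServer heap given to the block cache
      // (J-D runs at roughly 0.5).
      float blockCacheFraction =
          conf.getFloat("hfile.block.cache.size", 0.25f);

      System.out.println("MSLAB enabled: " + mslabEnabled);
      System.out.println("Block cache fraction of heap: " + blockCacheFraction);
    }
  }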