From: Joarder KAMAL
Date: Tue, 4 Jun 2013 16:09:03 +1000
Subject: How to collect the real-time transaction request logs from HBase Master/Region Servers?
To: dev@hbase.apache.org

Dear All,

I am a newbie in HBase/Hadoop and have recently set up a small-scale cluster in a research cloud:

------------------------------------------
1 Master Server (also Hadoop Name Node)
3 Region Servers (also Hadoop Data Nodes)
1 Ganglia Monitoring Server
1 YCSB Workload Generation Server
------------------------------------------
HBase Version: 0.94.7, r1471806
Hadoop Version: 1.0.4, r1393290
Ganglia Version: gmond/gmetad - 3.6.0, gweb - 3.5.8
YCSB Version: 0.1.4
------------------------------------------

I have only one table in HBase, 'usertable', with a single column family 'cf1' holding 1,000,000 key-value records. The row keys are in monotonically increasing order, and currently I have 6 regions distributed across the 3 region servers, each holding 2 of the regions.

*Objective:* create region hotspots for some research experiments

*Observation:* After running a workload consisting of a total of 10,000,000 operations (50% read, 50% write), I observed the statistics below in the Web UI of the master server. They suggest potential hotspots in the 3rd region (not sure why!) and the 6th region (possibly because it was receiving a large number of write requests).

Table Regions

No.  Region Server  Start Key                End Key                  Requests
1    hdb1-02:60030  (none)                   user2035146605813492656  127946
2    hdb1-03:60030  user2035146605813492656  user30679275375621809    126700
3    hdb1-04:60030  user30679275375621809    user5136356049533495298  *284828*
4    hdb1-02:60030  user5136356049533495298  user617761656465008158   133108
5    hdb1-04:60030  user617761656465008158   user7218407885253116621  119008
6    hdb1-03:60030  user7218407885253116621  (none)                   *363234*

Region names (the Name column of the Web UI), in the same order:

1    usertable,,1369584948241.3061b90ff519c1bce5b3d867690a2b4a.
2    usertable,user2035146605813492656,1369584948241.00f8a51bab6d98ebd7c4db582579c3e7.
3    usertable,user30679275375621809,1369584813037.d704a50802ec39982884e394d4ef05b7.
4    usertable,user5136356049533495298,1369584928780.999b987d646462e21b8916a737619b39.
5    usertable,user617761656465008158,1369584928780.9cfe288f48f987de7f93b800dcd4c964.
6    usertable,user7218407885253116621,1369584832152.e3a9c4d35c91f06c18ed346886ff3306.

*Questions:*

1. Can the HBase developer community guide me on how to collect the *raw logs* (directly from the master/region servers) behind the table above, which I retrieved from the master server?
2. How does the master server get these logs from the region servers? As far as I understand from the architecture, the client communicates directly with the region servers to read/write data, bypassing the master server (except the first time, or if the region server is not responding).
3. How frequently does the master collect these logs? Is it real-time (within a 1-second interval)?
4. Which HBase metrics will be most helpful for spotting region hotspots from Ganglia?

I want to know which transaction request (read/write) is going to which region server, from raw log dumps that look like:

No:12345 ---- Type:Write ---- Query ---- Region06

and so on ...

Many thanks again...

Regards,
Joarder Kamal
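
A minimal sketch of how the hotspot observation above can be checked from the request counts alone. The counts and server names are copied from the Web UI table in this message; the 1.25x-of-mean threshold and the function names hot_regions and requests_per_server are assumptions made only for illustration, not anything HBase defines.

```python
# Request counts copied from the master Web UI table in this message,
# listed in region order (1st to 6th) as (region server, requests).
regions = [
    ("hdb1-02:60030", 127946),
    ("hdb1-03:60030", 126700),
    ("hdb1-04:60030", 284828),
    ("hdb1-02:60030", 133108),
    ("hdb1-04:60030", 119008),
    ("hdb1-03:60030", 363234),
]

# Assumption: a region is called "hot" when its request count exceeds
# 1.25x the mean over all regions. The factor is arbitrary.
HOT_FACTOR = 1.25


def hot_regions(rows, factor=HOT_FACTOR):
    """Return the 1-based positions of regions above factor * mean."""
    mean = sum(count for _, count in rows) / len(rows)
    return [i for i, (_, count) in enumerate(rows, start=1)
            if count > factor * mean]


def requests_per_server(rows):
    """Sum the request counts of all regions hosted on each server."""
    totals = {}
    for server, count in rows:
        totals[server] = totals.get(server, 0) + count
    return totals


if __name__ == "__main__":
    print("hot regions:", hot_regions(regions))
    for server, total in sorted(requests_per_server(regions).items()):
        print(server, total)
```

With these numbers the sketch flags the 3rd and 6th regions, and the per-server totals show hdb1-03 carrying the most requests because it hosts the 6th region.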