From: S Ahmed
Date: Sun, 3 Jun 2012 15:21:54 -0400
Subject: Re: how does hbase get the latest version with immutable hfiles?
To: user@hbase.apache.org

Elliott,

Is there a video or slides? I guess I have to register to view it?

On Sat, Jun 2, 2012 at 2:18 PM, Elliott Clark wrote:

> If you want to get into the really nitty gritty I found Lars' presentation
> really insightful.
>
> http://www.hbasecon.com/sessions/learning-hbase-internals/
>
> On Sat, Jun 2, 2012 at 6:13 AM, Doug Meil wrote:
>
> > Hi there, I think you probably want to look at this...
> >
> > HBase catalog metadata...
> >
> > http://hbase.apache.org/book.html#arch.catalog
> >
> > How data is stored internally...
> >
> > http://hbase.apache.org/book.html#regions.arch
> >
> > Lots of versioning description here...
> >
> > http://hbase.apache.org/book.html#datamodel
> >
> > Long story short, client talks directly to RegionServers, HBase looks at
> > multiple StoreFiles.
> >
> > On 6/1/12 4:27 PM, "S Ahmed" wrote:
> >
> > >(reference:
> > >http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html)
> > >
> > >A row consists of a key, and column families, along with a timestamp.
> > >
> > >So for example:
> > >
> > >key = com.example.com/some/path
> > >
> > >cf: outboundlinks {
> > >  com.example.com/link1,
> > >  com.example.com/link2,
> > >  ..
> > >}
> > >
> > >Data is stored like this:
> > >
> > >Region Server -> Store -> StoreFile -> HFile
> > >
> > >Now when a client requests a particular key, the HMaster figures out
> > >which region server holds the data, this information is returned to
> > >the client (which saves it locally), and then it makes a request to
> > >the region server.
> > >
> > >Now since the actual data files are immutable, if you modify a
> > >particular value in a CF, it is tombstoned (not sure how that works
> > >but understand it at a high level).
> > >
> > >So if I make a request for a given key, going with the example above,
> > >a particular url on the website example.com, and I want all the
> > >outboundlinks, I reference the column family "outboundlinks", which
> > >can store millions of urls.
> > >
> > >What process/service/class is in charge of assembling the various
> > >files to get all the correct data?
> > >
> > >Summary of my question:
> > >What I am trying to understand is, if a particular CF has millions of
> > >values, and if a single value is mutated, a new file has to be
> > >created. So this means, if I query for that value, i.e. it is
> > >included in my result set, how does HBase know where to look for the
> > >latest data?
> > >
> > >So basically from what I understand, making a get request for a
> > >particular key, cf will have to potentially look at more than one
> > >StoreFile (or HFile?), correct?
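
Editor's sketch: the short answer quoted above (the RegionServer looks at multiple StoreFiles for one read) can be illustrated with a small merge-on-read example in Python. This is not HBase's code. The names Cell and merge_read, and the plain lists standing in for the MemStore and the StoreFiles, are hypothetical. The sketch assumes that every cell carries a timestamp, that the newest cell for a column wins, and that a delete is written as a tombstone cell instead of changing an existing file. Real HBase delete markers have more variants than this single flag.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Cell:
    """One versioned value; hypothetical stand-in for a stored key/value."""
    row: str
    column: str              # "family:qualifier"
    timestamp: int
    value: Optional[bytes]
    tombstone: bool = False  # True means "deleted as of this timestamp"


def merge_read(row: str, sources: Iterable[List[Cell]]) -> Dict[str, bytes]:
    """Return the latest visible value per column for one row.

    Every source (in-memory buffer or immutable file) is consulted,
    because the newest version of a column may live in any of them.
    """
    newest: Dict[str, Cell] = {}
    for source in sources:
        for cell in source:
            if cell.row != row:
                continue
            current = newest.get(cell.column)
            # Newest timestamp wins; on a tie, the tombstone wins
            # (a simplifying assumption of this sketch).
            if (current is None
                    or cell.timestamp > current.timestamp
                    or (cell.timestamp == current.timestamp and cell.tombstone)):
                newest[cell.column] = cell
    # Columns whose newest cell is a tombstone are hidden from the result.
    return {col: c.value for col, c in newest.items() if not c.tombstone}


ROW = "com.example.com/some/path"

# Older immutable file: two links written at timestamp 1.
old_file = [
    Cell(ROW, "outboundlinks:link1", 1, b"v1"),
    Cell(ROW, "outboundlinks:link2", 1, b"v1"),
]
# Newer immutable file: link1 rewritten at timestamp 2; old_file is untouched.
new_file = [Cell(ROW, "outboundlinks:link1", 2, b"v2")]
# In-memory buffer: link2 deleted at timestamp 3 via a tombstone.
memstore = [Cell(ROW, "outboundlinks:link2", 3, None, tombstone=True)]

result = merge_read(ROW, [memstore, new_file, old_file])
print(result)
```

In this sketch nothing is ever modified in place: the update to link1 and the delete of link2 are both new cells, and the read resolves them by timestamp. The cost is that a read may touch several files, which is the situation the question describes.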