hbase-user mailing list archives

From Doug Meil <doug.m...@explorysmedical.com>
Subject Re: how does hbase get the latest version with immutable hfiles?
Date Sat, 02 Jun 2012 13:13:41 GMT

Hi there, I think you probably want to look at this...

HBase catalog metadata...


How data is stored internally...


Lots of versioning description here...


Long story short: the client talks directly to the RegionServers, and HBase
may look at multiple StoreFiles to answer a read.
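
To make the "looks at multiple StoreFiles" part concrete, here is a rough
sketch in Python. It is not HBase code: Cell and get_latest are made-up
names for illustration, and treating a None value as a delete marker is a
simplifying assumption. The idea is only that a read gathers candidate cells
from every source (the in-memory store plus each immutable file) and keeps
the one with the newest timestamp.

```python
from typing import Iterable, List, NamedTuple, Optional


class Cell(NamedTuple):
    """Hypothetical stand-in for a stored cell; not an HBase class."""
    row: str
    column: str
    timestamp: int
    value: Optional[bytes]  # assumption: None marks a delete (tombstone)


def get_latest(row: str, column: str,
               sources: Iterable[List[Cell]]) -> Optional[bytes]:
    """Return the newest live value for (row, column) across all sources.

    `sources` stands for the in-memory store plus every immutable file.
    No file is rewritten on update; the newest timestamp simply wins.
    """
    candidates = [
        cell
        for source in sources
        for cell in source
        if cell.row == row and cell.column == column
    ]
    if not candidates:
        return None
    # On a timestamp tie, let the delete marker win over the put.
    newest = max(candidates, key=lambda c: (c.timestamp, c.value is None))
    return newest.value  # None if the newest entry is a delete marker


# Two immutable "files" written at different times, plus an in-memory store.
row = "com.example.com/some/path"
col = "outboundlinks:com.example.com/link1"
older_file = [Cell(row, col, 1, b"v1")]
newer_file = [Cell(row, col, 2, b"v2")]
memstore: List[Cell] = []

latest = get_latest(row, col, [memstore, newer_file, older_file])
```

Here latest is b"v2": the older file still holds b"v1", but the read ignores
it because a cell with a newer timestamp exists elsewhere. Adding a cell with
value None and timestamp 3 to memstore would make the same call return None.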

On 6/1/12 4:27 PM, "S Ahmed" <sahmed1020@gmail.com> wrote:

>A row consists of a key, and column families, along with a timestamp.
>So for example:
>key = com.example.com/some/path
>cf: outboundlinks {
>      com.example.com/link1,
>     com.example.com/link2,
>     ..
>Data is stored like this:
>Region Server -> Store -> StoreFile -> HFile
>Now when a client requests a particular key, the hmaster figures out which
>region server holds the data, this information is returned to the client
>(which saves it locally), and then it makes a request to the region
>Now since the actual data files are immutable, if you modify a particular
>value in a CF, it is tombstoned (not sure how that works but understand
>it at a high level).
>So if I make a request for a given key, going with the example above, a
>particular url on the website example.com, and i want all the
>I reference the column family "outboundlinks" which can store millions of
>What process/service/class is in charge of assembling the various files to
>get all the correct data?
>Summary of my question:
>What I am trying to understand is, if a particular CF has millions of
>values, and if a single value is mutated, a new file has to be created.
>this means, if I query for that value i.e. it is included in my result
>how does hbase know where to look for the latest data?
>So basically from what I understand, making a get request for a particular
>key, cf will have to potentially look at more than one StoreFile (or
>HFile?) correct?
