Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by JimKellerman:
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture

The comment on the change is:
make terminology consistent

------------------------------------------------------------------------------
  application desire to implement a ''locality group'' it can do so by simply
  restricting its map column key set.
+ We use the terms '''column''' and '''map''' throughout the rest of the document for consistency.
+ [[Anchor(conceptual)]]
  == Conceptual View ==
@@ -61, +63 @@
  are located by a row key (and optional timestamp) and where any column
  may not have a value for a particular row key (sparse). The following
  example is a slightly modified form of the one on page 2 of the
  [http://labs.google.com/papers/bigtable.html Bigtable Paper].
+ [[Anchor(datamodelexample)]]
  ||<:|2> '''Row Key''' ||<:|2> '''Time Stamp''' ||<:|2> '''Column''' ''"contents"'' |||| '''Map''' ''"anchor"'' ||<:|2> '''Column''' ''"mime"'' ||
  ||<:> '''key''' ||<:> '''value''' ||
  ||<^|5> "com.cnn.www" ||<:> t9 || ||<)> "cnnsi.com" ||<:> "CNN" || ||
@@ -77, +80 @@
  * Detects the addition and expiration of tablet servers
  * Balances tablet server load
  * Garbage collects files (SSTables) in GFS by mark-and-sweep
- * Handles schema changes, such as the addition of Column families
+ * Handles schema changes, such as the addition of Columns and Maps
  * Keeps track of the set of live tablet servers
  * Keeps current assignment of tablets to tablet servers, including those that are unassigned
  * Assigns unassigned tablets to tablet servers with sufficient room
@@ -190, +193 @@
  * Block index __consists of the start keys for each block__
  * Compression
  * Per block
- * ''Column family compression?''
+ * Per Map compression
  * Can be Memory-mapped
  * Can be shared by two tablets immediately after a split
  * API
@@ -243, +246 @@
  I suppose you could represent the
  maximum row key as the empty string but that would require a special
  case instead of just a simple compare.
- * The "location" column family is in it's own locality group and has the ''InMemory'' tuning parameter set
+ * The "location" map has the ''InMemory'' tuning parameter set
  * Each row stores approximately 1KB of data in memory
  * All events pertaining to each tablet are logged here (such as when a tablet server starts serving a tablet)
  * ["Schema"]
@@ -272, +275 @@
  Scanning through a range of key values for a particular column will
  always be much faster than accessing the values for each column for
  a given row key. Consequently, values that will be used together should
- either be encoded together into a single column value or a column
+ either be encoded together into a single column value or a map
- family should be considered for grouping values.
+ should be considered for grouping values.
- Pictorially, the table in the example above would be stored as
+ Pictorially, the table shown in the [#datamodelexample data model example] would be stored as
  follows:
- ||<:> '''Row Key''' ||<:> '''Time Stamp''' ||<:> '''Column''' ''"contents:"'' ||
+ ||<:> '''Row Key''' ||<:> '''Time Stamp''' ||<:> '''Column''' ''"contents'' ||
  ||<^|3> "com.cnn.www" ||<:> t6 ||<:> "..." ||
  ||<:> t5 ||<:> `"..."` ||
  ||<:> t3 ||<:> `"..."` ||
  [[BR]]
- ||<:|2> '''Row Key''' ||<:|2> '''Time Stamp''' |||| '''Family''' ''"anchor:"'' ||
+ ||<:|2> '''Row Key''' ||<:|2> '''Time Stamp''' |||| '''Map''' ''"anchor"'' ||
  ||<:> '''key''' ||<:> '''value''' ||
  ||<^|2> "com.cnn.www" ||<:> t9 ||<)> "cnnsi.com" ||<:> "CNN" ||
  ||<:> t8 ||<)> "my.look.ca" ||<:> "CNN.com" ||
  [[BR]]
- ||<:> '''Row Key''' ||<:> '''Time Stamp''' ||<:> '''Column''' ''"mime:"'' ||
+ ||<:> '''Row Key''' ||<:> '''Time Stamp''' ||<:> '''Column''' ''"mime"'' ||
  || "com.cnn.www" ||<:> t6 ||<:> "text/html" ||
  [[BR]]
  It is important to note in the diagram above that the empty cells
  shown in the conceptual view are not stored.
  Thus a request for the
- value of the ''"contents:"'' column at time stamp ''t8'' would return
+ value of the ''"contents"'' column at time stamp ''t8'' would return
- a null value. Similarly, a request for an ''"anchor:"'' value at time
+ a null value. Similarly, a request for an ''"anchor"'' value at time
  stamp ''t9'' for "my.look.ca" would return a null value. However,
  if no timestamp is supplied, the most recent value for a particular
  column would be returned and would also be the first one found
  since time stamps are stored in descending order. Consequently
- the value returned for ''"contents:"'' if no time stamp is supplied is
+ the value returned for ''"contents"'' if no time stamp is supplied is
- the value for ''t6'' and the value for an ''"anchor:"'' for
+ the value for ''t6'' and the value for an ''"anchor"'' for
  "my.look.ca" if no time stamp is supplied is the value for time
  stamp ''t8''.
@@ -329, +332 @@
  {{{
  CreateTable()
- ChangeColumnFamilyMetadata(name=ACL, value=foo)
+ ChangeColumnMetadata(name=ACL, value=foo)
  Scanner
- FetchColumnFamily
+ FetchColumnMap
  Lookup RowName