hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "Hbase/HbaseArchitecture" by JimKellerman
Date Sun, 18 Mar 2007 02:44:03 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by JimKellerman:
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture

The comment on the change is:
better example of how data is physically stored on disk.

------------------------------------------------------------------------------
  
  === Example ===
  
+ To show how data is stored on disk, consider the following example:
+ 
+ A program writes rows "row[0-9]", column "anchor:foo"; then writes
+ rows "row[0-9]", column "anchor:bar"; and finally writes rows
+ "row[0-9]", column "anchor:foo" again. After flushing the memcache and
+ compacting the store, the contents of the !MapFile would look like:
- The current unit test for HBase included in the patch on
- [http://issues.apache.org/jira/browse/HADOOP-1045 Hadoop Jira Issue 1045], 
- first writes rows with row id's of the form "row_[0-9]+" where the row
- number goes from 0 to 999. It writes to two column families:
- "contents:basic" and "anchor:anchornum-[0-9]+" (again the range of
- numbers for the anchornum family goes from 0 to 999). It then writes
- rows with row id's of "row_vals_nnn" where nnn is a three digit,
- leading zero filled number from 000 to 999. Two column families are
- written: "contents:firstcol" and "anchor:secondcol". After a
- compaction, dumping the
- !MapFile which contains the "anchor:" family we see that the keys,
- displayed as column-family(row-key)/timestamp are ordered as follows:
  
  {{{
+ row=row0, column=anchor:bar, timestamp=1174184619081
+ row=row0, column=anchor:foo, timestamp=1174184620720
+ row=row0, column=anchor:foo, timestamp=1174184617161
+ row=row1, column=anchor:bar, timestamp=1174184619081
+ row=row1, column=anchor:foo, timestamp=1174184620721
+ row=row1, column=anchor:foo, timestamp=1174184617167
+ row=row2, column=anchor:bar, timestamp=1174184619081
+ row=row2, column=anchor:foo, timestamp=1174184620724
+ row=row2, column=anchor:foo, timestamp=1174184617167
+ row=row3, column=anchor:bar, timestamp=1174184619081
+ row=row3, column=anchor:foo, timestamp=1174184620724
+ row=row3, column=anchor:foo, timestamp=1174184617168
+ row=row4, column=anchor:bar, timestamp=1174184619081
+ row=row4, column=anchor:foo, timestamp=1174184620724
+ row=row4, column=anchor:foo, timestamp=1174184617168
+ row=row5, column=anchor:bar, timestamp=1174184619082
+ row=row5, column=anchor:foo, timestamp=1174184620725
+ row=row5, column=anchor:foo, timestamp=1174184617168
+ row=row6, column=anchor:bar, timestamp=1174184619082
+ row=row6, column=anchor:foo, timestamp=1174184620725
+ row=row6, column=anchor:foo, timestamp=1174184617168
+ row=row7, column=anchor:bar, timestamp=1174184619082
+ row=row7, column=anchor:foo, timestamp=1174184620725
+ row=row7, column=anchor:foo, timestamp=1174184617168
+ row=row8, column=anchor:bar, timestamp=1174184619082
+ row=row8, column=anchor:foo, timestamp=1174184620725
+ row=row8, column=anchor:foo, timestamp=1174184617169
+ row=row9, column=anchor:bar, timestamp=1174184619083
+ row=row9, column=anchor:foo, timestamp=1174184620725
+ row=row9, column=anchor:foo, timestamp=1174184617169
- anchor:anchornum-0(row_0)/1174176403717
- anchor:anchornum-1(row_1)/1174176403723
- anchor:anchornum-10(row_10)/1174176403726
- anchor:anchornum-100(row_100)/1174176403769
- anchor:anchornum-101(row_101)/1174176403770
- anchor:anchornum-102(row_102)/1174176403771
- anchor:anchornum-103(row_103)/1174176403771
- anchor:anchornum-104(row_104)/1174176403772
- anchor:anchornum-105(row_105)/1174176403772
- anchor:anchornum-106(row_106)/1174176403773
- anchor:anchornum-107(row_107)/1174176403773
- anchor:anchornum-108(row_108)/1174176403774
- anchor:anchornum-109(row_109)/1174176403774
- anchor:anchornum-11(row_11)/1174176403727
- ...
- anchor:anchornum-99(row_99)/1174176403769
- anchor:anchornum-990(row_990)/1174176403966
- anchor:anchornum-991(row_991)/1174176403966
- anchor:anchornum-992(row_992)/1174176403966
- anchor:anchornum-993(row_993)/1174176403966
- anchor:anchornum-994(row_994)/1174176403966
- anchor:anchornum-995(row_995)/1174176403966
- anchor:anchornum-996(row_996)/1174176403966
- anchor:anchornum-997(row_997)/1174176403966
- anchor:anchornum-998(row_998)/1174176403966
- anchor:anchornum-999(row_999)/1174176403966
- anchor:secondcol(row_vals1_000)/1174176435765
- anchor:secondcol(row_vals1_001)/1174176435766
- anchor:secondcol(row_vals1_002)/1174176435767
- anchor:secondcol(row_vals1_003)/1174176435767
- anchor:secondcol(row_vals1_004)/1174176435767
- anchor:secondcol(row_vals1_005)/1174176435767
- anchor:secondcol(row_vals1_006)/1174176435768
- anchor:secondcol(row_vals1_007)/1174176435768
- anchor:secondcol(row_vals1_008)/1174176435769
- anchor:secondcol(row_vals1_009)/1174176435769
- anchor:secondcol(row_vals1_010)/1174176435770
- ...
  }}}
  
+ Note that column "anchor:foo" is stored twice for each row (because the
+ timestamps differ) and that the entry with the most recent timestamp
+ comes first of the two.
- If the row keys had had the same format (say row_nnn), dumping the
- !MapFile we would see:
- 
- {{{
- anchor:anchornum-0(row_000)/1174176403717
- anchor:secondcol(row_000)/1174176435765
- anchor:anchornum-1(row_001)/1174176403723
- anchor:secondcol(row_001)/1174176435766
- ...
- }}}
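The ordering in the new example (rows ascending, then columns ascending, then the newest timestamp first) can be reproduced with a small sketch. This is not HBase code: the `sort_key` and `compact` helpers, the two-row range, and the timestamps are all made up for illustration, and the comparator is only inferred from the dump shown above.

```python
from typing import List, Tuple

# Hypothetical key layout, inferred from the dump: (row, column, timestamp).
Key = Tuple[str, str, int]


def sort_key(key: Key) -> Tuple[str, str, int]:
    """Order by row, then column, then timestamp descending (newest first)."""
    row, column, timestamp = key
    return (row, column, -timestamp)


def compact(*flushes: List[Key]) -> List[Key]:
    """Merge several flushed batches into one sorted list of keys.

    Assumption: compaction is modelled as a plain merge-and-sort; every
    version of a cell is kept, as in the dump above.
    """
    merged = [key for flush in flushes for key in flush]
    return sorted(merged, key=sort_key)


# Three write passes over row0..row1 with made-up timestamps.
pass1 = [(f"row{i}", "anchor:foo", 1000 + i) for i in range(2)]
pass2 = [(f"row{i}", "anchor:bar", 2000 + i) for i in range(2)]
pass3 = [(f"row{i}", "anchor:foo", 3000 + i) for i in range(2)]

store = compact(pass1, pass2, pass3)
for row, column, ts in store:
    print(f"row={row}, column={column}, timestamp={ts}")
```

With these inputs, each row lists "anchor:bar" first, followed by the two "anchor:foo" entries with the later write ahead of the earlier one, which matches the pattern in the dump.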
  
  [[Anchor(hregion)]]
  = HRegion (Tablet) Server =
