hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hbase/FAQ" by JeffHammerbacher
Date Fri, 18 Jun 2010 00:35:39 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hbase/FAQ" page has been changed by JeffHammerbacher.


   1. [[#21|How do I add/remove a node?]]
   1. [[#22|Why do servers have start codes?]]
   1. [[#23|What is the maximum recommended cell size?]]
+  1. [[#24|Why can't I iterate through the rows of a table in reverse order?]]
  == Answers ==
@@ -229, +230 @@

  A rough rule of thumb, with little empirical validation, is to keep the data in HDFS and
store pointers to the data in HBase if you expect the cell size to be consistently above 10
MB. If you do expect large cell values and you still plan to use HBase for the storage of
cell contents, you'll want to increase the block size and the maximum region size for the
table to keep the index size reasonable and the split frequency acceptable.
+ '''24. <<Anchor(24)>> Why can't I iterate through the rows of a table in reverse order?'''
+ Because of the way [[http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/io/hfile/HFile.html|HFile]]
works: for efficiency, each column value is written to disk as the length of the value
followed by the bytes of the value itself. A reader walking backward reaches the end of a
value before it reaches that value's length, so it cannot tell where the value begins. To
navigate through these values in reverse order, each length would need to be stored twice
(after the value as well) or in a side file. A robust secondary index implementation is the
likely solution here to ensure the primary use case remains fast.
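
The trade-off described above can be sketched in a few lines of Python. This is a simplified illustration only, not the actual HFile layout: the 4-byte big-endian length field and all function names are assumptions made for the example. It shows that a length-first encoding scans forward naturally, while a backward scan becomes possible only after each length is also written after the value, at a cost of four extra bytes per value.

```python
import struct

LEN = struct.Struct(">I")  # assumed 4-byte big-endian length field (illustrative)


def write_records(values):
    """Encode each value as [length][bytes]: the forward-only layout."""
    out = bytearray()
    for v in values:
        out += LEN.pack(len(v)) + v
    return bytes(out)


def scan_forward(buf):
    """Read the length first, then the value, then move to the next record."""
    pos = 0
    while pos < len(buf):
        (n,) = LEN.unpack_from(buf, pos)
        pos += LEN.size
        yield buf[pos:pos + n]
        pos += n


def write_records_with_trailer(values):
    """Encode each value as [length][bytes][length]: the length is stored twice."""
    out = bytearray()
    for v in values:
        n = LEN.pack(len(v))
        out += n + v + n
    return bytes(out)


def scan_backward(buf):
    """Start at the end; the trailing length says how far back the value begins."""
    pos = len(buf)
    while pos > 0:
        (n,) = LEN.unpack_from(buf, pos - LEN.size)
        start = pos - LEN.size - n
        yield buf[start:pos - LEN.size]
        pos = start - LEN.size  # skip the leading copy of the length


values = [b"a", b"bcd", b""]
plain = write_records(values)
doubled = write_records_with_trailer(values)

print(list(scan_forward(plain)))
print(list(scan_backward(doubled)))
print(len(doubled) - len(plain), "extra bytes for the second copy of each length")
```

With only the leading length, `scan_backward` has nothing to read at the end of the buffer: the last four bytes are value data, not a length, which is the limitation the answer describes.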
