hbase-user mailing list archives

From lars hofhansl <lhofha...@yahoo.com>
Subject Re: HBase Put
Date Tue, 21 Aug 2012 23:07:12 GMT
In a nutshell:
- Puts are collected in memory (in a sorted data structure)
- When the collected data reaches a certain size it is flushed to a new file (which is sorted)
- Gets do a merge sort between the various files that have been created
- To limit the number of files, they are periodically compacted into fewer, larger files

So the data files (HFiles) are immutable once written; changes are batched in memory first.
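The write path above can be sketched in a few lines. This is a toy model, not HBase's actual code: the class name, the count-based flush threshold, and the linear scan inside files are all simplifications (real HBase flushes by MemStore size and uses indexed, block-structured HFiles), but it shows why arrival order never matters — sorting happens at flush time, and reads consult the memstore plus each immutable file.

```python
# Toy LSM-style store (illustrative only; not HBase code).
FLUSH_THRESHOLD = 3  # HBase actually flushes by MemStore size, not entry count


class MiniStore:
    def __init__(self):
        self.memstore = {}  # in-memory buffer; sorted when flushed
        self.hfiles = []    # immutable sorted lists of (key, value), oldest first

    def put(self, key, value):
        # Puts land in memory in any order; sortedness is imposed at flush time.
        self.memstore[key] = value
        if len(self.memstore) >= FLUSH_THRESHOLD:
            self.flush()

    def flush(self):
        # Write the buffered batch as a new sorted, never-modified "HFile".
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}

    def get(self, key):
        # Newest data wins: memstore first, then files from newest to oldest.
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):
            for k, v in hfile:  # real HFiles use block indexes, not a scan
                if k == key:
                    return v
        return None

    def compact(self):
        # Merge all files into one larger sorted file; newer values override.
        merged = {}
        for hfile in self.hfiles:  # oldest to newest, so later writes win
            merged.update(dict(hfile))
        self.hfiles = [sorted(merged.items())]
```

Note that `get` never needs the files to be mutually sorted against each other — each file is internally sorted, and the read path merges across them, which is exactly why an append-only filesystem like HDFS is sufficient.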

-- Lars

 From: "Pamecha, Abhishek" <apamecha@x.com>
To: "user@hbase.apache.org" <user@hbase.apache.org> 
Sent: Tuesday, August 21, 2012 4:00 PM
Subject: HBase Put

I had a question on the HBase Put call. In the scenario where data is inserted without any
order to column qualifiers, how does HBase maintain sortedness with respect to column qualifiers
in its store files/blocks?

I checked the code base and I can see checks<https://github.com/apache/hbase/blob/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java#L319>
being made for lexicographic insertion of key-value pairs, but I can't seem to find out
how the key offset is calculated in the first place.

Also, given that HDFS is append-only by nature, how do randomly ordered keys make their way
into sorted order? Is it only during minor/major compactions that this sortedness gets applied,
and is there a small window during which data is not sorted?
