hbase-user mailing list archives

From: Andrew Purtell <apurt...@apache.org>
Subject: Re: Hbase performance with HDFS
Date: Thu, 07 Jul 2011 22:12:14 GMT
> 1) When compactions occur on Node A would it also include b2 and b3,
> which are actually redundant copies? My guess is yes.


I don't follow your question.

HDFS files are read by opening an input stream. The stream is fed data from block replicas
chosen at random, one replica for each block. The reader doesn't see "redundant copies".
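
As a rough sketch, a reader against the HDFS client API looks like this (the
path below is made up):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // One input stream; HDFS picks one replica per block to feed it
        FSDataInputStream in = fs.open(new Path("/hbase/mytable/1234/cf/hfile1"));
        byte[] buf = new byte[4096];
        int n = in.read(buf); // data arrives from whichever replica was chosen
        System.out.println("read " + n + " bytes");
        in.close();
      }
    }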

> 2) Now compaction occurs and creates HFile3 which, as you said, is
> replicated. But what happens to HFile1 and HFile2? I am assuming they
> get deleted.


They are deleted.
 

Best regards,


   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)


----- Original Message -----
> From: Mohit Anchlia <mohitanchlia@gmail.com>
> To: user@hbase.apache.org
> Cc: 
> Sent: Thursday, July 7, 2011 3:02 PM
> Subject: Re: Hbase performance with HDFS
> 
> Thanks! I understand what you mean, however I have a little confusion.
> Does it mean there are unused blocks sitting around? For example:
> 
> HFile1 with 3 blocks spread across 3 nodes: Node A:(b1),b2,b3, Node
> B:b1,(b2),b3 and Node C:b1,b2,(b3).
> 
> HFile2 with 3 blocks spread across 3 nodes: Node A:(b1),b2,b3, Node
> B:b1,(b2),b3 and Node C:b1,b2,(b3)
> 
> I have 2 questions:
> 
> 1) When compactions occur on Node A would it also include b2 and b3,
> which are actually redundant copies? My guess is yes.
> 2) Now compaction occurs and creates HFile3 which, as you said, is
> replicated. But what happens to HFile1 and HFile2? I am assuming they
> get deleted.
> 
> Thanks for everyone's patience!
> 
> On Thu, Jul 7, 2011 at 2:43 PM, Buttler, David <buttler1@llnl.gov> wrote:
>>  The nice part of using HDFS as the file system is that the replication is
>>  taken care of by the file system. So, when the compaction finishes, that
>>  means the replication has already taken place.
>> 
>>  -----Original Message-----
>>  From: Mohit Anchlia [mailto:mohitanchlia@gmail.com]
>>  Sent: Thursday, July 07, 2011 2:02 PM
>>  To: user@hbase.apache.org; Andrew Purtell
>>  Subject: Re: Hbase performance with HDFS
>> 
>>  Thanks Andrew. Really helpful. I think I have one more question right
>>  now :) Underneath, HDFS replicates blocks 3 times by default. I'm not
>>  sure how that relates to HFiles and compactions. When compaction occurs,
>>  is it also happening on the replica blocks from other nodes? If not,
>>  then how does it work when one node fails?
>> 
>>  On Thu, Jul 7, 2011 at 1:53 PM, Andrew Purtell <apurtell@apache.org> wrote:
>>>>  You mentioned about compactions, when do those occur and what
>>>>  triggers them?
>>> 
>>>  Compactions are triggered by an algorithm that monitors the number of
>>>  flush files in a store and their sizes, and it is configurable in
>>>  several dimensions.
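>>> 
>>>  As a sketch, the trigger point is tunable; the property name below is
>>>  the one shipped in hbase-default.xml (3 is the shipped default):
>>> 
>>>      import org.apache.hadoop.conf.Configuration;
>>>      import org.apache.hadoop.hbase.HBaseConfiguration;
>>> 
>>>      public class CompactionTuning {
>>>        public static void main(String[] args) {
>>>          Configuration conf = HBaseConfiguration.create();
>>>          // Consider a store for compaction once it holds this many
>>>          // flush files
>>>          conf.setInt("hbase.hstore.compactionThreshold", 3);
>>>        }
>>>      }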
>>> 
>>>>  Does it cause additional space usage when that happens
>>> 
>>>  Yes.
>>> 
>>>>  if it
>>>>  does it would mean you always need to have much more disk than you
>>>>  really need.
>>> 
>>> 
>>>  Not all regions are compacted at once. Each region by default is
>>>  constrained to 256 MB. Not all regions will hold the full amount of
>>>  data. The result is not a perfect copy (doubling) if some data has been
>>>  deleted or is associated with TTLs that have expired. The merge sorted
>>>  result is moved into place and the old files are deleted as soon as the
>>>  compaction completes. So how much more is "much more"? You can't write
>>>  to any kind of data store on a (nearly) full volume anyway, whether
>>>  HBase/HDFS, or MySQL, or...
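>>> 
>>>  For reference, the per-region cap is settable per table; a sketch with
>>>  the client API (table name made up):
>>> 
>>>      import org.apache.hadoop.hbase.HTableDescriptor;
>>> 
>>>      public class RegionSizeSketch {
>>>        public static void main(String[] args) {
>>>          HTableDescriptor desc = new HTableDescriptor("mytable");
>>>          // Regions split once a store grows past this size (256 MB default)
>>>          desc.setMaxFileSize(256L * 1024 * 1024);
>>>        }
>>>      }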
>>> 
>>>>  Since HDFS is mostly write once how are updates/deletes handled?
>>> 
>>> 
>>>  Not mostly, only write once.
>>> 
>>>  From the BigTable paper, section 5.3: "A valid read operation is
>>>  executed on a merged view of the sequence of SSTables and the memtable.
>>>  Since the SSTables and the memtable are lexicographically sorted data
>>>  structures, the merged view can be formed efficiently." So what this
>>>  means is all the store files and the memstore serve effectively as
>>>  change logs sorted in reverse chronological order.
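>>> 
>>>  A toy illustration of forming that merged view, with plain sorted maps
>>>  standing in for the memstore and store files (nothing HBase-specific):
>>> 
>>>      import java.util.TreeMap;
>>> 
>>>      public class MergedViewSketch {
>>>        public static void main(String[] args) {
>>>          TreeMap<String, String> memstore = new TreeMap<String, String>();
>>>          memstore.put("row1", "v3"); // newest write
>>>          TreeMap<String, String> hfile = new TreeMap<String, String>();
>>>          hfile.put("row1", "v2");    // older value, now shadowed
>>>          hfile.put("row2", "v1");
>>>          // Merge oldest to newest so newer entries overwrite older ones
>>>          TreeMap<String, String> merged = new TreeMap<String, String>();
>>>          merged.putAll(hfile);
>>>          merged.putAll(memstore);
>>>          System.out.println(merged); // {row1=v3, row2=v1}
>>>        }
>>>      }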
>>> 
>>>  Deletes are just another write, but one that writes tombstones
>>>  "covering" data with older timestamps.
>>> 
>>>  When serving queries, HBase searches store files back in time until it
>>>  finds data at the coordinates requested or a tombstone.
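>>> 
>>>  From the client, a delete is just another write; a sketch (0.90-era
>>>  client API; table and column names made up):
>>> 
>>>      import org.apache.hadoop.conf.Configuration;
>>>      import org.apache.hadoop.hbase.HBaseConfiguration;
>>>      import org.apache.hadoop.hbase.client.Delete;
>>>      import org.apache.hadoop.hbase.client.HTable;
>>>      import org.apache.hadoop.hbase.util.Bytes;
>>> 
>>>      public class DeleteSketch {
>>>        public static void main(String[] args) throws Exception {
>>>          Configuration conf = HBaseConfiguration.create();
>>>          HTable table = new HTable(conf, "mytable");
>>>          Delete d = new Delete(Bytes.toBytes("row1"));
>>>          // Writes a tombstone covering older versions of this column
>>>          d.deleteColumns(Bytes.toBytes("cf"), Bytes.toBytes("qual"));
>>>          table.delete(d);
>>>          table.close();
>>>        }
>>>      }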
>>> 
>>>  The process of compaction not only merge-sorts a bunch of accumulated
>>>  store files (from flushes) into fewer store files (or one) for read
>>>  efficiency, but also performs housekeeping, dropping data "covered" by
>>>  the delete tombstones. Incidentally this is also how TTLs are
>>>  supported: expired values are dropped as well.
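>>> 
>>>  TTLs are declared per column family; a sketch (family name made up):
>>> 
>>>      import org.apache.hadoop.hbase.HColumnDescriptor;
>>> 
>>>      public class TtlSketch {
>>>        public static void main(String[] args) {
>>>          HColumnDescriptor cf = new HColumnDescriptor("cf");
>>>          // Values older than one week get dropped at compaction time
>>>          cf.setTimeToLive(7 * 24 * 3600);
>>>        }
>>>      }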
>>> 
>>>  Best regards,
>>> 
>>>     - Andy
>>> 
>>>  Problems worthy of attack prove their worth by hitting back. - Piet
>>>  Hein (via Tom White)
>>> 
>>> 
>>>> ________________________________
>>>> From: Mohit Anchlia <mohitanchlia@gmail.com>
>>>> To: Andrew Purtell <apurtell@apache.org>
>>>> Cc: "user@hbase.apache.org" <user@hbase.apache.org>
>>>> Sent: Thursday, July 7, 2011 12:30 PM
>>>> Subject: Re: Hbase performance with HDFS
>>>> 
>>>> Thanks, that helps! Just a few more questions:
>>>> 
>>>> You mentioned about compactions, when do those occur and what triggers
>>>> them? Does it cause additional space usage when that happens? If it
>>>> does, it would mean you always need to have much more disk than you
>>>> really need.
>>>> 
>>>> Since HDFS is mostly write once, how are updates/deletes handled?
>>>> 
>>>> Is Hbase also suitable for Blobs?
>>>> 
>>>> On Thu, Jul 7, 2011 at 12:11 PM, Andrew Purtell <apurtell@apache.org> wrote:
>>>>>  Some thoughts off the top of my head. Lars' architecture material
>>>>>  might/should cover this too. Pretty sure his book will.
>>>>>  Regarding reads:
>>>>>  One does not have to read a whole HDFS block. You can request
>>>>>  arbitrary byte ranges within the block, via positioned reads. (It is
>>>>>  also true that HDFS can be improved for better random read
>>>>>  performance in ways not necessarily yet committed to trunk, or
>>>>>  especially to a 0.20.x branch with append support for HBase. See
>>>>>  https://issues.apache.org/jira/browse/HDFS-1323)
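>>>>> 
>>>>>  A positioned read with the HDFS client API looks roughly like this
>>>>>  (path and offset made up):
>>>>> 
>>>>>      import org.apache.hadoop.conf.Configuration;
>>>>>      import org.apache.hadoop.fs.FSDataInputStream;
>>>>>      import org.apache.hadoop.fs.FileSystem;
>>>>>      import org.apache.hadoop.fs.Path;
>>>>> 
>>>>>      public class PositionedReadSketch {
>>>>>        public static void main(String[] args) throws Exception {
>>>>>          FileSystem fs = FileSystem.get(new Configuration());
>>>>>          FSDataInputStream in = fs.open(new Path("/hbase/mytable/1234/cf/hfile1"));
>>>>>          byte[] buf = new byte[128];
>>>>>          // Fetch just these 128 bytes starting at offset 4242 -- no
>>>>>          // need to stream the whole block from the DataNode
>>>>>          in.readFully(4242L, buf, 0, buf.length);
>>>>>          in.close();
>>>>>        }
>>>>>      }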
>>>>>  HBase holds indexes to store files in HDFS in memory. We also open
>>>>>  all store files at the HDFS layer and stash those references.
>>>>>  Additionally, users can specify the use of bloom filters to improve
>>>>>  query time performance through wholesale skipping of HFile reads if
>>>>>  they are known not to contain data that satisfies the query. Bloom
>>>>>  filters are held in memory as well.
>>>>>  So with indexes resident in memory, when handling Gets we know the
>>>>>  byte ranges within HDFS block(s) that contain the data of interest.
>>>>>  With positioned reads we retrieve only those bytes from a DataNode.
>>>>>  With optional bloom filters we avoid whole HFiles entirely.
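>>>>> 
>>>>>  Bloom filters are enabled per column family; a sketch (the typed
>>>>>  setter below is the 0.92-era API, so check your release; names made
>>>>>  up):
>>>>> 
>>>>>      import org.apache.hadoop.hbase.HColumnDescriptor;
>>>>>      import org.apache.hadoop.hbase.regionserver.StoreFile;
>>>>> 
>>>>>      public class BloomSketch {
>>>>>        public static void main(String[] args) {
>>>>>          HColumnDescriptor cf = new HColumnDescriptor("cf");
>>>>>          // ROW blooms answer "might this HFile contain this row key?"
>>>>>          cf.setBloomFilterType(StoreFile.BloomType.ROW);
>>>>>        }
>>>>>      }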
>>>>>  Regarding writes:
>>>>>  I think you should consult the BigTable paper again if you are still
>>>>>  asking about the write path. The database is log structured. Writes
>>>>>  are accumulated in memory, and flushed all at once. Flush files are
>>>>>  later compacted as needed, because as you point out GFS and HDFS are
>>>>>  optimized for streaming sequential reads and writes.
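>>>>> 
>>>>>  The client side of that write path is just a put; a sketch (names
>>>>>  made up):
>>>>> 
>>>>>      import org.apache.hadoop.conf.Configuration;
>>>>>      import org.apache.hadoop.hbase.HBaseConfiguration;
>>>>>      import org.apache.hadoop.hbase.client.HTable;
>>>>>      import org.apache.hadoop.hbase.client.Put;
>>>>>      import org.apache.hadoop.hbase.util.Bytes;
>>>>> 
>>>>>      public class PutSketch {
>>>>>        public static void main(String[] args) throws Exception {
>>>>>          Configuration conf = HBaseConfiguration.create();
>>>>>          HTable table = new HTable(conf, "mytable");
>>>>>          Put p = new Put(Bytes.toBytes("row1"));
>>>>>          p.add(Bytes.toBytes("cf"), Bytes.toBytes("qual"), Bytes.toBytes("value"));
>>>>>          // Appended to the WAL, then held in the memstore; an HFile is
>>>>>          // written only when the memstore flushes
>>>>>          table.put(p);
>>>>>          table.close();
>>>>>        }
>>>>>      }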
>>>>> 
>>>>>  Best regards,
>>>>> 
>>>>>    - Andy
>>>>>  Problems worthy of attack prove their worth by hitting back. - Piet
>>>>>  Hein (via Tom White)
>>>>> 
>>>>>  ________________________________
>>>>>  From: Mohit Anchlia <mohitanchlia@gmail.com>
>>>>>  To: user@hbase.apache.org; Andrew Purtell <apurtell@apache.org>
>>>>>  Sent: Thursday, July 7, 2011 11:53 AM
>>>>>  Subject: Re: Hbase performance with HDFS
>>>>> 
>>>>>  I have looked at BigTable and its SSTables etc. But my question is
>>>>>  directly related to how it's used with HDFS. HDFS recommends large
>>>>>  files, bigger blocks, and write-once, read-many sequential access.
>>>>>  But accessing small rows and writing small rows is more random and
>>>>>  different from the inherent design of HDFS. How do these two go
>>>>>  together and still provide good performance?
>>>>> 
>>>>>  On Thu, Jul 7, 2011 at 11:22 AM, Andrew Purtell <apurtell@apache.org> wrote:
>>>>>>  Hi Mohit,
>>>>>> 
>>>>>>  Start here: http://labs.google.com/papers/bigtable.html
>>>>>> 
>>>>>>  Best regards,
>>>>>> 
>>>>>> 
>>>>>>      - Andy
>>>>>> 
>>>>>>  Problems worthy of attack prove their worth by hitting back. - Piet
>>>>>>  Hein (via Tom White)
>>>>>> 
>>>>>> 
>>>>>>> ________________________________
>>>>>>> From: Mohit Anchlia <mohitanchlia@gmail.com>
>>>>>>> To: user@hbase.apache.org
>>>>>>> Sent: Thursday, July 7, 2011 11:12 AM
>>>>>>> Subject: Hbase performance with HDFS
>>>>>>> 
>>>>>>> I've been trying to understand how HBase can provide good
>>>>>>> performance using HDFS when the purpose of HDFS is sequential access
>>>>>>> to large blocks, which is inherently different from HBase, where
>>>>>>> access is more random and row sizes might be very small.
>>>>>>> 
>>>>>>> I am reading this but it doesn't answer my question. It does say
>>>>>>> that the HFile block size is different, but how it really works with
>>>>>>> HDFS is what I am trying to understand.
>>>>>>> 
>>>>>>> http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>> 
>
