hadoop-common-dev mailing list archives

From alakshman <ghostwhoowa...@gmail.com>
Subject Discarding HLog files
Date Sat, 14 Jul 2007 19:49:31 GMT

I had a question about how writes flow into the HLog. Each column
family that makes up a table has its own on-disk representation, but
there is only one HLog for all tables. That means on every write, the
individual HMemcaches for each column family in the row mutation are
updated, but the entire row mutation is written to the HLog.
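To make sure I have the write path right, here is a rough sketch of what I understand it to be (class and method names are hypothetical, not the actual HBase code):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the write path as I understand it: one shared
// log for all tables, plus one in-memory cache per column family.
class RegionServerSketch {
    // Single append-only log shared by every table and column family.
    final java.util.List<String> hlog = new java.util.ArrayList<>();
    // One HMemcache per column family, keyed by "table:family".
    final Map<String, Map<String, String>> memcaches = new HashMap<>();
    long nextSeq = 0;

    // Apply a row mutation that touches several column families.
    long write(String table, String row, Map<String, String> familyToValue) {
        long seq = nextSeq++;
        // The entire row mutation goes into the one shared HLog...
        hlog.add(seq + ":" + table + "/" + row + "=" + familyToValue);
        // ...while each column family's memcache is updated individually.
        for (Map.Entry<String, String> e : familyToValue.entrySet()) {
            memcaches.computeIfAbsent(table + ":" + e.getKey(),
                                      k -> new HashMap<>())
                     .put(row, e.getValue());
        }
        return seq;
    }
}
```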

Now, when a column family's HMemcache is flushed, is a token written to the
HLog indicating that that column family of the table has been flushed? There
may be other column families that have not yet been flushed. Since we seem
to write entire rows to the HLog, how can one tell that a log file contains
only flushed entries without scanning the entire file? Is a sequential scan
unavoidable to determine whether a rolled HLog can be deleted?

Please explain.
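To make the question concrete, here is the kind of per-family sequence-number bookkeeping I would imagine could avoid the scan (a hypothetical sketch, not the actual HBase code; I'm asking whether something like this is what happens):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: per-column-family sequence-number bookkeeping
// that would let a rolled HLog be discarded without scanning it.
class LogDiscardSketch {
    // Highest sequence id written so far for each column family.
    final Map<String, Long> lastWrittenSeq = new HashMap<>();
    // Highest sequence id each column family has flushed to disk.
    final Map<String, Long> lastFlushedSeq = new HashMap<>();

    void recordWrite(String family, long seq) {
        lastWrittenSeq.merge(family, seq, Math::max);
    }

    void recordFlush(String family) {
        // On flush, everything written so far for this family is on disk.
        lastFlushedSeq.put(family, lastWrittenSeq.getOrDefault(family, -1L));
    }

    // A rolled log whose edits all have seq <= maxSeqInLog can be deleted
    // once every family has flushed past that point -- no scan needed.
    boolean canDiscard(long maxSeqInLog) {
        for (Map.Entry<String, Long> e : lastWrittenSeq.entrySet()) {
            long flushed = lastFlushedSeq.getOrDefault(e.getKey(), -1L);
            if (flushed < Math.min(e.getValue(), maxSeqInLog)) {
                return false; // this family still has unflushed edits in the log
            }
        }
        return true;
    }
}
```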

View this message in context: http://www.nabble.com/Discarding-HLog-files-tf4080078.html#a11596742
Sent from the Hadoop Dev mailing list archive at Nabble.com.
