hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: Question to speaker (tab file loading) at yesterdays user group
Date Thu, 15 Jan 2009 23:24:54 GMT
> From: tim robertson <timrobertson100@gmail.com>
> > Until compression is super solid, I would be wary of
> > storing text (xml,html, etc) in hbase due to size 
> > concerns.
> Hmmm... Where do the indexing guys store their raw
> harvested records / HTML / whatever then?

Compression is lightly tested. In practice it increases heap
usage, because extra byte buffers are allocated on the heap
for {de,re}compression. I was using compression to archive web
content written to HBase by the Heritrix HBase writer, but
stopped using it after we ran into OOME issues at compaction.
The root cause of this was not directly related to 
compression and Stack worked up a fix for 0.19 for that 
cause. I may be ready to try compression again soon. 

For us, disk is cheap and we have ~20TB of effective HDFS
space (after subtracting for replication factor) to back 
our HBase tables. Furthermore we use TTLs to expire content
after a certain period of time because it is no longer of
interest then (too out of date). One could use a MapReduce
task to accomplish the same with deletes -- also triggering/
scheduling recrawling as needed/wanted.
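
As a rough illustration of the delete-based alternative, the
selection step such a job would perform might look like the
sketch below. It is plain Python, not HBase or MapReduce code;
the function names, the (row key, timestamp) pairs, and the
"strictly older than the TTL" boundary are all assumptions
made up for the example.

```python
import time


def is_expired(cell_timestamp_ms, ttl_seconds, now_ms=None):
    """Return True if a cell written at cell_timestamp_ms is older than the TTL.

    Assumption: a cell whose age equals the TTL exactly is still kept.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms - cell_timestamp_ms > ttl_seconds * 1000


def rows_to_delete(rows, ttl_seconds, now_ms=None):
    """Pick the row keys a cleanup job would delete (and possibly recrawl).

    rows is an iterable of (row_key, timestamp_ms) pairs; this shape is
    hypothetical and stands in for whatever a table scan would return.
    """
    return [key for key, ts in rows if is_expired(ts, ttl_seconds, now_ms)]
```

The same list of keys could also feed a recrawl queue, which
is the one thing a plain TTL cannot do for you.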

Anyway, I think what people are saying is just that 
compression's use has been relatively rare on the clusters
where HBase has been most commonly under test. Something
to be aware of. Actually your use of it would be valuable
experience for the whole community.

   - Andy

