hbase-user mailing list archives

From Jon Stewart <...@lightboxtechnologies.com>
Subject One map task to two HFiles
Date Tue, 24 May 2011 21:26:08 GMT
I have a map task that's extracting documents from a flat file and
writing them into an HBase table as individual records; the key is
derived from the path of the file (so it's idempotent) and balances
key-space distribution against locality of reference. Additionally, I
have a secondary table where the key is a hash of a file's contents
(e.g., MD5) and each row indexes back into the primary table (along
with other data). Rows aren't subject to deletion, which makes life easy.
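
As a rough illustration of the secondary-table key, the sketch below hashes a file's contents with MD5 using only the JDK. The class and method names (SecondaryKeySketch, md5, toHex) are made up for this example, no HBase API is involved, and the real row key layout may differ.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: derives a content-hash row key for the secondary table.
public class SecondaryKeySketch {

    // Returns the 16-byte MD5 digest of the file contents.
    public static byte[] md5(byte[] contents) {
        try {
            return MessageDigest.getInstance("MD5").digest(contents);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Renders bytes as lowercase hex, handy for logging or for a string row key.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for a document extracted from the flat file (assumption).
        byte[] contents = "abc".getBytes(StandardCharsets.UTF_8);
        System.out.println(toHex(md5(contents)));
    }
}
```

Because the key depends only on the contents, re-running the extraction over the same input yields the same secondary keys.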

I've successfully used HFileOutputFormat and KeyValueSortReducer on a
related task that prepopulates data into the secondary table and this
works great. I'd like to convert my extraction task over to writing
HFiles out in bulk, for both tables.

I have enough control over the keys for the primary table that the map
task could write rows to the primary table in order, making it
map-side only (assuming one HFile per task). The map task could then
emit KeyValue objects for the secondary hash table and let
HFileOutputFormat/KeyValueSortReducer do their thing.

The question is, how do I write an HFile directly from a map task?
Is HFile.Writer the right tool? What are the gotchas?

Thanks in advance,

Jon Stewart, Principal
(646) 719-0317 | jon@lightboxtechnologies.com | Arlington, VA
