hbase-user mailing list archives

From Rohit Kelkar <rohitkel...@gmail.com>
Subject running MR job and puts on the same table
Date Sat, 22 Jun 2013 16:42:52 GMT
I have a use case where I push data into my HTable in waves, each wave
followed by mapper-only processing. Currently, once a row is processed in
map(), I immediately mark it as processed by executing a
table.put(isprocessed=true) inside the map. I am not sure whether modifying
the table like this is a good idea. I am also concerned that I am modifying
the same table that I am running the MR job on.
So I am considering another approach: accumulate the processed row keys
in a list (or a more compact data structure) and use the mapper's cleanup()
method to execute all the table.put(isprocessed=true) calls at once.
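
To make the second approach concrete, here is a rough sketch of the buffering part only. It is not taken from my actual job: ProcessedSink is a hypothetical stand-in for the HTable so the logic can run without a cluster (in the real mapper it would presumably wrap a batched table.put(List<Put>)), and ProcessedRowBuffer and its method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the HBase table. In a real job this would
// build one Put per row key and send them in a single batched call.
interface ProcessedSink {
    void putProcessedFlags(List<byte[]> rowKeys);
}

// Collects row keys while map() runs and writes all the
// isprocessed=true flags in one call at the end of the task.
class ProcessedRowBuffer {
    private final List<byte[]> pending = new ArrayList<>();
    private final ProcessedSink sink;

    ProcessedRowBuffer(ProcessedSink sink) {
        this.sink = sink;
    }

    // Intended to be called from map() once a row has been handled.
    void markProcessed(byte[] rowKey) {
        pending.add(rowKey.clone()); // copy, since callers may reuse the array
    }

    int pendingCount() {
        return pending.size();
    }

    // Intended to be called from cleanup(). Returns the number of rows flushed.
    int flush() {
        if (pending.isEmpty()) {
            return 0;
        }
        int flushed = pending.size();
        sink.putProcessedFlags(new ArrayList<>(pending));
        pending.clear();
        return flushed;
    }
}

class ProcessedRowBufferDemo {
    public static void main(String[] args) {
        List<String> written = new ArrayList<>();
        // The lambda plays the role of the table write.
        ProcessedRowBuffer buffer = new ProcessedRowBuffer(keys -> {
            for (byte[] key : keys) {
                written.add(new String(key, StandardCharsets.UTF_8));
            }
        });
        buffer.markProcessed("row-1".getBytes(StandardCharsets.UTF_8));
        buffer.markProcessed("row-2".getBytes(StandardCharsets.UTF_8));
        int flushed = buffer.flush();
        System.out.println(flushed + " rows flushed: " + written);
    }
}
```

One thing I notice in this sketch is that every row key stays in memory until cleanup() runs, and nothing gets marked if the task dies before then.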
What is the suggested best practice?

- R
