hbase-user mailing list archives

From Raghava Mutharaju <m.vijayaragh...@gmail.com>
Subject multiple reads from a Map - optimization question
Date Tue, 22 Jun 2010 07:49:56 GMT
Hello all,

      I need to check each row against multiple conditions (2 or 3 at most)
and then work only with the rows that satisfy all of them. I am doing this
as a map-only MR job (no reduce), with the conditions translated into a set
of filters. The first condition is set as a filter on the scan that feeds
the mappers, so every row that reaches a map already satisfies it. Inside
map(), I then check the remaining 1 or 2 conditions. Since map() is called
once per row, this means 1 or 2 additional read calls (with filter) to HBase
tables for every input row. This setup is very slow even for small data (the
data fits in one region, so a single map task takes in all of it).

Note that the remaining conditions are not filters on the data coming into
the map. Instead, the next set of filter conditions is built from that data.
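
To make the access pattern concrete, here is a rough sketch in plain Java.
It deliberately does not use the HBase client API: CountingTable is a
hypothetical in-memory stand-in for the second table, runMapPhase is a
hypothetical stand-in for the map() calls, and the row keys and values are
made up. It only illustrates that each incoming row triggers its own
dependent read.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PerRowLookupSketch {

    /** Hypothetical in-memory stand-in for an HBase table; counts reads. */
    static class CountingTable {
        private final Map<String, String> rows = new HashMap<>();
        int readCalls = 0;

        void put(String key, String value) {
            rows.put(key, value);
        }

        /** Simulates one filtered read (one round trip on a real cluster). */
        String read(String key) {
            readCalls++;
            return rows.get(key);
        }
    }

    /**
     * Simulates the map phase. Every incoming row already satisfies the
     * first condition (the scan filter). The remaining condition is built
     * from the row's own value, so it costs one extra read per row.
     */
    static List<String> runMapPhase(Map<String, String> scannedRows,
                                    CountingTable other) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> row : scannedRows.entrySet()) {
            // The lookup key comes from this row's data, so it cannot be
            // fixed up front in the scan filter.
            String dependent = other.read(row.getValue());
            if (dependent != null) {
                matches.add(row.getKey());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Rows that passed the initial scan filter (made-up data).
        Map<String, String> scanned = new LinkedHashMap<>();
        scanned.put("row1", "a");
        scanned.put("row2", "b");
        scanned.put("row3", "c");

        CountingTable other = new CountingTable();
        other.put("a", "x");
        other.put("c", "y");

        List<String> matches = runMapPhase(scanned, other);
        System.out.println("matches = " + matches);
        System.out.println("extra reads = " + other.readCalls);
    }
}
```

In this sketch the number of extra reads grows one-for-one with the number
of input rows (or two-for-one with 2 remaining conditions), which is the
cost I am asking about.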

Can this be improved? Would constructing secondary indexes help? I would
need a dramatic improvement. Or is this type of problem not suitable for
HBase?

Thank you.

