hbase-user mailing list archives

From Ioannis Konstantinou <ik...@cslab.ntua.gr>
Subject Hbase bulk import for objects with the same rowid and different columnids
Date Sat, 09 Jan 2010 17:09:00 GMT

I am trying to bulk upload content to HBase, following the instructions provided at this URL:
http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description
I have a mapper that reads the input and emits KeyValue objects to be fed into the KeyValueSortReducer.
The mapper emits several KeyValue objects for each row: for the same rowid, the KeyValue
objects have different columnids.
The problem is the following: when these KeyValue objects (same rowid, but different
colids in the same column family) reach the reducer, the TreeSet used to sort the KeyValues keeps
only the KeyValue that arrives last (every entry is replaced by the last one to reach the
reducer), apparently because KeyValue.COMPARATOR compares only the rowid.

Can I use a different Comparator? Do KeyValue objects with the same rowid have to be sorted before
they are written to the HFile, or does the order not matter?
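
The effect described above can be reproduced without HBase. The following is only a minimal sketch under stated assumptions: Cell is a hypothetical stand-in for KeyValue, and ROW_ONLY and ROW_THEN_COLUMN are illustrative comparators, not HBase's KeyValue.COMPARATOR. It shows that a TreeSet treats elements as duplicates whenever its comparator returns 0, so a comparator that looks only at the rowid retains a single entry per row, while one that also compares the columnid retains all of them.

```java
import java.util.Comparator;
import java.util.TreeSet;

public class ComparatorSketch {
    // Hypothetical stand-in for KeyValue: a row id, a column id and a value.
    static final class Cell {
        final String row;
        final String column;
        final String value;

        Cell(String row, String column, String value) {
            this.row = row;
            this.column = column;
            this.value = value;
        }
    }

    // Compares the row id only: all cells of one row look equal to the set.
    static final Comparator<Cell> ROW_ONLY =
        Comparator.comparing((Cell c) -> c.row);

    // Compares the row id first, then the column id.
    static final Comparator<Cell> ROW_THEN_COLUMN =
        Comparator.comparing((Cell c) -> c.row)
                  .thenComparing((Cell c) -> c.column);

    // Adds three cells of the same row to a TreeSet ordered by the given
    // comparator and returns how many of them the set retained.
    static int distinctCells(Comparator<Cell> comparator) {
        TreeSet<Cell> sorted = new TreeSet<>(comparator);
        sorted.add(new Cell("row1", "colA", "1"));
        sorted.add(new Cell("row1", "colB", "2"));
        sorted.add(new Cell("row1", "colC", "3"));
        return sorted.size();
    }

    public static void main(String[] args) {
        System.out.println("row only:        " + distinctCells(ROW_ONLY));        // prints 1
        System.out.println("row then column: " + distinctCells(ROW_THEN_COLUMN)); // prints 3
    }
}
```

Whether the comparator used by KeyValueSortReducer actually behaves like ROW_ONLY is exactly the open question here; the sketch only illustrates what a TreeSet does under each kind of ordering.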

Thank you in advance for your time.

Ioannis Konstantinou
Research Associate, Computing Systems Laboratory
National Technical University of Athens
Web: http://www.cslab.ntua.gr/~ikons
