hbase-user mailing list archives

From "Jack Chan" <cdj0...@gmail.com>
Subject Re: Re: why HTableDescriptor.getFamiliesKeys is so lag?
Date Thu, 24 Oct 2013 05:47:05 GMT
Hi JM,

Many thanks :)

I'm just curious to know what makes it lag.

Now it's much clearer to me.

Best Regards.

Jack Chan. 

From: Jean-Marc Spaggiari
Date: 2013-10-19 04:46
To: user@hbase.apache.org; cdj0579
Subject: Re: why HTableDescriptor.getFamiliesKeys is so lag?
Hi Jack,

From the code...

// method 1 will call
  /**
   * Returns an array of all the {@link HColumnDescriptor} of the column families
   * of the table.
   * @return Array of all the HColumnDescriptors of the current table
   * @see #getFamilies()
   */
  public HColumnDescriptor[] getColumnFamilies() {
    return getFamilies().toArray(new HColumnDescriptor[0]);
  }

Where getFamilies() simply does: return Collections.unmodifiableCollection(this.families.values());

// method 2 will call
  /**
   * Returns all the column family names of the current table. The map of
   * HTableDescriptor contains mapping of family name to HColumnDescriptors.
   * This returns all the keys of the family map which represents the column
   * family names of the table.
   * @return Immutable sorted set of the keys of the families.
   */
  public Set<byte[]> getFamiliesKeys() {
    return Collections.unmodifiableSet(this.families.keySet());
  }

// method 3 will call
  /**
   * Returns an unmodifiable collection of all the {@link HColumnDescriptor}
   * of all the column families of the table.
   * @return Immutable collection of {@link HColumnDescriptor} of all the
   * column families.
   */
  public Collection<HColumnDescriptor> getFamilies() {
    return Collections.unmodifiableCollection(this.families.values());
  }

So methods 1 and 3 are almost the same thing: 1 is just a wrapper around 3 that copies the result into an array.
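To make the relationship between the three accessors concrete, here is a minimal, self-contained sketch. `MiniTableDescriptor` is a hypothetical stand-in for HTableDescriptor, and a plain String plays the role of HColumnDescriptor; the unsigned byte ordering of the keys is an assumption made for the example (it needs Java 9+ for `Arrays.compareUnsigned`). It only mirrors the shape of the quoted methods, not HBase's actual class.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical stand-in for HTableDescriptor: family names (byte[]) map to a
// String that plays the role of HColumnDescriptor.
class MiniTableDescriptor {
    // byte[] has no natural ordering, so the TreeMap needs a comparator;
    // unsigned lexicographic order is assumed here.
    private static final Comparator<byte[]> BYTES_ORDER = Arrays::compareUnsigned;
    private final TreeMap<byte[], String> families = new TreeMap<>(BYTES_ORDER);

    void addFamily(String name) {
        families.put(name.getBytes(StandardCharsets.UTF_8), name);
    }

    // Shape of method 3: a read-only view over the map's values, no copying.
    Collection<String> getFamilies() {
        return Collections.unmodifiableCollection(families.values());
    }

    // Shape of method 1: calls method 3, then copies into a new array.
    String[] getColumnFamilies() {
        return getFamilies().toArray(new String[0]);
    }

    // Shape of method 2: a read-only view over the map's keys, no copying.
    Set<byte[]> getFamiliesKeys() {
        return Collections.unmodifiableSet(families.keySet());
    }

    public static void main(String[] args) {
        MiniTableDescriptor htd = new MiniTableDescriptor();
        htd.addFamily("cf2");
        htd.addFamily("cf1");
        System.out.println(Arrays.toString(htd.getColumnFamilies())); // [cf1, cf2]
        System.out.println(htd.getFamilies());                        // [cf1, cf2]
        for (byte[] key : htd.getFamiliesKeys()) {
            System.out.println(new String(key, StandardCharsets.UTF_8));
        }
    }
}
```

Method 1 is the only one of the three that allocates and fills a new array on every call; 2 and 3 just wrap an existing view.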

So let's look at the difference between 2 and 3. They both do almost the same thing, but one wraps
keySet() and the other one wraps values(). Both call those methods on families,
which is a TreeMap. So it sounds like TreeMap.values() is faster than TreeMap.keySet().
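Neither keySet() nor values() copies anything: both return views backed by the TreeMap, and the Collections.unmodifiable* wrappers only add a thin read-only layer. A minimal sketch showing this with plain String/Integer entries (an assumption for illustration; the HBase map uses byte[] keys and HColumnDescriptor values):

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Set;
import java.util.TreeMap;

class TreeMapViews {
    // Same wrapping as getFamiliesKeys(): read-only view of the keys.
    static Set<String> keyView(TreeMap<String, Integer> families) {
        return Collections.unmodifiableSet(families.keySet());
    }

    // Same wrapping as getFamilies(): read-only view of the values.
    static Collection<Integer> valueView(TreeMap<String, Integer> families) {
        return Collections.unmodifiableCollection(families.values());
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> families = new TreeMap<>();
        families.put("cf1", 1);
        Set<String> keys = keyView(families);
        Collection<Integer> values = valueView(families);

        // Neither view copied anything: an entry added afterwards shows up in both.
        families.put("cf2", 2);
        System.out.println(keys);   // [cf1, cf2]
        System.out.println(values); // [1, 2]

        // The unmodifiable wrappers reject writes.
        try {
            keys.remove("cf1");
        } catch (UnsupportedOperationException e) {
            System.out.println("key view is read-only");
        }
    }
}
```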

Looking into the TreeMap code (we are no longer in HBase here):
    public Collection<V> values() {
        Collection<V> vs = values;
        return (vs != null) ? vs : (values = new Values());
    }

values() just returns the internal values object if it already exists (which is most probably the
case), while keySet() does almost the same thing but has to call another method too:

    /**
     * Returns a {@link Set} view of the keys contained in this map.
     * The set's iterator returns the keys in ascending order.
     * The set is backed by the map, so changes to the map are
     * reflected in the set, and vice-versa.  If the map is modified
     * while an iteration over the set is in progress (except through
     * the iterator's own <tt>remove</tt> operation), the results of
     * the iteration are undefined.  The set supports element removal,
     * which removes the corresponding mapping from the map, via the
     * <tt>Iterator.remove</tt>, <tt>Set.remove</tt>,
     * <tt>removeAll</tt>, <tt>retainAll</tt>, and <tt>clear</tt>
     * operations.  It does not support the <tt>add</tt> or <tt>addAll</tt>
     * operations.
     */
    public Set<K> keySet() {
        return navigableKeySet();
    }

    /**
     * @since 1.6
     */
    public NavigableSet<K> navigableKeySet() {
        KeySet<K> nks = navigableKeySet;
        return (nks != null) ? nks : (navigableKeySet = new KeySet(this));
    }
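Both methods follow the same lazy-caching pattern: allocate the view on the first call, store it in a field, and return the cached field afterwards. The sketch below is a hypothetical class that copies only that shape (it is not the JDK implementation) and counts allocations, to show that after the first call keySet() differs from values() by a single delegating call:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class copying only the lazy-caching shape of TreeMap.values()
// and TreeMap.keySet()/navigableKeySet(); it is not the JDK implementation.
class LazyViews {
    private final List<String> data = new ArrayList<>(List.of("cf1", "cf2"));
    private List<String> values;          // cached view, like TreeMap's values field
    private List<String> navigableKeySet; // cached view, like TreeMap's navigableKeySet field
    int viewsCreated = 0;                 // how many view objects were ever allocated

    List<String> values() {
        List<String> vs = values;
        return (vs != null) ? vs : (values = newView());
    }

    // keySet() only adds one delegating call on top of the same pattern.
    List<String> keySet() {
        return navigableKeySet();
    }

    List<String> navigableKeySet() {
        List<String> nks = navigableKeySet;
        return (nks != null) ? nks : (navigableKeySet = newView());
    }

    private List<String> newView() {
        viewsCreated++;
        return Collections.unmodifiableList(data);
    }

    public static void main(String[] args) {
        LazyViews v = new LazyViews();
        for (int i = 0; i < 3; i++) {
            v.values();
            v.keySet();
        }
        System.out.println(v.viewsCreated); // 2: one per view, on first use only
    }
}
```

A tiny delegating method like keySet() is a typical candidate for JIT inlining, so in steady state the extra call should be close to free.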

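So now, there are 2 options:

1) If you run each of your methods twice, most probably the 2nd time they will all be about as fast as each other (the first call pays one-time costs such as class loading or JIT warm-up);
2) the extra navigableKeySet() call from keySet() really costs the roughly 380 µs difference (500 µs vs. 120 µs), which would really surprise me, since
I expect the JIT compiler to optimize that away.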
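Option 1 can be checked with a naive timing loop. This is a minimal sketch, assuming a plain TreeMap stands in for the descriptor's families map; the helper names `nanosOnce` and `nanosPerCall` are hypothetical. It times a single cold call and then an average over many calls made after a warm-up phase. It is only an illustration: for trustworthy numbers a harness such as JMH is the better tool.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Supplier;

class WarmupTiming {
    // Nanoseconds for one call; on the first call this includes one-time costs.
    static long nanosOnce(Supplier<?> call) {
        long start = System.nanoTime();
        Object result = call.get();
        long elapsed = System.nanoTime() - start;
        if (result == null) throw new IllegalStateException("unexpected null");
        return elapsed;
    }

    // Average nanoseconds per call, measured only after `warmup` untimed calls.
    static double nanosPerCall(Supplier<?> call, int warmup, int iterations) {
        for (int i = 0; i < warmup; i++) call.get();
        long start = System.nanoTime();
        int nonNull = 0; // consume the results so the loop does observable work
        for (int i = 0; i < iterations; i++) {
            if (call.get() != null) nonNull++;
        }
        long elapsed = System.nanoTime() - start;
        if (nonNull != iterations) throw new IllegalStateException("unexpected null");
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        Comparator<byte[]> bytesOrder = Arrays::compareUnsigned; // Java 9+
        TreeMap<byte[], String> families = new TreeMap<>(bytesOrder);
        for (String name : new String[] {"cf1", "cf2", "cf3"}) {
            families.put(name.getBytes(StandardCharsets.UTF_8), name);
        }
        Supplier<Set<byte[]>> keys = () -> Collections.unmodifiableSet(families.keySet());
        Supplier<Collection<String>> values =
            () -> Collections.unmodifiableCollection(families.values());

        System.out.println("keys   first call: " + nanosOnce(keys) + " ns");
        System.out.println("values first call: " + nanosOnce(values) + " ns");
        System.out.printf("keys   warmed avg: %.1f ns%n", nanosPerCall(keys, 10_000, 1_000_000));
        System.out.printf("values warmed avg: %.1f ns%n", nanosPerCall(values, 10_000, 1_000_000));
    }
}
```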

Finally, I'm not sure why that time matters to you, but if it's because you need
to call this method many times, just cache the result on the client side.
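Caching on the client side could look like the following minimal sketch. `FamilyNameCache` is a hypothetical helper, and UTF-8 family names are an assumption. With HBase you would construct it as `new FamilyNameCache(htd::getFamiliesKeys)`. Note that the cached list will not reflect later schema changes to the table.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical client-side cache: reads the family keys once, decodes them
// to Strings (UTF-8 assumed) and hands out the same read-only list afterwards.
class FamilyNameCache {
    private final Supplier<Set<byte[]>> source; // e.g. htd::getFamiliesKeys
    private List<String> names;                 // null until the first get()

    FamilyNameCache(Supplier<Set<byte[]>> source) {
        this.source = source;
    }

    synchronized List<String> get() {
        if (names == null) {
            List<String> decoded = new ArrayList<>();
            for (byte[] key : source.get()) {
                decoded.add(new String(key, StandardCharsets.UTF_8));
            }
            names = Collections.unmodifiableList(decoded);
        }
        return names;
    }

    public static void main(String[] args) {
        int[] lookups = {0};
        FamilyNameCache cache = new FamilyNameCache(() -> {
            lookups[0]++; // counts how often the underlying source is asked
            return Set.of("cf1".getBytes(StandardCharsets.UTF_8));
        });
        for (int i = 0; i < 5; i++) {
            cache.get();
        }
        System.out.println(cache.get() + " after " + lookups[0] + " lookup(s)"); // [cf1] after 1 lookup(s)
    }
}
```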



On Thursday, October 17, 2013, Jack Chan wrote:

Hi all,
    I need to get all the column families of a specified table. When I looked into the class "org.apache.hadoop.hbase.HTableDescriptor", I
found that
there are at least three methods that can be used.
    See the code below: method 1, method 2 and method 3 all do the same thing:

/*___________code begin___________*/

import java.util.Collection;
import java.util.Set;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HTable;

// config is the client Configuration (e.g. HBaseConfiguration.create());
// TimeCounter is a small timing helper that is not shown here.
HTable table = new HTable(config, "mytable");
HTableDescriptor htd = table.getTableDescriptor();

// method 1
TimeCounter tc = new TimeCounter().run();
HColumnDescriptor[] cfs = htd.getColumnFamilies();
for (int i = 0; i < cfs.length; i++) {
    System.out.println("column family:" + new String(cfs[i].getName()));
}
System.out.println("time with getColumnFamilies-->" + tc.stop().getMicroSeconds());

// method 2
TimeCounter tc2 = new TimeCounter().run();
Set<byte[]> family_keys = htd.getFamiliesKeys();
for (byte[] _f : family_keys) {
    System.out.println("column family:" + new String(_f));
}
System.out.println("time with getFamiliesKeys-->" + tc2.stop().getMicroSeconds());

// method 3
TimeCounter tc3 = new TimeCounter().run();
Collection<HColumnDescriptor> family_co = htd.getFamilies();
for (HColumnDescriptor family_co_entry : family_co) {
    System.out.println("column family:" + new String(family_co_entry.getName()));
}
System.out.println("time with getFamilies-->" + tc3.stop().getMicroSeconds());

/*___________________code end_____________________*/
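Two details in the test above may blur the comparison: each timed region also includes System.out.println calls, and new String(byte[]) decodes with the platform default charset. A minimal sketch, assuming UTF-8 family names and using a hypothetical helper `timed`, that times only the accessor call and decodes the names explicitly after the timer has stopped:

```java
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.function.Supplier;

class TimeCallOnly {
    // Times just the supplied call (no printing inside the measured region).
    static <T> T timed(String label, Supplier<T> call) {
        long start = System.nanoTime();
        T result = call.get();
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.println("time with " + label + "-->" + micros);
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for htd.getFamiliesKeys(); with HBase you would pass htd::getFamiliesKeys.
        Supplier<Set<byte[]>> familiesKeys =
            () -> Set.of("cf1".getBytes(StandardCharsets.UTF_8));
        Set<byte[]> keys = timed("getFamiliesKeys", familiesKeys);
        // Printing happens after the timer has stopped.
        for (byte[] f : keys) {
            System.out.println("column family:" + new String(f, StandardCharsets.UTF_8));
        }
    }
}
```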

I found that methods 1 and 3 perform the same, at about 120 µs each,
but method 2 lags, at about 500 µs.

I only need to retrieve the column family names, so method 2 is exactly what I need.
But why is it so slow?


Jack Chan.
A new Apache-Camel rider.