hbase-dev mailing list archives

From Himanshu Verma <himanshuverma...@gmail.com>
Subject Optimizing LoadIncrementalHFiles.java
Date Thu, 27 Aug 2015 23:56:09 GMT
Hi,

I was looking at the following method:

  public void doBulkLoad(Path hfofDir, final Admin admin, Table table,
      RegionLocator regionLocator) throws TableNotFoundException, IOException {

We can optimize the following part of this method:

353       ArrayList<String> familyNames = new ArrayList<String>(families.size());
354       for (HColumnDescriptor family : families) {
355         familyNames.add(family.getNameAsString());
356       }
357       ArrayList<String> unmatchedFamilies = new ArrayList<String>();
358       Iterator<LoadQueueItem> queueIter = queue.iterator();
359       while (queueIter.hasNext()) {
360         LoadQueueItem lqi = queueIter.next();
361         String familyNameInHFile = Bytes.toString(lqi.family);
362         if (!familyNames.contains(familyNameInHFile)) {
363           unmatchedFamilies.add(familyNameInHFile);
364         }
365       }

Line 353 uses an ArrayList for familyNames, and line 362 calls its
"contains" method, which is O(n). We can instead use a HashSet, whose
"contains" method is O(1) on average.

This should improve performance for tables with a large number of column
families.
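To illustrate, here is a minimal standalone sketch of the proposed change, using plain strings in place of the HBase types (HColumnDescriptor, LoadQueueItem) so it compiles on its own; the names and sample values are made up for the example:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FamilyCheckSketch {
    public static void main(String[] args) {
        // Stand-in for the table's column family names
        // (the original builds these from HColumnDescriptor.getNameAsString()).
        // HashSet gives average O(1) contains(), vs O(n) for ArrayList.
        Set<String> familyNames = new HashSet<String>(Arrays.asList("cf1", "cf2"));

        // Stand-in for the family names found in the queued HFiles.
        List<String> hfileFamilies = Arrays.asList("cf1", "cf2", "cfX");

        List<String> unmatchedFamilies = new ArrayList<String>();
        for (String familyNameInHFile : hfileFamilies) {
            // Same membership check as line 362, now O(1) on average
            if (!familyNames.contains(familyNameInHFile)) {
                unmatchedFamilies.add(familyNameInHFile);
            }
        }
        System.out.println(unmatchedFamilies);  // prints [cfX]
    }
}
```

The loop at lines 354-356 would stay the same apart from the declared type, so the change is contained to line 353's declaration.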

This is my first time here; I can make this change if everything looks fine.

Regards,
Himanshu Verma
