hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Optimizing LoadIncrementalHFiles.java
Date Fri, 28 Aug 2015 00:08:50 GMT
At roughly how many column families would this change show a performance boost?

Cheers



> On Aug 27, 2015, at 4:56 PM, Himanshu Verma <himanshuvermadce@gmail.com> wrote:
> 
> Hi,
> 
> I was looking at the following method:
> 
> public void doBulkLoad(Path hfofDir, final Admin admin, Table table,
>     RegionLocator regionLocator) throws TableNotFoundException, IOException {
> 
> We can optimize the following part of this method:
> 
> 353       ArrayList<String> familyNames = new ArrayList<String>(families.size());
> 354       for (HColumnDescriptor family : families) {
> 355         familyNames.add(family.getNameAsString());
> 356       }
> 357       ArrayList<String> unmatchedFamilies = new ArrayList<String>();
> 358       Iterator<LoadQueueItem> queueIter = queue.iterator();
> 359       while (queueIter.hasNext()) {
> 360         LoadQueueItem lqi = queueIter.next();
> 361         String familyNameInHFile = Bytes.toString(lqi.family);
> 362         if (!familyNames.contains(familyNameInHFile)) {
> 363           unmatchedFamilies.add(familyNameInHFile);
> 364         }
> 365       }
> 
> Line 353 uses an ArrayList for familyNames, and line 362 calls its
> "contains" method, which is O(n) in the number of column families. We can
> use a HashSet instead, whose "contains" method is O(1) on average.
> 
> This should improve performance when a table has a large number of column
> families.
> 
> This is my first time here; I can make this change if everything looks fine.
> 
> Regards,
> Himanshu Verma
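
For reference, the change proposed in the quoted message might look roughly like the sketch below. It is a standalone illustration, not the actual patch: the class UnmatchedFamilies and the method findUnmatched are hypothetical names, plain strings and byte arrays stand in for HColumnDescriptor and LoadQueueItem, and decoding with UTF-8 is assumed to match what Bytes.toString does.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical standalone class; it does not depend on HBase.
class UnmatchedFamilies {

    // Returns the family names that appear in the HFiles but not in the table.
    // tableFamilies stands in for the names taken from the table's column
    // descriptors; hfileFamilies stands in for the lqi.family byte arrays.
    static List<String> findUnmatched(List<String> tableFamilies,
                                      List<byte[]> hfileFamilies) {
        // HashSet gives average O(1) membership tests, versus the O(n)
        // scan performed by ArrayList.contains.
        Set<String> familyNames = new HashSet<>(tableFamilies);

        List<String> unmatched = new ArrayList<>();
        for (byte[] family : hfileFamilies) {
            // Assumption: UTF-8 decoding, standing in for Bytes.toString.
            String name = new String(family, StandardCharsets.UTF_8);
            if (!familyNames.contains(name)) {
                unmatched.add(name);
            }
        }
        return unmatched;
    }

    public static void main(String[] args) {
        List<String> table = Arrays.asList("cf1", "cf2");
        List<byte[]> hfiles = Arrays.asList(
            "cf1".getBytes(StandardCharsets.UTF_8),
            "cfX".getBytes(StandardCharsets.UTF_8));
        System.out.println(findUnmatched(table, hfiles));
    }
}
```

With f column families and q queued HFiles, the list-based check does on the order of f * q string comparisons, while the set-based check does about f + q hash operations, so any gain depends on both numbers being large.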
