hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Optimizing LoadIncrementalHFiles.java
Date Fri, 28 Aug 2015 01:58:47 GMT
I looked at the code again.
When the number of HFiles to be loaded times the number of column families
is large, your suggestion may produce some speedup. If you have access to a
cluster, you can measure the potential savings of your approach.

Cheers

On Thu, Aug 27, 2015 at 5:08 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> At roughly how many column families would this change show a performance
> boost?
>
> Cheers
>
>
>
> > On Aug 27, 2015, at 4:56 PM, Himanshu Verma <himanshuvermadce@gmail.com>
> wrote:
> >
> > Hi,
> >
> > I was looking at the following method:
> >
> > public void doBulkLoad(Path hfofDir, final Admin admin, Table table,
> >     RegionLocator regionLocator) throws TableNotFoundException, IOException {
> >
> > We can optimize the following part of this method:
> >
> > 353       ArrayList<String> familyNames = new ArrayList<String>(families.size());
> > 354       for (HColumnDescriptor family : families) {
> > 355         familyNames.add(family.getNameAsString());
> > 356       }
> > 357       ArrayList<String> unmatchedFamilies = new ArrayList<String>();
> > 358       Iterator<LoadQueueItem> queueIter = queue.iterator();
> > 359       while (queueIter.hasNext()) {
> > 360         LoadQueueItem lqi = queueIter.next();
> > 361         String familyNameInHFile = Bytes.toString(lqi.family);
> > 362         if (!familyNames.contains(familyNameInHFile)) {
> > 363           unmatchedFamilies.add(familyNameInHFile);
> > 364         }
> > 365       }
> >
> > Line 353 uses an ArrayList for familyNames, and line 362 calls its
> > "contains" method, which is O(n) in the number of column families. We can
> > use a HashSet instead, whose "contains" method is O(1) on average.
> >
> > This should improve performance when a table has a large number of
> > column families.
> >
> > This is my first time here. I can make this change if everything looks
> > fine.
> >
> > Regards,
> > Himanshu Verma
>
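
For illustration, here is a minimal standalone sketch of the change proposed
in the quoted message. It is not the actual HBase patch: the class and method
names are hypothetical, and plain strings stand in for HColumnDescriptor and
LoadQueueItem so that it runs without HBase on the classpath. With m column
families and n queued HFiles, the ArrayList lookup costs O(n * m) comparisons
in the worst case, while the HashSet version costs O(n + m) on average.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the loop at lines 353-365; not HBase code.
public class UnmatchedFamilies {

    // tableFamilies: family names defined on the table (assumed input).
    // hfileFamilies: family name of each queued HFile (assumed input).
    static List<String> findUnmatched(List<String> tableFamilies,
                                      List<String> hfileFamilies) {
        // HashSet gives average O(1) contains(), versus O(m) for ArrayList.
        Set<String> familyNames = new HashSet<>(tableFamilies);
        List<String> unmatchedFamilies = new ArrayList<>();
        for (String familyNameInHFile : hfileFamilies) {
            if (!familyNames.contains(familyNameInHFile)) {
                // Duplicates are kept, as in the original loop.
                unmatchedFamilies.add(familyNameInHFile);
            }
        }
        return unmatchedFamilies;
    }

    public static void main(String[] args) {
        List<String> table = Arrays.asList("cf1", "cf2");
        List<String> queued = Arrays.asList("cf1", "cf3", "cf2", "cf3");
        System.out.println(findUnmatched(table, queued)); // prints [cf3, cf3]
    }
}
```

Whether the difference is measurable depends on how large the product of
HFiles and column families gets, which is why timing it on a real cluster is
the sensible next step.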
