hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1512) Coprocessors: Support aggregate functions
Date Tue, 15 Mar 2011 04:52:30 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006793#comment-13006793
] 

stack commented on HBASE-1512:
------------------------------

FYI, lines should be 80 characters or less.

What you want here Himanshu?

{code}
+    // sleep here is an ugly hack to allow region transitions to finish
+    Thread.sleep(5000);
+    for (JVMClusterUtil.RegionServerThread t : cluster.getRegionServerThreads()) {
+      for (HRegionInfo r : t.getRegionServer().getOnlineRegions()) {
+        t.getRegionServer().getOnlineRegion(r.getRegionName()).
+          getCoprocessorHost().
+          load(AggregateProtocolImpl.class, //TestAggFunctions.AggFunctionsCT.class,
+              Coprocessor.Priority.USER);
+        
+      }
+    }
{code}

I think there is probably better means of waiting on HRS startup than a timer (FYI, a delay
will always fail up on the Apache build server -- it has a special knack for doing the unexpected).

Our convention is spaces around operators.  Not...

{code}
+    for(int i=0;i<n;i++) {
{code}

... but

{code}
+    for(int i = 0; i < n; i++) {
{code}

See the rest of the code base for examples.  These are not biggies.  I'm just pointing them
out.

This CP is very cool.

Would suggest you add more examples to your test. You just test one instance of each aggregate.
 Shove in some edge cases if you can think of them (Debugging a unit tests is a million times
better than trying to debug it out on live cluster).

Do you think you should be a big more defensive here when scanning?  You assume a long:

{code}
+        temp = Bytes.toLong(val.getValue());
{code}

Do you think you should check the cell length?  If > or < long length, then something
is off and WARN?

This log message is going to drive folks crazy:

{code}
+        log.debug("val read in the region is: "+temp);
{code}

In any region of any decent size, there'll almost be a log line per cell?

The below should be inside a finally:

{code}
+      scanner.close();
{code}

Just throw I'd say, don't bother logging:

{code}
+      log.error("Some error occurred. Aborting the computation"+ ie.getMessage());
+      throw new IOException("Aborting the Max aggregate computation");
{code}

Be careful w/ your formatting:

{code}
+    }while(done);
+    scanner.close();
+    }catch (IOException ie){ 
{code}

Try to be consistent.

Here is another example:

{code}
+   /**
+    * For a given column family and column Qualifier for a table, it gives its sum of all
its values at the region level.
+    */
+  
+  @Override
{code}

Why a Long object instead of just a long primitive type?

{code}
+    Long sum = 0l;
{code}

This is messy here formatting-wise:

{code}
+      KeyValue val = results.get(0);
+      if(val != null) counter++; // TODO: Or shall it only caters to the row, and ignore
whether a specific column is null or not. 
+                                    // Or is it like a val can't be null. Need to look in
to all possible values of keyval!
+    }while(done);
+    }finally{
+      scanner.close();  
+    }

{code}

Do you think we'll always be acting on a column only?  What if someone wants to act on a whole
column family; i.e. no qualifier?


Be careful w/ your white space.  For instance in the interface you have this:

{code}
+  List<LongWritable> getAvg(byte[] colFamily, byte[] colQualifier) throws IOException;
+  List<LongWritable> getStd(byte[] colFamily, byte[] colQualifier) throws IOException;
+  
+  
+ 
+}
{code}

Clear those empty lines.

You should put the javadoc in the Interface on the Interface methods, rather than out in the
implementations.  Thats how its usually done.  The implementations inherit the interface javadoc.

Fill out the javadoc in your client, in particular description of the return in each case.

So is stuff being averaged, and summed in the client?  Hows that done?

Patch looks cool.

> Coprocessors: Support aggregate functions
> -----------------------------------------
>
>                 Key: HBASE-1512
>                 URL: https://issues.apache.org/jira/browse/HBASE-1512
>             Project: HBase
>          Issue Type: Sub-task
>          Components: coprocessors
>            Reporter: stack
>         Attachments: 1512.zip, patch-1512.txt
>
>
> Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating
facility, facility generally where you want to calculate some meta info on your table, it
seems like it wouldn't be too hard making a filter type that could run a function server-side
and return the result ONLY of the aggregation or whatever.
> For example, say you just want to count rows, currently you scan, server returns all
data to client and count is done by client counting up row keys.  A bunch of time and resources
have been wasted returning data that we're not interested in.  With this new filter type,
the counting would be done server-side and then it would make up a new result that was the
count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column
whose value is count of rows).   We could have it so the count was just done per region and
return that.  Or we could maybe make a small change in scanner too so that it aggregated the
per-region counts.  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message