A question about how some functionality is implemented. It looks like several functions (actually, all functions other than the aggregate functions) are implemented as part of their data types. For example, the hour/day extractors seem to be part of the SQLTime/SQLTimestamp/etc. data types, and LENGTH/RTRIM/LTRIM seem to be part of the string data type. Isn't there a disconnect here? The aggregate functions are laid out in a hierarchy, while the scalar functions are not.
If I wanted a new data type, say varcharx, would I have to reimplement all the string-related scalar functions?
What if I wanted to build a bridge to an existing library of (scalar) functions? Wouldn't this setup make that more difficult (think of the various regex libraries)?
Regarding aggregators, it looks like they are implemented by operating on two values at a time (I'm looking at SumAggregator): a value to be added and a value that was populated previously (presumably holding the aggregate result for the part of the column already traversed).
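To make my reading concrete, here is a minimal sketch of the pairwise model as I understand it. The names (PairwiseAggregator, RunningSum) are my own and purely hypothetical; this is not the actual SumAggregator code, and the NULL handling is just my assumption of standard SQL SUM behaviour.

```java
// Hypothetical interface: not the engine's real aggregator API.
interface PairwiseAggregator<T> {
    // Fold one new value into the previously populated running result.
    void accumulate(T value);

    // Result over all values seen so far.
    T getResult();
}

// Hypothetical SUM: keeps only the running total, never the whole column.
final class RunningSum implements PairwiseAggregator<Long> {
    private long sum = 0L;
    private boolean sawValue = false;

    @Override
    public void accumulate(Long value) {
        if (value == null) {
            return; // assumption: NULLs are skipped, as in standard SQL SUM
        }
        sum += value;
        sawValue = true;
    }

    @Override
    public Long getResult() {
        // assumption: SUM over no non-NULL values is NULL
        return sawValue ? sum : null;
    }
}
```

The point is that each call only ever sees the new value plus the running state, never the full set of values.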
Again, if I wanted to build a bridge to existing libraries which might operate on vectors rather than a scalar value, I would not be able to do that since most such libraries expect to receive a whole set of values at once (in an array or List form). I'm specifically thinking of the scientific COLT library.
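The only workaround I can think of is an adapter that buffers every value and calls the vector-oriented function once at the end. Below is a rough sketch under that assumption. BufferingAggregator is a hypothetical name, and the ToDoubleFunction<double[]> parameter is only a stand-in for whatever library routine would be called; I am not showing any real COLT signature here.

```java
import java.util.Arrays;
import java.util.function.ToDoubleFunction;

// Hypothetical adapter: collects values one at a time, then hands the
// whole array to a vector-oriented function in a single call.
final class BufferingAggregator {
    private double[] buffer = new double[16];
    private int size = 0;
    private final ToDoubleFunction<double[]> vectorFunction; // stand-in for a library call

    BufferingAggregator(ToDoubleFunction<double[]> vectorFunction) {
        this.vectorFunction = vectorFunction;
    }

    // Called once per row, matching the one-value-at-a-time model.
    void accumulate(double value) {
        if (size == buffer.length) {
            buffer = Arrays.copyOf(buffer, size * 2); // grow when full
        }
        buffer[size++] = value;
    }

    // Passes a trimmed copy of everything buffered so far to the vector function.
    double getResult() {
        return vectorFunction.applyAsDouble(Arrays.copyOf(buffer, size));
    }
}
```

For example, a median could be plugged in as `new BufferingAggregator(v -> { Arrays.sort(v); return v[v.length / 2]; })`. The obvious cost is that the entire column has to sit in memory before the function runs, which is why this feels like fighting the design rather than using it.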
How would I implement a typical 'approximating' function, e.g. an AVG that doesn't sum up everything but instead sums x% of the values (randomly selected) and divides by x% of the row count? In other words, I would now have to pass around another parameter, or some sort of context object carrying the extra information. (This is just an example of a possible problem.)
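Here is a sketch of what I have in mind, just to show where the extra state would have to live. SampledAvg and its constructor parameters are hypothetical; the sampling fraction and the random source are exactly the kind of context I don't see a place for today. Note that this version divides by the number of rows actually sampled, which is only approximately x% of the row count.

```java
import java.util.Random;

// Hypothetical approximating AVG: averages a random sample of the input.
final class SampledAvg {
    private final double sampleFraction; // e.g. 0.10 to sample roughly 10% of rows
    private final Random random;
    private double sum = 0.0;
    private long sampled = 0;

    SampledAvg(double sampleFraction, long seed) {
        if (sampleFraction <= 0.0 || sampleFraction > 1.0) {
            throw new IllegalArgumentException("sampleFraction must be in (0, 1]");
        }
        this.sampleFraction = sampleFraction;
        this.random = new Random(seed);
    }

    // Called once per row; each row is kept with probability sampleFraction.
    void accumulate(double value) {
        if (random.nextDouble() < sampleFraction) {
            sum += value;
            sampled++;
        }
    }

    // Average of the sampled rows, or null if no row was sampled.
    Double getResult() {
        return sampled == 0 ? null : sum / sampled;
    }
}
```

With a fraction of 1.0 every row is kept and this degenerates to an exact AVG. The open question is how the fraction (and seed) would reach the aggregator if the framework only ever passes in the next value.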
Just trying to make sure the mental model of the code I've studied so far is an accurate one. Thanks.