apex-dev mailing list archives

From Bhupesh Chawda <bhup...@datatorrent.com>
Subject Re: Adding features to HBase Input Operators in Malhar-contrib
Date Wed, 30 Dec 2015 14:04:17 GMT
Here is the final hierarchy I am considering:

HBaseInputOperator - Takes care of the HBaseStore and its connection
  (HBaseOperatorBase is removed).
    HBaseScanOperator - Takes care of scanning the table in a non-blocking
      manner. Exposes operationScan() and getTuple() as before.
        HBasePOJOInputOperator - Implements operationScan() and getTuple()
          and emits a POJO on the output port.
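A minimal Java skeleton of the three-level hierarchy above might look like the following. Everything in it is a hypothetical stand-in, not the Malhar API: StoreStub replaces HBaseStore, a row is modelled as a Map from "family:column" to value instead of an HBase Result, Person is an example POJO with a hard-coded field mapping, and the non-blocking part of the scan is left out so that only the layering is shown.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for HBaseStore: only tracks connection state here.
class StoreStub {
    final String tableName;
    boolean connected;

    StoreStub(String tableName) { this.tableName = tableName; }
    void connect() { connected = true; }
    void disconnect() { connected = false; }
}

// Level 1: owns the store and its connection lifecycle.
abstract class HBaseInputOperatorSketch {
    protected final StoreStub store;

    HBaseInputOperatorSketch(StoreStub store) { this.store = store; }
    void setup() { store.connect(); }
    void teardown() { store.disconnect(); }
}

// Level 2: drives the scan; subclasses decide what to scan and how to convert rows.
// A row is modelled as a map from "family:column" to value (an assumption).
abstract class HBaseScanOperatorSketch<T> extends HBaseInputOperatorSketch {
    private Iterator<Map<String, String>> scanner;

    HBaseScanOperatorSketch(StoreStub store) { super(store); }

    protected abstract Iterator<Map<String, String>> operationScan();
    protected abstract T getTuple(Map<String, String> row);

    // Called once per window: emits at most maxTuples converted rows.
    List<T> emitTuples(int maxTuples) {
        if (scanner == null) {
            scanner = operationScan(); // start the scan lazily on first use
        }
        List<T> out = new ArrayList<>();
        while (out.size() < maxTuples && scanner.hasNext()) {
            out.add(getTuple(scanner.next()));
        }
        return out;
    }
}

// Hypothetical POJO emitted for each row.
class Person {
    final String name;
    final String city;

    Person(String name, String city) { this.name = name; this.city = city; }
}

// Level 3: concrete operator mapping each row to a POJO (mapping hard-coded here).
class PersonInputOperatorSketch extends HBaseScanOperatorSketch<Person> {
    private final List<Map<String, String>> table; // in-memory stand-in for the HBase table

    PersonInputOperatorSketch(StoreStub store, List<Map<String, String>> table) {
        super(store);
        this.table = table;
    }

    @Override
    protected Iterator<Map<String, String>> operationScan() { return table.iterator(); }

    @Override
    protected Person getTuple(Map<String, String> row) {
        return new Person(row.get("cf:name"), row.get("cf:city"));
    }
}
```

The point of the split is that only the bottom class knows about the tuple type; the scan loop and the connection handling are written once and reused by any other output format.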

Comments?

-Bhupesh


On Wed, Dec 30, 2015 at 2:52 PM, Bhupesh Chawda <bhupesh@datatorrent.com>
wrote:

> The class HBaseInputOperator seems to be quite old. HBaseStore appears to
> provide all the functionality of HBaseInputOperator and more (including
> Kerberos authentication).
>
> It would be a good idea to stop using HBaseInputOperator going forward and
> use HBaseStore instead.
>
> I will also work on abstracting the common HBase input functionality into
> HBaseInputOperator, which concrete implementations can then extend.
>
> -Bhupesh
>
> On Wed, Dec 23, 2015 at 7:47 PM, Bhupesh Chawda <bhupesh@datatorrent.com>
> wrote:
>
>> Thanks for the inputs.
>> As an input operator, I am targeting just the Scan operation. The Get
>> operation may be better supported by a generic operator (such as a query
>> operator), which I can take up later.
>>
>> -Bhupesh
>>
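To make the Scan/Get distinction concrete, here is a toy in-memory model; it is not the HBase client API. It relies on two properties of HBase: rows are stored sorted by row key, and a Scan covers a key range with an inclusive start row and an exclusive stop row, while a Get fetches one row by its exact key. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy in-memory model of an HBase table: row keys are kept sorted, as in HBase.
// HBase compares raw bytes; String order matches that for ASCII keys.
class SortedTableModel {
    private final NavigableMap<String, String> rows = new TreeMap<>();

    void put(String rowKey, String value) { rows.put(rowKey, value); }

    // Range scan: start row inclusive, stop row exclusive, like an HBase Scan.
    List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    // Point query: one row by exact key, like an HBase Get; null if absent.
    String get(String rowKey) { return rows.get(rowKey); }
}
```

A scan is naturally a stream of rows, which fits an input operator; a Get answers one key at a time, which fits a query-style operator that receives keys on an input port.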
>> On Tue, Dec 22, 2015 at 3:48 PM, Mohit Jotwani <mohit@datatorrent.com>
>> wrote:
>>
>>> +1
>>>
>>> Regards,
>>> Mohit
>>>
>>> On Tue, Dec 22, 2015 at 11:21 AM, Chinmay Kolhatkar <chinmay@datatorrent.com>
>>> wrote:
>>>
>>> > +1 for the above.
>>> > I see that there is an HbaseGetOperator, but it is abstract and I cannot
>>> > find any concrete implementation of it.
>>> > Are you going to implement that too?
>>> >
>>> > Maybe the concrete implementation of HbaseGetOperator should have this.
>>> >
>>> > Also, I want to mention one thing about scans from my previous
>>> > experience with HBase: the HBase client is synchronous.
>>> > This means that when you fire a scan call, the function blocks until a
>>> > certain number of records are received at the client end.
>>> > This causes a lot of problems, as the current thread might get blocked
>>> > for a long period of time.
>>> > On top of that, network latency always adds to the problem.
>>> >
>>> > The usual way to deal with this is to fire scan-like queries on a
>>> > separate thread and then consume the results in the main thread.
>>> >
>>> > Please take care of this scenario while implementing the scan operator.
>>> >
>>> > -Chinmay.
>>> >
>>> >
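The separate-thread approach described above can be sketched with a bounded queue. This is a hedged sketch, not Malhar code: AsyncScanner is a hypothetical name, and a plain Iterator stands in for the blocking HBase scanner. A worker thread absorbs the blocking calls, and the operator's thread only drains whatever rows are already available, so it never waits on the network.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Runs a blocking scan on a background thread and hands rows to the caller
// through a bounded queue, so the caller's thread never blocks on the scan.
class AsyncScanner<T> {
    private final BlockingQueue<T> queue;
    private final Thread worker;
    private volatile boolean done;
    private volatile Throwable failure;

    AsyncScanner(Iterator<T> blockingScan, int capacity) {
        queue = new ArrayBlockingQueue<>(capacity);
        worker = new Thread(() -> {
            try {
                while (blockingScan.hasNext()) {
                    // put() blocks only this worker when the queue is full (back-pressure)
                    queue.put(blockingScan.next());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // close() was called
            } catch (RuntimeException e) {
                failure = e; // surfaced to the caller on the next drain()
            } finally {
                done = true;
            }
        }, "scan-worker");
        worker.setDaemon(true);
        worker.start();
    }

    // Non-blocking: returns the rows that are ready right now, at most max of them.
    List<T> drain(int max) {
        List<T> batch = new ArrayList<>();
        queue.drainTo(batch, max);
        if (failure != null) {
            throw new IllegalStateException("scan failed", failure);
        }
        return batch;
    }

    // True once the worker has finished and every row has been drained.
    boolean isExhausted() { return done && queue.isEmpty(); }

    void close() { worker.interrupt(); }
}
```

In an operator, drain() would be called from each emitTuples() call; the queue capacity bounds memory if downstream is slower than HBase.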
>>> > On Tue, Dec 22, 2015 at 11:08 AM, Sandeep Deshmukh <sandeep@datatorrent.com>
>>> > wrote:
>>> >
>>> > > +1 for this, Bhupesh.
>>> > >
>>> > > Additionally, I would suggest adding support for:
>>> > > 1. Point queries
>>> > > 2. Returning any version of a row
>>> > >
>>> > > The above two are key features of HBase and should be supported.
>>> > >
>>> > > Regards,
>>> > > Sandeep
>>> > >
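To illustrate what "returning any row version" involves, here is a toy model of a single cell; it is not the HBase API. HBase keeps multiple timestamped versions of a cell (up to the limit configured on the column family) and returns them newest first, with a plain read returning only the newest. The class and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of one HBase cell (row + family:qualifier) holding several
// timestamped versions, ordered newest first as HBase returns them.
class VersionedCell {
    private final NavigableMap<Long, String> versions =
            new TreeMap<>(Collections.reverseOrder());

    void put(long timestamp, String value) { versions.put(timestamp, value); }

    // Default read: only the newest version, or null if the cell is empty.
    String latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    // Up to maxVersions values, newest first.
    List<String> newest(int maxVersions) {
        List<String> out = new ArrayList<>();
        for (String value : versions.values()) {
            if (out.size() == maxVersions) {
                break;
            }
            out.add(value);
        }
        return out;
    }

    // Newest version written at or before the given timestamp.
    // With the reversed comparator, ceilingEntry finds the largest key <= timestamp.
    String asOf(long timestamp) {
        Map.Entry<Long, String> entry = versions.ceilingEntry(timestamp);
        return entry == null ? null : entry.getValue();
    }
}
```

An operator exposing this would need properties for the number of versions and/or a time bound, passed through to the underlying Get or Scan.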
>>> > > On Fri, Dec 18, 2015 at 4:39 PM, Bhupesh Chawda <bhupesh@datatorrent.com>
>>> > > wrote:
>>> > >
>>> > > > Hi All,
>>> > > >
>>> > > > The current HBasePOJOInputOperator does not support the following:
>>> > > >
>>> > > >    1. Specifying a set of "columnFamily:column" pairs and fetching
>>> > > >       data only for those columns.
>>> > > >    2. Output formats other than a POJO. We need output formats that
>>> > > >       support the "columnFamily:column" representation; Map and CSV
>>> > > >       are some of the options.
>>> > > >    3. Specifying an "end row-key" at which to stop scanning a table.
>>> > > >    4. Metrics; there are none at present.
>>> > > >
>>> > > > I am planning to add the above functionality to the HBase input
>>> > > > operators. These features may go into the HBaseScanOperator /
>>> > > > HBasePOJOInputOperator.
>>> > > >
>>> > > > Please let me know your comments.
>>> > > >
>>> > > > Thanks.
>>> > > >
>>> > > > Bhupesh
>>> > > >
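Items 1, 2 and 4 of the list above can be sketched together in plain Java. ColumnProjection is a hypothetical helper, not Malhar or HBase code: it validates "columnFamily:column" specs, projects a row (modelled as a Map keyed by "family:column") onto them, renders the result as a Map or a CSV line, and counts rendered rows as a stand-in for a metric. In a real operator the column list would also be pushed into the HBase Scan (for example with Scan.addColumn(family, qualifier)) so that the region server returns only those columns, rather than filtering on the client as done here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: validate "columnFamily:column" specs, project a row onto
// them, and render the result as a Map or a CSV line.
class ColumnProjection {
    private final List<String> columns = new ArrayList<>();
    private long rowsRendered; // simple counter, the kind of value a metric could expose

    ColumnProjection(List<String> specs) {
        for (String spec : specs) {
            int sep = spec.indexOf(':');
            if (sep <= 0 || sep == spec.length() - 1) {
                throw new IllegalArgumentException("expected columnFamily:column, got " + spec);
            }
            columns.add(spec);
        }
    }

    // Keeps only the configured columns, in configured order; absent cells map to null.
    Map<String, String> toMap(Map<String, String> row) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String column : columns) {
            out.put(column, row.get(column));
        }
        rowsRendered++;
        return out;
    }

    // Same projection as one CSV line; absent cells become empty fields.
    String toCsv(Map<String, String> row) {
        StringJoiner line = new StringJoiner(",");
        for (String value : toMap(row).values()) {
            line.add(value == null ? "" : quote(value));
        }
        return line.toString();
    }

    long rowsRendered() { return rowsRendered; }

    // RFC 4180-style quoting: wrap fields containing a comma, quote or line break.
    private static String quote(String value) {
        if (value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }
}
```

Keeping the column order from the configuration makes the CSV columns stable across rows, even when some rows lack some cells.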
>>> > >
>>> >
>>>
>>
>>
>
