incubator-blur-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dibyendu Bhattacharya <dibyendu.bhattach...@gmail.com>
Subject Re: new queue capability
Date Mon, 03 Mar 2014 11:26:26 GMT
Hi,

Is there any simpler Client API will come for indexing data from Kafka to
Blur ? As I mentioned below, I played with the Queue API, What I missed
there was to have a partionar logic to channel the Kafka Stream to a
correct Shard. Also Shard failover need to be handled if I need to
implement the queue logic at Shard level.

Regards,
Dibyendu


On Fri, Feb 28, 2014 at 5:32 PM, Dibyendu Bhattacharya <
dibyendu.bhattachary@gmail.com> wrote:

> Hi,
>
> I was just playing with the new QueueReader API, and as Tim pointed out ,
> its very low level . I still tried to implement a KafkaConsumer .
>
> Here is my use case. Let me know if I have approached correctly.
>
> I have a given topic in Kafka, which has 3 Partitions. And in Blur I have
> a table with 2 Shards . I need to index all messages from Kafka Topic to
> Blur Table.
>
>  I have used Kafka ConsumerGroupAPI to consume in parallel in 2 streams (
> from 3 partitions) for indexing into 2 Blur shards. As ConsumerGroup API
> allow me to split any Kafka Topic into N number of streams, I can choose N
> for my target shard count, here it is 2.
>
> For both shards I created two ShardContext and two BlurIndexSimpleWriter.
> ( Is this okay ?)
>
> Now, I modified the BlurIndexSimpleWriter  to get handle to
> the _queueReader object.  I used this _queueReader  to populate the
> respective shards queue taking messages from KafkaStreams.
>
> Here is the TestCase (KafkaReaderTest) , KafkaStreamReader ( which reads
> the Kafka Stream) , and the KafkaQueueReader ( The Q interface for Blur)
>
> Also attached the modified BlurIndexSimpleWriter. Just added
>
>   public QueueReader getQueueReader(){
>
>   return _queueReader;
>   }
>
> With these changes, I am able to read Kafka messages in parallel streams
> and index them into 2 shards. All documents from Kafka getting indexed
> properly. But after TestCases run , I can see two Index Directory for two
> path I created.
>
> Let me know if this approach is correct ? In this code, I have not taken
> care of Shard Failure logic and as Tim pointed out, if that can be
> abstracted form client that will be great.
>
> Regards,
> Dibyendu
>
>
>
>
> On Thu, Feb 27, 2014 at 9:40 PM, Aaron McCurry <amccurry@gmail.com> wrote:
>
>> What if we provide an implementation of the QueueReader concept that does
>> what you are discussing.  That way in more extreme cases when the user is
>> forced into implementing the lower level api (perhaps for performance)
>> they
>> can still do it, but for the normal case the partitioning (and other
>> difficult issues) are handled by the controllers.
>>
>> I could see adding an enqueueMutate call to the controllers that pushes
>> the
>> mutates to the correct buckets for the user.  At the same time we could
>> allow each of the controllers to pull from an external and push the
>> mutates
>> to the correct buckets for the shards.  I could see a couple of different
>> ways of handling this.
>>
>> However I do agree that right now there is too much burden on the user for
>> the 95% case.  We should make this simpler.
>>
>> Aaron
>>
>>
>> On Thu, Feb 27, 2014 at 10:07 AM, Tim Williams <williamstw@gmail.com>
>> wrote:
>>
>> > I've been playing around with the new QueueReader stuff and I'm
>> > starting to believe it's at the wrong level of abstraction - in the
>> > shard context - for a user.
>> >
>> > Between having to know about the BlurPartioner and handling all the
>> > failure nuances, I'm thinking a much friendlier approach would be to
>> > have the client implement a single message pump that Blur take's from
>> > and handles.
>> >
>> > Maybe on startup the Controllers compete for the lead QueueReader
>> > position, create it from the TableContext and run with it?  The user
>> > would still need to deal with  Controller failures but that seems
>> > easier to reason about then shard failures.
>> >
>> > The way it's crafted right now, the user seems burdened with a lot of
>> > the hard problems that Blur otherwise solves.  Obviously, it trades
>> > off a high burden for one of the controllers.
>> >
>> > Thoughts?
>> > --tim
>> >
>>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message