asterixdb-dev mailing list archives

From Xikui Wang <xik...@uci.edu>
Subject Re: Parallel feed ingestion
Date Wed, 17 May 2017 20:25:30 GMT
Hi,

First, 3) won't work well, because the socket server inside AsterixDB
accepts client connections one at a time. If two clients send data to
the same socket simultaneously, the 1st client goes through while the
2nd is blocked after several hundred records, and it stays blocked
until the 1st one finishes.
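
To illustrate the behavior, here is a minimal Python sketch of a server
that drains one connection to EOF before accepting the next. It is not
AsterixDB's socket adapter code; the names serve_one_at_a_time and demo
are made up for this example. My guess is that the 2nd client gets some
records in before stalling because the OS completes and buffers its
connection even though the server has not accepted it yet.

```python
import socket
import threading


def serve_one_at_a_time(server_sock, n_clients, log):
    """Accept a connection, drain it to EOF, and only then accept the next."""
    for _ in range(n_clients):
        conn, _addr = server_sock.accept()
        with conn:
            data = b""
            while True:
                chunk = conn.recv(4096)
                if not chunk:  # client closed its side
                    break
                data += chunk
        log.extend(data.decode("utf-8").splitlines())


def demo():
    """Two clients, one socket: the 2nd is only read after the 1st closes."""
    log = []
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server_sock.listen(2)
    addr = server_sock.getsockname()

    server = threading.Thread(
        target=serve_one_at_a_time, args=(server_sock, 2, log)
    )
    server.start()

    first = socket.create_connection(addr)   # connected first, served first
    second = socket.create_connection(addr)  # waits in the listen backlog

    # The 2nd client sends and closes before the 1st has sent anything;
    # its small payload just sits in the OS buffers.
    second.sendall(b"second-0\nsecond-1\n")
    second.close()

    first.sendall(b"first-0\nfirst-1\n")
    first.close()

    server.join()
    server_sock.close()
    return log


if __name__ == "__main__":
    print(demo())
```

Even though the 2nd client sends first, its records show up only after
all of the 1st client's records. With a larger payload the 2nd client's
sendall would block once the buffers fill.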

The comparison between 1) and 2) is interesting. (@Abdullah, please
correct me if I'm wrong.) IMO, 1) achieves parallelism at the operator
level by running the intake operator on the designated nodes
simultaneously, while 2) achieves it at the job level by simply
starting several jobs that run independently. I think 1) may have less
overhead than 2), since the part of the workflow that could be shared
is duplicated multiple times in 2). It would be useful to see how the
two perform under saturated conditions.
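
For either setup, the sender has to write to all sockets concurrently,
or it becomes the bottleneck itself. A rough Python sketch of such a
driver is below, with one connection and one thread per socket. It
assumes newline-delimited records, and the function names are
hypothetical; it is not part of AsterixDB.

```python
import socket
import threading


def partition_round_robin(records, n):
    """Split records into n lists: item i goes to list i % n."""
    return [records[i::n] for i in range(n)]


def send_partition(host, port, records):
    """Open one connection and write each record as a newline-ended line."""
    with socket.create_connection((host, port)) as sock:
        for rec in records:
            sock.sendall(rec.encode("utf-8") + b"\n")


def send_parallel(endpoints, records):
    """Send one partition of the records to each (host, port) concurrently."""
    parts = partition_round_robin(records, len(endpoints))
    threads = [
        threading.Thread(target=send_partition, args=(host, port, part))
        for (host, port), part in zip(endpoints, parts)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

With the sockets from 1), a call might look like
send_parallel([("NC1", 10001), ("NC2", 10001)], records), where records
is a list of serialized records. For real saturation tests, separate
sender processes or machines are probably needed, since Python threads
share one interpreter.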

Best,
Xikui

On Wed, May 17, 2017 at 12:11 PM, Mike Carey <dtabass@gmail.com> wrote:

> @Xikui?  @Abdullah?
>
>
>
> On 5/17/17 11:40 AM, Ildar Absalyamov wrote:
>
>> In light of Steven’s discussion about feeds in a parallel thread, I was
>> wondering what would be the correct way to push parallel ingestion as far as
>> possible in a multinode/multipartition environment.
>> In one of my experiments I am trying to saturate the ingestion to see the
>> effect of computing stats in background.
>> Several things I’ve tried:
>> 1) Open a socket adapter on all NCs:
>> create feed Feed using socket_adapter
>> (
>>      ("sockets"="NC1:10001,NC2:10001,…"),
>> …)
>>
>> 2) Connect several Feeds to a single dataset.
>> create feed Feed1 using socket_adapter
>> (
>>      ("sockets"="NC1:10001"),
>> …)
>> create feed Feed2 using socket_adapter
>> (
>>      ("sockets"="NC2:10001"),
>> …)
>>
>> 3) Have several nodes sending data into a single socket.
>>
>> In my previous experiments the parallelization did not quite show that
>> the bottleneck was on the sender side, but I am wondering if that will
>> still be the case, since a lot of things happened under the hood since the
>> last time.
>>
>> Best regards,
>> Ildar
>>
>
>
