spark-reviews mailing list archives

From HeartSaVioR
Subject [GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...
Date Mon, 20 Aug 2018 23:35:43 GMT
Github user HeartSaVioR commented on the issue:
    I'm not sure, but are you saying that an executor handles multiple queries (multiple
jobs) concurrently? I honestly hadn't noticed that. If that is going to be a problem, we should
add something to the cache key (could we get the query id at that point?) to differentiate consumers.
If we want to avoid extra seeking due to different offsets, consumers should not be reused
across multiple queries, and that's just a matter of the cache key.
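
To illustrate the cache-key idea, here is a minimal sketch, assuming the query id can be obtained when the consumer is acquired (which the comment above leaves open). `ConsumerCacheKey` and its fields are hypothetical names for illustration, not Spark's actual classes.

```java
import java.util.Objects;

// Hypothetical cache key: the names are illustrative, not taken from Spark.
final class ConsumerCacheKey {
    final String queryId;   // assumption: available at the time the consumer is acquired
    final String groupId;
    final String topic;
    final int partition;

    ConsumerCacheKey(String queryId, String groupId, String topic, int partition) {
        this.queryId = Objects.requireNonNull(queryId);
        this.groupId = Objects.requireNonNull(groupId);
        this.topic = Objects.requireNonNull(topic);
        this.partition = partition;
    }

    // Two keys match only if the query id matches too, so a consumer
    // cached for one query is never handed to another query.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof ConsumerCacheKey)) return false;
        ConsumerCacheKey that = (ConsumerCacheKey) other;
        return partition == that.partition
            && queryId.equals(that.queryId)
            && groupId.equals(that.groupId)
            && topic.equals(that.topic);
    }

    @Override
    public int hashCode() {
        return Objects.hash(queryId, groupId, topic, partition);
    }
}
```

With a key like this, two queries reading the same topic partition map to different cache entries, so neither has to seek away from the other's offset. The cost is that the two queries no longer share a connection.
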
    If you are thinking about sharing consumers among multiple queries in order to reuse the
connection to Kafka, I think extra seeking is unavoidable (I guess the fetched data would be a
much more critical issue, unless we never reuse it after returning the consumer to the pool). If seeking is a light
operation, we could even reuse only the connection (not the position we already sought to): always
reset the position (and maybe the data?) when borrowing a consumer from the pool or returning it to the pool.
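
The "reuse only the connection" option can be sketched as follows. This is a self-contained toy, not the Commons Pool or Kafka API: `PooledConsumer` and `ResettingPool` are hypothetical stand-ins, the connection is simulated by an integer id, and the reset is done on return (doing it on borrow would work equally well). In Commons Pool 2 the equivalent hook would presumably be the factory's `passivateObject` or `activateObject` method.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for a Kafka consumer: a simulated connection plus per-use state.
final class PooledConsumer {
    static final long UNSET = -1L;

    final int connectionId;                          // simulates the expensive part worth reusing
    long position = UNSET;                           // last offset sought to
    final List<String> fetched = new ArrayList<>();  // records buffered by earlier fetches

    PooledConsumer(int connectionId) {
        this.connectionId = connectionId;
    }

    void seek(long offset) {
        position = offset;
    }

    // Drop everything tied to the previous user; keep only the connection.
    void resetState() {
        position = UNSET;
        fetched.clear();
    }
}

// Hypothetical minimal pool that clears consumer state whenever a consumer is returned.
final class ResettingPool {
    private final Deque<PooledConsumer> idle = new ArrayDeque<>();
    private int created = 0;

    PooledConsumer borrow() {
        PooledConsumer consumer = idle.pollFirst();
        if (consumer == null) {
            consumer = new PooledConsumer(created++);  // "open a new connection"
        }
        return consumer;
    }

    void giveBack(PooledConsumer consumer) {
        consumer.resetState();  // the next borrower must always seek again
        idle.addFirst(consumer);
    }

    int createdCount() {
        return created;
    }
}
```

A borrower that seeks to some offset and returns the consumer leaves no position or buffered data behind, so the next borrower always pays one seek but never sees stale records. Whether that trade is worthwhile depends on how cheap seeking really is, which the comment leaves as an open question.
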
    Btw, the rationale of this patch is not to solve the issue you're referring to. This
patch is also based on #20767, but it deals with other improvements pointed out in the comments:
adopting a pool library so as not to reinvent the wheel, and enabling metrics for the pool.
    I'm not sure the issue you're referring to is a serious one (a show-stopper): if it
were serious, someone should have handled it once we became aware of it in March,
or at least a relevant JIRA issue with a detailed explanation should have been filed by now. I'd like
to ask you to handle (or file) the issue, since you probably know it best.

