pulsar-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [pulsar] devinbost edited a comment on issue #4012: Adding upsert functionality
Date Wed, 10 Apr 2019 19:37:25 GMT
devinbost edited a comment on issue #4012: Adding upsert functionality
URL: https://github.com/apache/pulsar/pull/4012#issuecomment-481832662
   @jerrypeng Thank you for your very detailed response. I appreciate your time and attention
to this matter. 
   > Is there a reason why you can't just submit/update functions via the REST endpoints
instead of using the pulsar-admin CLI from docker containers? Submitting/Updating functions
by just making a HTTP REST call will be a lot faster . . . 
   I appreciate your guidance. Based on advice from @merlimat earlier today, I am currently
working on an implementation using the REST endpoints.
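
   A minimal sketch of how a deployment script could pick the REST call for one function, rather than shelling out to pulsar-admin. The `/admin/v3/functions/{tenant}/{namespace}/{name}` path and the POST-creates / PUT-updates convention are assumptions to verify against the Pulsar version in use; `function_request` and its parameters are hypothetical names, and the sketch only builds the request target without sending anything.

```python
from urllib.parse import quote

# Assumed base path of the Functions admin REST API; check it for your Pulsar version.
FUNCTIONS_API = "/admin/v3/functions"


def function_request(base_url, tenant, namespace, name, exists):
    """Return the (HTTP method, URL) pair for creating or updating one function."""
    # Percent-encode each path segment so unusual names cannot break the URL.
    path = "/".join(quote(part, safe="") for part in (tenant, namespace, name))
    # Assumed convention: PUT updates an existing function, POST creates a new one.
    method = "PUT" if exists else "POST"
    return method, f"{base_url.rstrip('/')}{FUNCTIONS_API}/{path}"
```

   The caller would still need to learn whether the function exists (for example with a GET on the same URL) and attach the function config and package to the request body.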
   > Do you have 300 individual functions or is there a function with 300 instances or
a group of functions that total 300 instances? There will be a huge submission time difference
depending on which scenario. Submitting one function with 300 instances will take much less
time than submitting 300 functions with one instance each.
   At the current moment, all of our functions are individual because they represent different
use cases. However, we appreciate your advice about the performance improvement that we will
get from deploying function instances, so we will examine ways that we can refactor to obtain
those benefits. 
   > What do you mean by this? The cluster will be running as it should when submitting
   I may have been unintentionally misleading, and I apologize for that. Please let me clarify.
When I said:
   >  Pulsar is in a broken state
   I didn't mean that the Pulsar cluster is not running. What I meant is that our end-to-end
production message pipelines will be in a broken state. (i.e. Our customers will experience
   Consider a plumbing analogy. If you need to re-route pipes while water is flowing and you
can't do it extremely quickly, water will leak everywhere, and the people who
are expecting water at a particular location will notice a loss of service. This doesn't mean
that the water system is completely broken or that water is not flowing; it means
that water is not reaching our customers. 
   In our case, when a production data flow is processing tens of thousands of
messages per second and we need to deploy updates to inter-dependent functions, then
until all of the updated functions are deployed, some of them may introduce breaking changes
that could cause data loss or prevent messages from reaching the final destination
topic. 
   Does this make more sense?
   > I think the functionality you are looking for is bulk create, update, or upsert. You want
to bring a cluster from a potentially unknown state into a known consistent state with regard
to functions. Am I understanding you correctly?
   That is exactly right. 
   I think you're right that we likely won't need to update all 300 functions every
time we deploy updates. However, we need to ensure that Pulsar can quickly and seamlessly
match the expected state when we deploy updates.
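
   To illustrate what "match the expected state" could mean in practice, here is a rough sketch of the comparison a bulk upsert would perform. It is not the implementation in this PR: `plan_upserts` is a hypothetical helper, and it assumes the desired and current function configs are available as plain dictionaries keyed by function name.

```python
def plan_upserts(desired, current):
    """Sort function names into create / update / unchanged buckets.

    Both arguments map a function name to its config (any comparable value).
    Functions present only in `current` are left alone; this sketch never deletes.
    """
    plan = {"create": [], "update": [], "unchanged": []}
    for name in sorted(desired):
        if name not in current:
            plan["create"].append(name)      # missing from the cluster
        elif current[name] != desired[name]:
            plan["update"].append(name)      # present, but the config differs
        else:
            plan["unchanged"].append(name)   # already in the expected state
    return plan
```

   With a plan like this, only the functions in the create and update buckets need a request, which is why a typical deployment would touch far fewer than all 300 functions.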
   > While we can add upserts and even bulk upserts, I would suggest that you first try creating/updating
functions directly using the REST endpoint to see if that is good enough.
   I will investigate your suggestions for implementing these changes for bulk actions. 
   Thank you also for the guidance and example change to ComponentImpl.java for the Upsert
functionality for this PR. 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:

With regards,
Apache Git Services
