incubator-kafka-dev mailing list archives

From S Ahmed <sahmed1...@gmail.com>
Subject Re: understanding partitions based on wiki example of profile visits
Date Tue, 27 Nov 2012 20:00:19 GMT
How does that work out in practice, though? With 10 million users, that
would be at least 10 million files.


On Mon, Nov 26, 2012 at 2:02 PM, Jay Kreps <jay.kreps@gmail.com> wrote:

> Yeah, a partition is physically implemented as a log (i.e. a sequence of
> files containing a bunch of messages indexed by offset). So each server can
> have lots of partitions, but each partition exists entirely on one server.
>
> So in the "newsfeed" case if you partition by user id, you would be
> guaranteed that all activity relevant to that user went to a single
> processor. In our case, yes, we serve out of a different system which is
> the destination after all the pre-processing.
>
>
> On Mon, Nov 26, 2012 at 9:19 AM, S Ahmed <sahmed1020@gmail.com> wrote:
>
> > > Yes, your description is correct. A particular member's data would all
> > > be in one partition.
> > When you say in one partition, does that also mean on the same server? Or
> > can a partition span broker nodes?
> >
> > At the file level, I'm guessing it has its own physical file then? (Or a
> > set of files as it grows, with the file number suffix.)
> >
> > So at LinkedIn, is this how you present a user's dashboard inbox (your
> > friend has a new job, they updated their profile, someone recommended
> > them, etc.)? I guess you can further sort at the application level then,
> > and cache to a different store?
> >
> >
> > On Mon, Nov 26, 2012 at 11:53 AM, Jay Kreps <jay.kreps@gmail.com> wrote:
> >
> > > Yes, your description is correct. A particular member's data would all
> > > be in one partition.
> > >
> > > Broker partitions are just the unit of parallelism--think of each
> > > partition as a totally ordered log you can append to and read from. The
> > > consumption of one of these partition logs is single threaded.
> > >
> > > The guarantee is that all messages are added to a partition in the
> > > order they arrive. From the point of view of a single producer client
> > > this will also be the order in which they are sent. These messages are
> > > then delivered in this order to a consumer thread.
> > >
> > > Hope that helps.
> > >
> > > -Jay
> > >
> > >
> > >
> > >
> > > On Sun, Nov 25, 2012 at 7:54 PM, S Ahmed <sahmed1020@gmail.com> wrote:
> > >
> > > > The wiki states "Consider an application that would like to maintain
> > > > an aggregation of the number of profile visitors for each member. It
> > > > would like to send all profile visit events for a member to a
> > > > particular partition and, hence, have all updates for a member to
> > > > appear in the same stream for the same consumer thread." (
> > > > http://incubator.apache.org/kafka/design.html)
> > > >
> > > > So say I have 5 broker servers. My producer will send a message for
> > > > a particular profile page visit, with the default algorithm using
> > > > hash(member_id)%num_partitions
> > > > to figure out which broker server to send it to.
> > > >
> > > > So a particular member's pageview messages will all go to a single
> > > > server then, is this the case? And therefore all the messages for a
> > > > given user will be in the correct order also, right?
> > > >
> > > > So a consumer group that subscribes to the 'profile-page-view' topic
> > > > will consume page view related messages. Is it possible to subscribe
> > > > to a particular broker partition also?
> > > >
> > > > Are broker partitions meant for cases when you want all messages to
> > > > be saved on the same node?
> > > >
> > >
> >
>
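
For readers following the thread, here is a minimal sketch of the hash(member_id)%num_partitions scheme quoted above. It is not Kafka code. The partition count of 8, the use of zlib.crc32 as the hash, and the names partition_for, send, and logs are all assumptions made for illustration. Each partition is modeled as an in-memory append-only list whose index stands in for the offset. The point it tries to show is that the number of logs is bounded by the number of partitions rather than the number of members, and that one member's events stay in send order inside a single partition.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 8  # hypothetical value, chosen only for this sketch


def partition_for(member_id: int, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a member id to a partition using hash(member_id) % num_partitions.

    zlib.crc32 is a stand-in for whatever hash the real producer uses; it is
    used here because it is stable across runs and processes.
    """
    return zlib.crc32(str(member_id).encode("utf-8")) % num_partitions


# One append-only list per partition; the list index plays the role of the offset.
logs = defaultdict(list)


def send(member_id: int, event: str) -> tuple:
    """Append an event to the member's partition and return (partition, offset)."""
    partition = partition_for(member_id)
    logs[partition].append((member_id, event))
    return partition, len(logs[partition]) - 1


# Many members share a small, fixed set of partitions.
for member_id in range(100_000):
    send(member_id, "profile_visit")

# Later events for one member land in the same partition, after the earlier ones.
send(42, "first")
send(42, "second")

print("partitions in use:", len(logs))
print("member 42 lives in partition", partition_for(42))
```

In this model a consumer that reads one partition from offset 0 upward sees every event for the members hashed to it, in the order they were appended, which matches the ordering guarantee described in the thread.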
