kafka-dev mailing list archives

From: Bao Thai Ngo <baothai...@gmail.com>
Subject: Re: Kafka is live in prod @ 100%
Date: Mon, 05 Mar 2012 07:58:36 GMT
Neha,

Thanks for your response.

Yes, we really need to specify the number of partitions differently for
thousands of topics. The num.partitions option is of only limited help in our
use case, for the reasons Taylor and I already explained in the previous
emails. In addition, we currently implement semantic partitioning to maintain
various kinds of produced data: high throughput is needed for important,
quickly-produced/processed kinds of data but not for others.
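
To make what I mean by semantic partitioning concrete, here is a rough sketch
in Java (the class, topic names and keys below are made up for illustration
and are not tied to any particular Kafka API):

public class SemanticPartitionerSketch {
    // Map a semantic key (e.g. the kind of produced data) onto one of a
    // topic's partitions, so records of the same kind stay together and
    // keep their order.
    static int partition(String semanticKey, int numPartitions) {
        // Mask the sign bit rather than using Math.abs, which misbehaves
        // for Integer.MIN_VALUE hash codes.
        return (semanticKey.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // A busy, latency-sensitive kind of data gets many partitions;
        // a quiet kind gets few.
        System.out.println(partition("chat.room42", 16));
        System.out.println(partition("presence.user7", 2));
    }
}

The point is simply that the partition count differs per kind of data, which
is why a single num.partitions default does not fit us.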

What do you think?

Thanks,
~Thai

On Sat, Mar 3, 2012 at 6:35 AM, Neha Narkhede <neha.narkhede@gmail.com> wrote:

> Thai,
>
> Do you really need to specify the number of partitions differently for
> so many topics?
> I wonder if setting the right default for num.partitions works instead?
>
> Thanks,
> Neha
>
> On Sun, Feb 19, 2012 at 6:47 PM, Bao Thai Ngo <baothaingo@gmail.com>
> wrote:
> > Hi,
> >
> > I like the idea Taylor suggested. This will definitely help a lot.
> >
> > Another approach I would suggest is to let Kafka load the
> > topic.partition.count.map information from an external file (plain text,
> > XML, etc.) in some format like:
> > topic1:#partition
> > topic2:#partition
> > ....
> > topicn:#partition
> >
> > This way, a Kafka user will also be able to modify this information
> > (manually, or automatically by a script) as he/she wants.
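> >
> > As a rough sketch of what I have in mind (the file name, class and parsing
> > below are made up purely for illustration):
> >
> > import java.io.IOException;
> > import java.nio.file.Files;
> > import java.nio.file.Paths;
> > import java.util.HashMap;
> > import java.util.Map;
> >
> > public class TopicPartitionMapLoader {
> >     // Parse a plain-text file with one "topic:partitionCount" entry per
> >     // line, e.g. "topic1:4", into a map the broker could read at startup.
> >     static Map<String, Integer> load(String path) throws IOException {
> >         Map<String, Integer> counts = new HashMap<>();
> >         for (String line : Files.readAllLines(Paths.get(path))) {
> >             line = line.trim();
> >             if (line.isEmpty()) continue;           // skip blank lines
> >             String[] parts = line.split(":", 2);    // "topic1:4"
> >             counts.put(parts[0].trim(), Integer.parseInt(parts[1].trim()));
> >         }
> >         return counts;
> >     }
> >
> >     public static void main(String[] args) throws IOException {
> >         System.out.println(load(args[0]));          // {topic1=4, topic2=4, ...}
> >     }
> > }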
> >
> > What do you think?
> >
> > Thanks,
> > ~Thai
> >
> > On Sat, Feb 18, 2012 at 1:52 AM, Taylor Gautier <tgautier@tagged.com>
> > wrote:
> >
> >> Jun,
> >>
> >> No, it's necessary for us to modify the tuning parameters on a per-topic
> >> basis using wildcards, e.g.
> >>
> >> topic.flush.intervals.ms=chat*:100,presence*:1000
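> >>
> >> As a rough sketch of how such glob matching could resolve a topic to a
> >> value (the class and parsing below are made up to illustrate the idea;
> >> this is not existing Kafka behavior):
> >>
> >> import java.util.LinkedHashMap;
> >> import java.util.Map;
> >> import java.util.regex.Pattern;
> >>
> >> public class GlobOverridesSketch {
> >>     private final Map<Pattern, Long> overrides = new LinkedHashMap<>();
> >>     private final long defaultValue;
> >>
> >>     // spec looks like "chat*:100,presence*:1000"; '*' matches any suffix.
> >>     GlobOverridesSketch(String spec, long defaultValue) {
> >>         this.defaultValue = defaultValue;
> >>         for (String entry : spec.split(",")) {
> >>             String[] kv = entry.trim().split(":", 2);
> >>             String regex = kv[0].replace(".", "\\.").replace("*", ".*");
> >>             overrides.put(Pattern.compile(regex), Long.parseLong(kv[1]));
> >>         }
> >>     }
> >>
> >>     // First glob that matches the topic wins; otherwise use the default.
> >>     long valueFor(String topic) {
> >>         for (Map.Entry<Pattern, Long> e : overrides.entrySet()) {
> >>             if (e.getKey().matcher(topic).matches()) return e.getValue();
> >>         }
> >>         return defaultValue;
> >>     }
> >>
> >>     public static void main(String[] args) {
> >>         GlobOverridesSketch flush =
> >>             new GlobOverridesSketch("chat*:100,presence*:1000", 3000);
> >>         System.out.println(flush.valueFor("chat-room-1"));   // 100
> >>         System.out.println(flush.valueFor("presence-feed")); // 1000
> >>         System.out.println(flush.valueFor("billing"));       // 3000 (default)
> >>     }
> >> }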
> >>
> >> On Fri, Feb 17, 2012 at 10:09 AM, Jun Rao <junrao@gmail.com> wrote:
> >>
> >> > Taylor,
> >> >
> >> > We don't have a jira for that. Please open one.
> >> >
> >> > In 0.8, we will have DDLs for creating topics, which you can use to
> >> > customize # partitions. Will that be enough?
> >> >
> >> > Jun
> >> >
> >> > On Fri, Feb 17, 2012 at 9:02 AM, Taylor Gautier <tgautier@tagged.com>
> >> > wrote:
> >> >
> >> > > Hi Thai.
> >> > >
> >> > > Well, actually we didn't solve this problem.  We had to use the
> >> > > global topic settings that apply to all topics.
> >> > >
> >> > > I would really like to see globs (wildcards) supported in the config
> >> > > settings.  This is something my team and I have discussed on several
> >> > > occasions.
> >> > >
> >> > > I'm not sure if there is a Kafka JIRA to cover that feature...
> >> > >
> >> > > -Taylor
> >> > >
> >> > > On Fri, Feb 17, 2012 at 2:57 AM, Bao Thai Ngo <baothaingo@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > Hi Taylor,
> >> > > >
> >> > > > I found your email and the Kafka use case by chance. Our use case
> >> > > > is a little similar to yours. We actually implement semantic
> >> > > > partitioning to maintain some kinds of produced data, and we are
> >> > > > also running several thousand topics like you.
> >> > > >
> >> > > > One issue we have been facing is that it is totally inconvenient
> >> > > > for us to maintain and update the Kafka server configuration
> >> > > > (server.properties) when running several thousand topics. We have
> >> > > > to set the number of partitions on a per-topic basis in the way
> >> > > > Kafka requires:
> >> > > >
> >> > > > ### Overrides for the default given by num.partitions on a
> >> > > > per-topic basis
> >> > > > topic.partition.count.map = topic1:4, topic2:4, ..., topicn:4
> >> > > >
> >> > > > I am almost sure that you have run into the issue I mentioned, so
> >> > > > I am curious to know how you solved it.
> >> > > >
> >> > > > Thanks,
> >> > > > ~Thai
> >> > > >
> >> > > > On Wed, Dec 7, 2011 at 12:34 AM, Taylor Gautier <tgautier@tagged.com>
> >> > > > wrote:
> >> > > >
> >> > > >> We had to isolate topics to specific servers because we are
> >> > > >> running several hundred thousand topics in aggregate.
> >> > > >>
> >> > > >> Due to Kafka's directory strategy it's not feasible to put that
> >> > > >> many topics on every host, since they all reside in a single
> >> > > >> directory.
> >> > > >>
> >> > > >> An improvement we considered making was to nest the data
> >> > > >> directory, which would have alleviated this problem.  We also
> >> > > >> could have tried a different filesystem, but we weren't confident
> >> > > >> that would solve the problem entirely.
> >> > > >>
> >> > > >> The advantage of our solution is that each host in our Kafka tier
> >> > > >> is literally shared-nothing. It will scale horizontally for a
> >> > > >> long, long way.
> >> > > >>
> >> > > >> And it's also a contingency plan. Since Kafka was unproven (for us
> >> > > >> anyway, at the time) it was easier to build smaller components
> >> > > >> with less overall functionality and glue them together in a
> >> > > >> scalable way.  If we had had to, we could have put a different
> >> > > >> message bus in place.  But we didn't want to do that if we could
> >> > > >> avoid it :)
> >> > > >>
> >> > > >>
> >> > > >>
> >> > > >> On Dec 6, 2011, at 9:13 AM, Neha Narkhede <neha.narkhede@gmail.com>
> >> > > >> wrote:
> >> > > >>
> >> > > >> > Taylor,
> >> > > >> >
> >> > > >> > This sounds great! Congratulations on this launch.
> >> > > >> >
> >> > > >> >>> But basically we have many topics, few messages (relatively)
> >> > > >> >>> per topic
> >> > > >> >
> >> > > >> > Can you explain your strategy of mapping topics to brokers? The
> >> > > >> > default in Kafka today is to have all brokers host all topics.
> >> > > >> >
> >> > > >> >>> An end user browser makes a long-poll event http connection to
> >> > > >> >>> receive 1:1 messages and 1:M messages from a specialized http
> >> > > >> >>> server we built for this purpose.  1:M messages are delivered
> >> > > >> >>> from Kafka.
> >> > > >> >
> >> > > >> > What do you use for receiving 1:1 messages?
> >> > > >> >
> >> > > >> > Your use case is interesting and different. It will be great if
> >> > > >> > you add relevant details here -
> >> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/Powered+By
> >> > > >> >
> >> > > >> > Thanks,
> >> > > >> > Neha
> >> > > >> >
> >> > > >> >
> >> > > >> > On Tue, Dec 6, 2011 at 8:44 AM, Jun Rao <junrao@gmail.com> wrote:
> >> > > >> >
> >> > > >> >> Hi, Taylor,
> >> > > >> >>
> >> > > >> >> Thanks for the update. This is great. Could you update your
> >> > > >> >> usage in the Kafka wiki? Also, do you delete topics online? If
> >> > > >> >> so, how do you do that?
> >> > > >> >>
> >> > > >> >> Jun
> >> > > >> >>
> >> > > >> >> On Tue, Dec 6, 2011 at 8:30 AM, Taylor Gautier <tgautier@tagged.com>
> >> > > >> >> wrote:
> >> > > >> >>
> >> > > >> >>> I've already mentioned this before, but I wanted to give a
> >> > > >> >>> quick shout to let you guys know that our newest game,
> >> > > >> >>> Deckadence, is 100% live as of yesterday.
> >> > > >> >>>
> >> > > >> >>> Check it out at http://www.tagged.com/deckadence.html
> >> > > >> >>>
> >> > > >> >>> A little about our use case:
> >> > > >> >>>
> >> > > >> >>>  - Deckadence is a game of buying and selling - or rather
> >> > > >> >>>  trading - cards.  Every user on Tagged owns a card.  There
> >> > > >> >>>  are 100M users on Tagged, so that means there are 100M cards
> >> > > >> >>>  to trade.
> >> > > >> >>>  - Kafka enables real-time delivery of events in the game
> >> > > >> >>>  - An end user browser makes a long-poll event http connection
> >> > > >> >>>  to receive 1:1 messages and 1:M messages from a specialized
> >> > > >> >>>  http server we built for this purpose.  1:M messages are
> >> > > >> >>>  delivered from Kafka.
> >> > > >> >>>  - Because of this design, we can publish a message anywhere
> >> > > >> >>>  inside our datacenter and send it directly and immediately to
> >> > > >> >>>  any other system that is subscribed to Kafka, or to an
> >> > > >> >>>  end-user browser
> >> > > >> >>>  - Every update event for every card is sent to a unique topic
> >> > > >> >>>  that represents the user's card.
> >> > > >> >>>  - When a user is browsing any card or list of cards - say a
> >> > > >> >>>  search result - their browser subscribes to all of the cards
> >> > > >> >>>  on screen.
> >> > > >> >>>  - The effect of this is that any changes to any card seen
> >> > > >> >>>  on-screen are seen in real-time by all users of the game
> >> > > >> >>>  - Our primary producers and consumers are PHP and NodeJS,
> >> > > >> >>>  respectively
> >> > > >> >>>
> >> > > >> >>> Well, I plan to write up more about this use case in the near
> >> > > >> >>> future.  As you might have guessed, this is just about as far
> >> > > >> >>> away from the original intent of Kafka as you could get - we
> >> > > >> >>> have PHP that sends messages to Kafka.  Since it's not good to
> >> > > >> >>> hold a TCP connection open in PHP, we had to do some trickery
> >> > > >> >>> here.  There was no existing Node client so we had to write our
> >> > > >> >>> own.  And since there are 100 million users registered on
> >> > > >> >>> Tagged, that means we could in theory have 100M topics.  Of
> >> > > >> >>> course in practice we have far fewer than that.  One of the
> >> > > >> >>> main things we currently have to do is aggressively clean
> >> > > >> >>> topics.  But basically we have many topics, few messages
> >> > > >> >>> (relatively) per topic.  And order matters, so we had to deal
> >> > > >> >>> with ensuring that we could handle the number of topics we
> >> > > >> >>> would create, and ensure ordered delivery and receipt.
> >> > > >> >>>
> >> > > >> >>> In the future I have big plans for Kafka; another feature is
> >> > > >> >>> currently in private test and will be released to the public
> >> > > >> >>> soon (it uses Kafka in a more traditional way).  And we hope to
> >> > > >> >>> have many more in 2012...
> >> > > >> >>>
> >> > > >> >>
> >> > > >>
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
>
