Date: Tue, 27 Nov 2012 13:35:20 -0800
Subject: Re: understanding partitions based on wiki example of profile visits
From: Jay Kreps
To: kafka-dev@incubator.apache.org

We don't have a partition per user; there is no need for that, in the same
way that a distributed database doesn't have a partition per user. A
partition is just a physical grouping of keys.

-Jay

On Tue, Nov 27, 2012 at 12:00 PM, S Ahmed wrote:

> How does that work out, though? With 10 million users, that is at least
> 10 million files.
>
> On Mon, Nov 26, 2012 at 2:02 PM, Jay Kreps wrote:
>
> > Yeah, a partition is physically implemented as a log (i.e. a sequence of
> > files containing a bunch of messages indexed by offset). So each server
> > can have lots of partitions, but each partition exists entirely on one
> > server.
> >
> > So in the "newsfeed" case, if you partition by user id, you would be
> > guaranteed that all activity relevant to that user went to a single
> > processor. In our case, yes, we serve out of a different system, which
> > is the destination after all the pre-processing.
> >
> > On Mon, Nov 26, 2012 at 9:19 AM, S Ahmed wrote:
> >
> > > > Yes, your description is correct. A particular member's data would
> > > > all be in one partition.
> > >
> > > When you say in one partition, does that also mean on the same server?
> > > Or can a partition span broker nodes?
> > >
> > > At the file level, I'm guessing it has its own physical file then?
> > > (or set of files as it grows, with the file number suffix).
> > >
> > > So at LinkedIn, is this how you present a user's dashboard inbox (your
> > > friend has a new job, they updated their profile, someone recommended
> > > them, etc.)? I guess you can further sort at the application level
> > > then, and cache to a different store?
> > >
> > > On Mon, Nov 26, 2012 at 11:53 AM, Jay Kreps wrote:
> > >
> > > > Yes, your description is correct. A particular member's data would
> > > > all be in one partition.
> > > >
> > > > Broker partitions are just the unit of parallelism--think of each
> > > > partition as a totally ordered log you can append to and read from.
> > > > The consumption of one of these partition logs is single threaded.
> > > >
> > > > The guarantee is that all messages are added to a partition in the
> > > > order they arrive. From the point of view of a single producer client
> > > > this will also be the order in which they are sent. These messages
> > > > are then delivered in this order to a consumer thread.
> > > >
> > > > Hope that helps.
> > > >
> > > > -Jay
> > > >
> > > > On Sun, Nov 25, 2012 at 7:54 PM, S Ahmed wrote:
> > > >
> > > > > The wiki states "Consider an application that would like to
> > > > > maintain an aggregation of the number of profile visitors for each
> > > > > member. It would like to send all profile visit events for a member
> > > > > to a particular partition and, hence, have all updates for a member
> > > > > to appear in the same stream for the same consumer thread."
> > > > > (http://incubator.apache.org/kafka/design.html)
> > > > >
> > > > > So say I have 5 broker servers. My producer will send a message
> > > > > for a particular profile page visit, with the default algorithm
> > > > > using hash(member_id) % num_partitions to figure out which broker
> > > > > server to send it to.
> > > > >
> > > > > So a particular member's pageview messages will all go to a single
> > > > > server then, is this the case? And therefore all the messages for a
> > > > > given user will also be in the correct order, right?
> > > > >
> > > > > So a consumer group that subscribes to the 'profile-page-view'
> > > > > topic will consume page view related messages. Is it possible to
> > > > > subscribe to a particular broker partition also?
> > > > >
> > > > > Are broker partitions meant for cases when you want all messages to
> > > > > be saved on the same node?
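
For readers of the archive, the scheme discussed in this thread can be sketched roughly as below. This is a toy, in-memory illustration and not Kafka code: the names Topic, partition_for, and count_visits are hypothetical, the partition count of 4 is arbitrary, and zlib.crc32 is only a stand-in for whatever hash the real producer uses. It is meant to show two points from the thread: many members share one partition (so 50 members do not need 50 logs), and all events for one member land in a single partition in the order they were sent.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count, chosen for illustration


def partition_for(member_id, num_partitions=NUM_PARTITIONS):
    """hash(member_id) % num_partitions, with crc32 as a stand-in hash."""
    return zlib.crc32(member_id.encode("utf-8")) % num_partitions


class Topic:
    """Toy topic: each partition is an append-only list indexed by offset."""

    def __init__(self, num_partitions=NUM_PARTITIONS):
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, member_id, event):
        """Append the event to the member's partition; return (partition, offset)."""
        p = partition_for(member_id, len(self.partitions))
        self.partitions[p].append((member_id, event))
        return p, len(self.partitions[p]) - 1

    def read(self, partition, offset=0):
        """Return the messages of one partition from the given offset, in log order."""
        return self.partitions[partition][offset:]


def count_visits(topic, partition):
    """Single-threaded consumer of one partition: visit count per member."""
    counts = defaultdict(int)
    for member_id, _event in topic.read(partition):
        counts[member_id] += 1
    return dict(counts)


# 50 members, 1000 profile-visit events, still only NUM_PARTITIONS logs.
topic = Topic()
for i in range(1000):
    topic.send(f"member-{i % 50}", f"visit-{i}")
```

Because each member maps to exactly one partition, the consumer of that partition sees every visit for that member and can keep a complete count without coordinating with other consumers.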