From: Benjamin Black
Date: Sun, 6 Oct 2013 19:06:59 -0700
Subject: Re: Managing Millions of Paritions in Kafka
To: users@kafka.apache.org

Ha ha, yes, exactly, you need a database. Kafka is a wonderful tool, but
not the right one for a job like that.

On Sun, Oct 6, 2013 at 7:03 PM, Ravindranath Akila <ravindranathakila@gmail.com> wrote:

> Actually, we need a broker. But a more stateful one. Hence the decision
> to use TTL on HBase.
>
> On 7 Oct 2013 08:38, "Benjamin Black" wrote:
>
> > What you are discovering is that Kafka is a message broker, not a
> > database.
> >
> > On Sun, Oct 6, 2013 at 5:34 PM, Ravindranath Akila <ravindranathakila@gmail.com> wrote:
> >
> > > Thanks a lot Neha!
> > >
> > > Actually, using keyed messages (with Simple Consumers) was the
> > > approach we took. But it seems we can't map each user to a new
> > > partition due to ZooKeeper limitations. Rather, we will have to map
> > > a "group" of users onto one partition. Then we can't fetch the
> > > messages for only one user.
> > >
> > > It seems our data is best put on HBase with a TTL and versioning.
> > >
> > > Thanks!
> > >
> > > R. A.
> > >
> > > On 6 Oct 2013 16:00, "Neha Narkhede" wrote:
> > >
> > > > Kafka is designed to have on the order of a few thousand
> > > > partitions, roughly fewer than 10,000. The main bottleneck is
> > > > ZooKeeper. A better way to design such a system is to have fewer
> > > > partitions and use keyed messages to distribute the data over a
> > > > fixed set of partitions.
> > > >
> > > > Thanks,
> > > > Neha
> > > >
> > > > On Oct 5, 2013 8:19 PM, "Ravindranath Akila" <ravindranathakila@gmail.com> wrote:
> > > >
> > > > > Initially, I thought dynamic topic creation could be used to
> > > > > maintain per-user data on Kafka. Then I read that partitions can
> > > > > and should be used for this instead.
> > > > >
> > > > > If a partition is to be used to map a user, can there be a
> > > > > million, or even a billion, partitions in a cluster? How does one
> > > > > go about designing such a model?
> > > > >
> > > > > Can the replication tool be used to assign, say, partitions
> > > > > 1 - 10,000 to replica 1, and 10,001 - 20,000 to replica 2?
> > > > >
> > > > > If not, since there is a ulimit on the file system, should one
> > > > > model it based on a replica/topic/partition approach? Say users
> > > > > 1-10,000 go on topic 10k-1, which has 10,000 partitions, and
> > > > > users 10,001-20,000 go on topic 10k-2, which has 10,000
> > > > > partitions.
> > > > >
> > > > > Simply put, how can a million stateful data points be handled?
> > > > > (I deduced that a userid-to-partition-number mapping can be done
> > > > > via a partitioner, but unless a server can be configured to
> > > > > handle only a given set of partitions, with a range-based
> > > > > notation, it is almost impossible to handle a large dataset. Is
> > > > > it that Kafka can only handle a limited set of stateful data
> > > > > right now?)
> > > > >
> > > > > http://stackoverflow.com/questions/17205561/data-modeling-with-kafka-topics-and-partitions
> > > > >
> > > > > Btw, why does Kafka have to keep each partition open? Can't a
> > > > > partition be opened for read/write only when needed?
> > > > >
> > > > > Thanks in advance!
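
For anyone reading this thread in the archive, here is a minimal sketch of what "use keyed messages to distribute the data over a fixed set of partitions" can look like. It does not use any Kafka client API. The partition count of 64, the function names, and the use of CRC32 as the hash are all assumptions made for illustration, not what Kafka's own partitioner does. The in-memory `logs` dict stands in for the partitions of a topic.

```python
import zlib

# Hypothetical fixed partition count, chosen for illustration only.
# It stays far below the "roughly fewer than 10,000" guideline from the thread.
NUM_PARTITIONS = 64


def partition_for(user_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a user id (the message key) to one of a fixed set of partitions."""
    # CRC32 is a stand-in hash: it is stable across processes, unlike
    # Python's built-in hash() for strings. Kafka's partitioner may differ.
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions


def messages_for_user(partition_log, user_id):
    """Pick one user's messages out of a partition shared by many users."""
    return [value for key, value in partition_log if key == user_id]


# In-memory stand-in for the topic: one list of (key, value) pairs per partition.
logs = {p: [] for p in range(NUM_PARTITIONS)}

# Many users share the same small, fixed set of partitions.
for n in range(10_000):
    user_id = f"user-{n}"
    logs[partition_for(user_id)].append((user_id, f"event for {user_id}"))

# Reading one user's data means scanning the whole shared partition.
target = "user-42"
shared_log = logs[partition_for(target)]
print(len(shared_log), "messages in the partition shared with", target)
print(messages_for_user(shared_log, target))
```

The sketch also shows the limitation raised in the thread: all messages for one user land in the same partition, but a consumer still has to read every other user's messages in that partition and filter by key. If per-user random access is the main requirement, that is the point where a database such as HBase fits better than a broker.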