Subject: Re: How do you keep track of offset in a partition
From: Tom Brown
To: users@kafka.apache.org
Date: Mon, 28 Jan 2013 20:09:29 -0700

Since offsets in Kafka 0.7.x are just byte counts, you cannot know exactly
how many messages remain to be processed. However, you can know the number
of bytes remaining: subtract your consumer's offset from each partition's
end offset. If you know the average message size, you can use that to make
a rough guess at how many messages remain.

--Tom

On Mon, Jan 28, 2013 at 8:03 PM, S Ahmed wrote:

> Once you have an offset, is it possible to know how many messages there
> are from that point to the end? (Or at least for the particular topic
> partition that you are requesting data from?)
>
> The idea is to see how far behind the consumers are relative to the
> number of messages coming in, etc.
>
> I'm guessing the brokers don't really know how many messages they are
> currently storing? Or is that what the index is for?
>
> On Mon, Jan 28, 2013 at 8:27 PM, Neha Narkhede wrote:
>
>> Jamie,
>>
>> You need to use the getOffsetsBefore() API to get the earliest/latest
>> offset available on the broker for a particular partition.
>>
>> Thanks,
>> Neha
>>
>> On Mon, Jan 28, 2013 at 5:05 PM, Jamie Wang wrote:
>>
>> > Hi,
>> >
>> > We are using version 0.7.2 of Kafka on Windows. I am wondering what
>> > is the right way to fetch data and keep track of the offset in a
>> > partition. For example, I am currently assuming the first message the
>> > producer sent to the broker is at offset 0. So far it seems to be
>> > working. Is this a correct assumption?
>> >
>> > Let's say 2 days later, the first 100 messages on the broker are
>> > discarded because they passed the retention.hours set in the config
>> > file. Now what is the offset I should use to retrieve the first
>> > message in the partition? And let's also say the offset I had for the
>> > 80th message is now not valid. What is the right way to get the
>> > correct offset to fetch in the consumer?
>> >
>> > What is the purpose of the API for getting a list of valid offsets
>> > for all segments in a partition?
>> >
>> > Thanks in advance for your help.
>> >
>> > Jamie
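
For anyone finding this thread later, the byte-offset arithmetic in the reply at the top of this message can be sketched roughly as follows. This is only an illustration in plain Python, not Kafka client code. The function names are made up for the example. The earliest and latest offsets are assumed to come from something like getOffsetsBefore(), and the average message size is a number you would have to measure yourself.

```python
def bytes_remaining(consumer_offset, end_offset):
    """Bytes between the consumer's position and the end of the partition.

    Both arguments are byte offsets, as in Kafka 0.7.x.
    """
    if consumer_offset > end_offset:
        raise ValueError("consumer offset is past the end of the partition")
    return end_offset - consumer_offset


def estimate_messages_remaining(consumer_offset, end_offset, avg_message_bytes):
    """Rough guess at the message count; only as good as the average size."""
    if avg_message_bytes <= 0:
        raise ValueError("average message size must be positive")
    return bytes_remaining(consumer_offset, end_offset) / avg_message_bytes


def clamp_to_valid_offset(saved_offset, earliest_offset, latest_offset):
    """Fall back to the earliest available offset if the saved one is stale.

    Assumption for this sketch: an offset outside [earliest, latest] is
    treated as invalid (for example, its data was removed by retention).
    This does not check that the offset falls on a message boundary.
    """
    if saved_offset < earliest_offset or saved_offset > latest_offset:
        return earliest_offset
    return saved_offset


if __name__ == "__main__":
    # Hypothetical numbers, purely for illustration.
    print(estimate_messages_remaining(1000, 5000, 200))
    print(clamp_to_valid_offset(50, 100, 900))
```

Summing the per-partition estimates gives an approximate lag for a whole topic. Since the result depends entirely on the assumed average message size, treat it as a trend indicator and not as an exact count.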