From: Chris Burroughs
Date: Tue, 26 Jul 2011 21:19:10 -0400
To: kafka-dev@incubator.apache.org
Subject: On time/offset indexs

So, for good reason [1], Kafka doesn't keep a complicated time --> offset index. The start and end of each log file are the only time boundaries you get. We can approximate finer-grained time indexes with smaller log files [2] and getOffsetsBefore, but we would really prefer not to have lots of small files everywhere.

To get time-based indexes without lots of files, could we have another append-only companion file for each Log that periodically (I'm thinking on the order of once a minute) gets a timestamp:offset entry appended to it? That should have low overhead, and if the companion file is missing, deleted, etc., we can still fall back to the current logic.

[1] "Furthermore the complexity of maintaining the mapping from a random id to an offset requires a heavy weight index structure which must be synchronized with disk, essentially requiring a full persistent random-access data structure." http://sna-projects.com/kafka/design.php
[2] KAFKA-40 would also make this easier to do.
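To make the companion-file proposal concrete, here is a minimal sketch in Java. It is not Kafka code. It assumes a plain-text file with one `timestamp:offset` line per entry, millisecond timestamps that never go backwards, and a caller that invokes `append` on a timer (for example once a minute). `TimeIndexSketch`, `append`, and `offsetBefore` are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;

/**
 * Hypothetical append-only "timestamp:offset" companion file for one log.
 * Illustrative only; not part of Kafka.
 */
final class TimeIndexSketch {
    private final Path indexFile;

    TimeIndexSketch(Path indexFile) {
        this.indexFile = indexFile;
    }

    /** Appends one "timestamp:offset" line, creating the file if needed. */
    void append(long timestampMs, long offset) throws IOException {
        String line = timestampMs + ":" + offset + "\n";
        Files.write(indexFile, line.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /**
     * Returns the offset recorded at the latest timestamp <= targetMs.
     * Empty means "no usable entry" (file missing, or target earlier than
     * every entry), so the caller should fall back to the existing logic.
     */
    OptionalLong offsetBefore(long targetMs) throws IOException {
        List<String> lines;
        try {
            lines = Files.readAllLines(indexFile, StandardCharsets.UTF_8);
        } catch (NoSuchFileException e) {
            return OptionalLong.empty(); // companion file missing or deleted
        }

        // Parse entries, skipping any line that does not look like "ts:offset".
        List<long[]> entries = new ArrayList<>();
        for (String line : lines) {
            int sep = line.indexOf(':');
            if (sep <= 0) {
                continue;
            }
            try {
                long ts = Long.parseLong(line.substring(0, sep));
                long off = Long.parseLong(line.substring(sep + 1));
                entries.add(new long[] {ts, off});
            } catch (NumberFormatException e) {
                // Unparseable line: ignore it.
            }
        }

        // Assumes entries are in non-decreasing timestamp order (append-only,
        // clock never goes backwards), so a binary search finds the last
        // entry whose timestamp is <= targetMs.
        int lo = 0;
        int hi = entries.size() - 1;
        int best = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (entries.get(mid)[0] <= targetMs) {
                best = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return best < 0 ? OptionalLong.empty() : OptionalLong.of(entries.get(best)[1]);
    }
}
```

Because an entry is only written once per interval, the offset returned for a target time can point up to one interval earlier than strictly necessary. A consumer starting there sees every message since the target plus possibly a minute or so of older ones. That is still far finer than the per-log-file boundaries, and an empty result lets the caller drop back to the current behaviour.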