kafka-dev mailing list archives

From Chris Burroughs <chris.burrou...@gmail.com>
Subject On time/offset indexes
Date Wed, 27 Jul 2011 01:19:10 GMT
So for good reason [1] Kafka doesn't keep a complicated time --> offset
index.  The only time resolution you get is the start and end of each
log file.  We can approximate finer-grained time indexes with smaller
log files [2] and getOffsetsBefore, but we would really prefer not to
have lots of small files everywhere.

To get time-based indexes without lots of files, could we have another
append-only companion file for each Log that periodically (I'm thinking
on the order of once a minute) gets a timestamp:offset entry appended to
it?  That should have low overhead, and if the companion file is
missing/deleted/etc. we can still fall back to the current logic.
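
A rough sketch of what I have in mind, in Python only to keep it short.
Nothing here is existing Kafka code: the TimeIndex class, the
maybe_append and offset_before names, the one-entry-per-line
"timestamp_ms:offset" text format, and the 60 second default are all
assumptions made up for illustration.  It also assumes entries are
appended in increasing timestamp order.

```python
import bisect
import os
import tempfile


class TimeIndex:
    """Hypothetical append-only companion file of 'timestamp_ms:offset' lines."""

    def __init__(self, path, interval_ms=60_000):
        self.path = path
        self.interval_ms = interval_ms  # assumed spacing between entries
        self.last_ts = None

    def maybe_append(self, now_ms, offset):
        """Append an entry if at least interval_ms has passed since the last one."""
        if self.last_ts is not None and now_ms - self.last_ts < self.interval_ms:
            return False
        with open(self.path, "a") as f:
            f.write(f"{now_ms}:{offset}\n")
        self.last_ts = now_ms
        return True

    def offset_before(self, target_ms):
        """Return the latest indexed offset with timestamp <= target_ms.

        Returns None if the file is missing or has no entry that early,
        which tells the caller to fall back to the current logic.
        """
        if not os.path.exists(self.path):
            return None
        entries = []
        with open(self.path) as f:
            for line in f:
                ts, _, off = line.strip().partition(":")
                if ts.isdigit() and off.isdigit():  # skip torn or corrupt lines
                    entries.append((int(ts), int(off)))
        # Relies on entries being in timestamp order (append order).
        i = bisect.bisect_right(entries, (target_ms, float("inf")))
        return entries[i - 1][1] if i > 0 else None
```

The lookup re-reads the whole file each time, which is fine for a sketch
since one entry a minute is only 1440 lines a day.  A None result is the
"companion file is missing" case, where the caller would use the
existing per-file start/end behaviour instead.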

[1] "Furthermore the complexity of maintaining the mapping from a random
id to an offset requires a heavy weight index structure which must be
synchronized with disk, essentially requiring a full persistent
random-access data structure." http://sna-projects.com/kafka/design.php

[2] And KAFKA-40 would make this easier to do.
