Date: Mon, 25 Jul 2016 08:01:20 +0000 (UTC)
From: "Ismael Juma (JIRA)"
To: dev@kafka.apache.org
Subject: [jira] [Comment Edited] (KAFKA-3973) Investigate feasibility of caching bytes vs. records

    [ https://issues.apache.org/jira/browse/KAFKA-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15391502#comment-15391502 ]

Ismael Juma edited comment on KAFKA-3973 at 7/25/16 8:01 AM:
-------------------------------------------------------------

Also, MemoryMeasurer has the following enum:

{code}
public static enum Guess {
    /* If instrumentation is not available, error when measuring */
    NEVER,
    /* If instrumentation is available, use it, otherwise guess the size using predefined specifications */
    FALLBACK_SPEC,
    /* If instrumentation is available, use it, otherwise guess the size using sun.misc.Unsafe */
    FALLBACK_UNSAFE,
    /* If instrumentation is available, use it, otherwise guess the size using sun.misc.Unsafe; if that is unavailable,
     * guess using predefined specifications.*/
    FALLBACK_BEST,
    /* Always guess the size of measured objects using predefined specifications*/
    ALWAYS_SPEC,
    /* Always guess the size of measured objects using sun.misc.Unsafe */
    ALWAYS_UNSAFE
}
{code}

Which option did you test, the instrumentation one (that requires a Java agent to be configured)?
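For context on what "requires a Java agent" means in practice, here is a minimal sketch of the standard java.lang.instrument route. The class name SizeAgent and its methods are hypothetical and are not taken from MemoryMeasurer or Kafka. The sketch assumes the class is packaged in a jar whose manifest names it in Premain-Class and that the JVM is started with -javaagent pointing at that jar. Without the agent, the measurement call fails, which roughly corresponds to the NEVER option above.

```java
import java.lang.instrument.Instrumentation;

// Hypothetical helper for illustration only; not part of MemoryMeasurer or Kafka.
final class SizeAgent {
    // Set by the JVM through premain() when an agent is configured; null otherwise.
    private static volatile Instrumentation instrumentation;

    private SizeAgent() {}

    // Invoked by the JVM before main() when this class is named in the
    // Premain-Class manifest attribute of a jar passed via -javaagent.
    public static void premain(String agentArgs, Instrumentation inst) {
        instrumentation = inst;
    }

    // True only if the JVM was started with the agent.
    public static boolean isAvailable() {
        return instrumentation != null;
    }

    // Implementation-specific approximation of the storage used by this one
    // object. Objects it references are not included, so a deep measurement
    // would have to walk the object graph itself.
    public static long shallowSizeOf(Object o) {
        Instrumentation inst = instrumentation;
        if (inst == null) {
            throw new IllegalStateException("No Java agent configured; cannot measure");
        }
        return inst.getObjectSize(o);
    }
}
```

A caller that wants fallback behaviour would check SizeAgent.isAvailable() first and switch to an estimate when it returns false.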
was (Author: ijuma):
Also, MemoryMeasurer has the following enum:

{code}
public static enum Guess {
    /* If instrumentation is not available, error when measuring */
    NEVER,
    /* If instrumentation is available, use it, otherwise guess the size using predefined specifications */
    FALLBACK_SPEC,
    /* If instrumentation is available, use it, otherwise guess the size using sun.misc.Unsafe */
    FALLBACK_UNSAFE,
    /* If instrumentation is available, use it, otherwise guess the size using sun.misc.Unsafe; if that is unavailable,
     * guess using predefined specifications.*/
    FALLBACK_BEST,
    /* Always guess the size of measured objects using predefined specifications*/
    ALWAYS_SPEC,
    /* Always guess the size of measured objects using sun.misc.Unsafe */
    ALWAYS_UNSAFE
}
{code}

Which option did you test, the instrumentation one (that required a Java agent to be configured)?

> Investigate feasibility of caching bytes vs. records
> ----------------------------------------------------
>
>                 Key: KAFKA-3973
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3973
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: streams
>            Reporter: Eno Thereska
>            Assignee: Bill Bejeck
>             Fix For: 0.10.1.0
>
>         Attachments: CachingPerformanceBenchmarks.java, MemoryLRUCache.java
>
>
> Currently the cache stores and accounts for records, not bytes or objects. This investigation would measure any performance overheads that come from storing bytes or objects. As an outcome we should know whether 1) we should store bytes or 2) we should store objects.
> If we store objects, the cache still needs to know their size (so that it can know if the object fits in the allocated cache space, e.g., if the cache is 100MB and the object is 10MB, we'd have space for 10 such objects). The investigation needs to figure out how to find out the size of the object efficiently in Java.
> If we store bytes, then we are serialising an object into bytes before caching it, i.e., we take a serialisation cost. The investigation needs to measure how bad this cost can be, especially for the case when all objects fit in cache (and thus any extra serialisation cost would show).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
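To make the "store bytes" option in the issue description concrete, here is a minimal sketch of an LRU cache whose capacity is accounted in bytes instead of in records. The class ByteBoundedLruCache is hypothetical and is not the attached MemoryLRUCache.java. It assumes values arrive already serialised as byte[] and counts only the value length, ignoring keys and per-entry overhead.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not the attached MemoryLRUCache.java.
final class ByteBoundedLruCache {
    private final long maxBytes;
    private long usedBytes = 0;
    // accessOrder = true: iteration runs from least to most recently used.
    private final LinkedHashMap<String, byte[]> entries =
            new LinkedHashMap<>(16, 0.75f, true);

    ByteBoundedLruCache(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    // Stores an already-serialised value, then evicts least recently used
    // entries until the accounted size fits within maxBytes again.
    void put(String key, byte[] value) {
        byte[] old = entries.put(key, value);
        if (old != null) {
            usedBytes -= old.length;
        }
        usedBytes += value.length;
        Iterator<Map.Entry<String, byte[]>> it = entries.entrySet().iterator();
        while (usedBytes > maxBytes && it.hasNext()) {
            Map.Entry<String, byte[]> eldest = it.next();
            usedBytes -= eldest.getValue().length;
            it.remove();
        }
    }

    // Returns the cached bytes (marking the entry as recently used), or null.
    byte[] get(String key) {
        return entries.get(key);
    }

    long usedBytes() {
        return usedBytes;
    }

    int size() {
        return entries.size();
    }
}
```

With a 100-byte budget and 10-byte values, this holds ten entries and evicts the least recently used one when an eleventh arrives, which mirrors the 100MB / 10MB example in the description at a smaller scale. The serialisation cost the issue asks about would be paid by the caller before put() and again when deserialising after get().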