Date: Thu, 2 Mar 2017 01:53:45 +0000 (UTC)
From: "Yonik Seeley (JIRA)"
To: dev@lucene.apache.org
Reply-To: dev@lucene.apache.org
Subject: [jira] [Updated] (SOLR-10205) Evaluate and reduce BlockCache store failures

     [ https://issues.apache.org/jira/browse/SOLR-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-10205:
--------------------------------
    Attachment: SOLR-10205.patch

Here's a snapshot of the modifications and tests I'm using to reduce store failures and evaluate the potential impact.
It's messy... yet another hacked up version of the HDFS performance test I used to uncover the blockcache corruption issues.
I don't plan on committing most of this. It's just for evaluating what should be committed to reduce store failures.

> Evaluate and reduce BlockCache store failures
> ---------------------------------------------
>
>                 Key: SOLR-10205
>                 URL: https://issues.apache.org/jira/browse/SOLR-10205
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>            Reporter: Yonik Seeley
>            Assignee: Yonik Seeley
>         Attachments: SOLR-10205.patch
>
>
> The BlockCache is written such that requests to cache a block (BlockCache.store call) can fail, making caching less effective.
> We should evaluate the impact of this storage failure and potentially reduce the number of storage failures.
> The implementation reserves a single block of memory. In store, a block of memory is allocated, and then a pointer is inserted into the underlying map. A block is only freed when the underlying map evicts the map entry.
> This means that when two store() operations are called concurrently (even under low load), one can fail. This is made worse by the fact that concurrent maps tend to amortize the cost of eviction over many keys (i.e. the actual size of the map can grow beyond the configured maximum number of entries... both the older ConcurrentLinkedHashMap and newer Caffeine do this). When this is the case, store() won't be able to find a free block of memory, even if there aren't any other concurrently operating stores.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
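
The failure mode described in the issue above can be illustrated with a toy model. This is only a sketch: it is not Solr's BlockCache and not the attached patch. The names `ToyBlockCache` and `runPendingEvictions` are hypothetical, a plain `LinkedHashMap` stands in for ConcurrentLinkedHashMap or Caffeine, and two things are assumed: that the map maximum is one less than the number of blocks (the single reserved block), and that eviction only runs when triggered explicitly, as a stand-in for amortized eviction.

```java
import java.util.BitSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model only; every name here is hypothetical, not Solr's API. */
class ToyBlockCache {
    private final int numBlocks;   // physical blocks of memory
    private final int maxEntries;  // assumed map maximum: one block held in reserve
    private final BitSet used = new BitSet();
    // Insertion-ordered map standing in for a cache with deferred eviction.
    private final Map<String, Integer> map = new LinkedHashMap<>();

    ToyBlockCache(int numBlocks) {
        this.numBlocks = numBlocks;
        this.maxEntries = numBlocks - 1;
    }

    /** Returns false (a store failure) when no block is free. */
    boolean store(String key) {
        int block = used.nextClearBit(0);
        if (block >= numBlocks) {
            return false; // every block is still pinned by a map entry
        }
        used.set(block);
        Integer previous = map.put(key, block);
        if (previous != null) {
            used.clear(previous); // a replaced entry releases its old block
        }
        return true;
    }

    /** Models the map catching up on eviction; only eviction frees blocks. */
    void runPendingEvictions() {
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (map.size() > maxEntries && it.hasNext()) {
            used.clear(it.next().getValue()); // free the oldest entry's block
            it.remove();
        }
    }

    public static void main(String[] args) {
        ToyBlockCache cache = new ToyBlockCache(3); // map maximum is 2 entries
        System.out.println(cache.store("a")); // true
        System.out.println(cache.store("b")); // true
        System.out.println(cache.store("c")); // true: takes the reserved block
        System.out.println(cache.store("d")); // false: eviction has not run yet
        cache.runPendingEvictions();          // map shrinks back to 2 entries
        System.out.println(cache.store("d")); // true: a block was freed
    }
}
```

In this model the fourth store fails on a single thread, because the map has grown past its configured maximum and has not yet evicted anything, so no block has been freed.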