From: Gun Akkor
To: "solr-user@lucene.apache.org"
Date: Tue, 29 Oct 2013 11:42:41 -0400
Subject: Re: Reclaiming disk space from (large, optimized) segments

Otis,

Thank you for your response. Could you elaborate a bit more on what you
have in mind when you say "time-based" indices?

Gun
---
Senior Software Engineer
Carbon Black, Inc.
gun.akkor@carbonblack.com

On Thu, Oct 24, 2013 at 11:56 PM, Otis Gospodnetic <otis.gospodnetic@gmail.com> wrote:

> Only skimmed your email, but purge every 4 hours jumped out at me. Would it
> make sense to have time-based indices that can be periodically dropped
> instead of being purged?
>
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
>
> On Oct 23, 2013 10:33 AM, "Scott Lundgren" wrote:
>
> > *Background:*
> >
> > - Our use case is to use SOLR as a massive FIFO queue.
> >
> > - Document additions and updates happen continuously.
> >
> > - Documents are being added at a sustained rate of 50 - 100 documents
> > per second.
> >
> > - About 50% of these documents are updates to existing docs, indexed
> > using atomic updates: the original doc is thus deleted and re-added.
> >
> > - There is a separate purge operation running every four hours that
> > deletes the oldest docs, if required based on a number of unrelated
> > configuration parameters.
> >
> > - At some time in the past, a manual force merge / optimize with
> > maxSegments=2 was run to troubleshoot high disk i/o and remove "too many
> > segments" as a potential variable. Currently, the largest fdts are 74G
> > and 43G. There are 47 total segments; the largest other sizes are all
> > around 2G.
> >
> > - Merge policies are all at Solr 4 defaults. Index size is currently ~50M
> > maxDocs, ~35M numDocs, 276GB.
> >
> > *Issue:*
> >
> > The background purge operation is deleting docs on schedule, but the disk
> > space is not being recovered.
> >
> > *Presumptions:*
> >
> > I presume, but have not confirmed (how?), that the 15M deleted documents
> > are predominantly in the two large segments. Because they are largely in
> > the two large segments, and those large segments still have (some/many)
> > live documents, the segment backing files are not deleted.
> >
> > *Questions:*
> >
> > - When will those segments get merged and documents recovered? Does it
> > happen when _all_ the documents in those segments are deleted? When some
> > percentage of the segment is filled with deleted documents?
> > - Is there a way to do it right now vs. just waiting?
> > - In some cases, the purge delete conditional is _just_ free disk space:
> > when index > free space, delete oldest. Those setups are now in scenarios
> > where index >> free space, and getting worse. How does low disk space
> > affect the above two questions?
> > - Is there a way for me to determine stats on a per-segment basis?
> > - For example, how many deleted documents are in a particular segment?
> > - On the flip side, can I determine in what segment a particular document
> > is located?
> >
> > Thank you,
> >
> > Scott
> >
> > --
> > Scott Lundgren
> > Director of Engineering
> > Carbon Black, Inc.
> > (210) 204-0483 | scott.lundgren@carbonblack.com
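
One possible reading of the "time-based indices" suggestion quoted above is
to write documents into one index per time bucket and drop whole indices once
they age out, rather than deleting individual documents from a shared index.
The sketch below only illustrates the bookkeeping for that idea. It makes no
Solr calls, and the `events_` prefix, the daily bucket size, and the function
names are all assumptions chosen for illustration, not anything stated in the
thread.

```python
from __future__ import annotations

from datetime import date, timedelta

PREFIX = "events_"  # hypothetical index-name prefix, not from the thread


def collection_for(day: date) -> str:
    """Name of the (hypothetical) daily index that holds documents for `day`."""
    return f"{PREFIX}{day:%Y%m%d}"


def expired_collections(existing: list[str], today: date, keep_days: int) -> list[str]:
    """Return the daily indices that fall entirely outside the retention window.

    The window covers `today` and the `keep_days - 1` days before it.
    """
    cutoff = collection_for(today - timedelta(days=keep_days - 1))
    # Names compare chronologically because the suffix is zero-padded YYYYMMDD.
    return sorted(
        name for name in existing if name.startswith(PREFIX) and name < cutoff
    )


if __name__ == "__main__":
    today = date(2013, 10, 29)
    existing = [collection_for(today - timedelta(days=n)) for n in range(5)]
    # Indices older than the three most recent days are candidates to drop.
    print(expired_collections(existing, today, keep_days=3))
```

With this layout the purge job would drop each expired index as a whole, so
its files go away without waiting for a merge. One caveat for the workload
described in the thread: since about half of the writes are updates to
existing docs, each update would presumably need to be routed to the index
that already holds the document.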