lucene-solr-user mailing list archives

From Gun Akkor <gun.ak...@carbonblack.com>
Subject Re: Reclaiming disk space from (large, optimized) segments
Date Tue, 29 Oct 2013 15:42:41 GMT
Otis,

Thank you for your response.

Could you elaborate a bit more on what you have in mind when you say
"time-based" indices?

Gun


---
Senior Software Engineer
Carbon Black, Inc.
gun.akkor@carbonblack.com
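
One possible reading of "time-based" indices, sketched under stated assumptions: documents are routed to one index per day, and whole indices are dropped once they fall outside a retention window, instead of deleting documents inside one large index. The `events_` prefix, the daily granularity, and both function names are hypothetical and do not come from this thread. How the indices are actually created and dropped (for example as separate Solr cores) is left out.

```python
from datetime import datetime, timedelta, timezone

PREFIX = "events_"  # hypothetical naming scheme, not from the thread


def index_name(ts: datetime) -> str:
    """Name of the daily index that a document timestamped `ts` belongs to."""
    return PREFIX + ts.astimezone(timezone.utc).strftime("%Y%m%d")


def expired_indices(names, now: datetime, keep_days: int):
    """Return the index names whose day is older than the retention window."""
    cutoff = (now.astimezone(timezone.utc) - timedelta(days=keep_days)).date()
    expired = []
    for name in names:
        if not name.startswith(PREFIX):
            continue  # ignore anything that does not follow the naming scheme
        day = datetime.strptime(name[len(PREFIX):], "%Y%m%d").date()
        if day < cutoff:
            expired.append(name)
    return sorted(expired)
```

Dropping an entire index removes its files outright, so the space comes back without waiting for a merge to rewrite a segment that still holds live documents.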


On Thu, Oct 24, 2013 at 11:56 PM, Otis Gospodnetic <
otis.gospodnetic@gmail.com> wrote:

> Only skimmed your email, but purge every 4 hours jumped out at me. Would it
> make sense to have time-based indices that can be periodically dropped
> instead of being purged?
>
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
> On Oct 23, 2013 10:33 AM, "Scott Lundgren" <scott.lundgren@carbonblack.com>
> wrote:
>
> > *Background:*
> >
> > - Our use case is to use Solr as a massive FIFO queue.
> >
> > - Document additions and updates happen continuously.
> >
> >     - Documents are being added at a sustained rate of 50 - 100 documents
> > per second.
> >
> >     - About 50% of these documents are updates to existing docs, indexed
> > using atomic updates: the original doc is thus deleted and re-added.
> >
> > - There is a separate purge operation running every four hours that
> > deletes the oldest docs, if required based on a number of unrelated
> > configuration parameters.
> >
> > - At some time in the past, a manual force merge / optimize with
> > maxSegments=2 was run to troubleshoot high disk I/O and remove "too many
> > segments" as a potential variable.  Currently, the largest .fdt files are
> > 74G and 43G.  There are 47 segments in total; the next-largest are all
> > around 2G.
> >
> > - Merge policies are all at Solr 4 defaults. Index size is currently ~50M
> > maxDocs, ~35M numDocs, 276GB.
> >
> > *Issue:*
> >
> > The background purge operation is deleting docs on schedule, but the disk
> > space is not being recovered.
> >
> > *Presumptions:*
> > I presume, but have not confirmed (how?), that the 15M deleted documents
> > are predominantly in the two large segments.  Because they are largely in
> > the two large segments, and those large segments still have (some/many)
> > live documents, the segment backing files are not deleted.
> >
> > *Questions:*
> >
> > - When will those segments get merged and the space from deleted documents
> > recovered?  Does it happen when _all_ the documents in those segments are
> > deleted?  Or when some percentage of the segment is filled with deleted
> > documents?
> > - Is there a way to do it right now vs. just waiting?
> > - In some cases, the purge delete conditional is _just_ free disk space:
> > when index > free space, delete oldest.  Those setups are now in scenarios
> > where index >> free space, and getting worse.  How does low disk space
> > affect the above two questions?
> > - Is there a way for me to determine stats on a per-segment basis?
> >    - for example, how many deleted documents in a particular segment?
> > - On the flip side, can I determine in what segment a particular document
> > is located?
> >
> > Thank you,
> >
> > Scott
> >
> > --
> > Scott Lundgren
> > Director of Engineering
> > Carbon Black, Inc.
> > (210) 204-0483 | scott.lundgren@carbonblack.com
> >
>
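
The "15M deleted documents" figure in the original post follows from the two index-wide counts it quotes: maxDocs includes documents that are marked deleted but not yet merged away, while numDocs counts only live documents. A minimal sketch of that arithmetic, using only the approximate numbers from the post; the helper name `deleted_stats` is made up for illustration, and the same subtraction would apply per segment if per-segment counts were available.

```python
def deleted_stats(max_docs: int, num_docs: int):
    """Return (deleted_count, deleted_fraction) for an index or a segment."""
    if not 0 <= num_docs <= max_docs:
        raise ValueError("expected 0 <= num_docs <= max_docs")
    deleted = max_docs - num_docs
    fraction = deleted / max_docs if max_docs else 0.0
    return deleted, fraction


# Approximate figures from the post: ~50M maxDocs, ~35M numDocs.
deleted, fraction = deleted_stats(50_000_000, 35_000_000)
print(f"{deleted:,} deleted docs, {fraction:.0%} of maxDocs")
```

On these figures roughly 30% of the index is occupied by documents that are deleted but still on disk.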
