cassandra-user mailing list archives

From Yiming Sun <yiming....@gmail.com>
Subject Re: 8 million Cassandra data files on disk
Date Tue, 02 Aug 2011 21:42:19 GMT
Hi Jonathan,

Good to know.  We will certainly upgrade to 0.7.8.

Also, here is the link to that post I came across earlier:

http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Files-not-deleted-after-compaction-and-GCed-td5960453.html

best,

-- Y.

On Tue, Aug 2, 2011 at 5:36 PM, Jonathan Ellis <jbellis@gmail.com> wrote:

> I don't remember a removing-compacted-files bug in 0.7.0, but you
> should absolutely upgrade to 0.7.8 for several dozen other fixes,
> including some severe ones -- see NEWS.txt.
>
> On Tue, Aug 2, 2011 at 4:29 PM, Yiming Sun <yiming.sun@gmail.com> wrote:
> > Hi Jeremiah,
> >
> > Thank you for the information - it certainly is a relief.  Two questions
> > though:
> >
> > 1. I came across an old thread that seemed to say Cassandra 0.7.0 has a
> > bug and doesn't remove these compacted files properly.  Should we
> > upgrade to a newer version that has this bug fixed?
> >
> > 2. Must we trigger the garbage collection manually via JConsole?  Is
> > there any way to force the GC from our code? (We are using Hector as
> > our Java client.)
> >
> > Thanks!
> >
> >
> >
> > On Tue, Aug 2, 2011 at 5:19 PM, Jeremiah Jordan
> > <jeremiah.jordan@morningstar.com> wrote:
> >>
> >> Connect with jconsole and run garbage collection.
> >> Every file that has a -Compacted marker with the same name will get
> >> deleted the next time a full garbage collection runs, or when the node
> >> is restarted.  They have already been combined into new files; the old
> >> ones just haven't been deleted yet.
> >>
> >> On Tue, 2011-08-02 at 16:09 -0400, Yiming Sun wrote:
> >> > Hi,
> >> >
> >> > I am new to Cassandra, and am hoping someone could help me understand
> >> > the large number of small data files that Cassandra generates on
> >> > disk.
> >> >
> >> > We are dealing with thousands to millions of small text files on
> >> > disk, so we are experimenting with Cassandra in the hope that
> >> > dropping the files' contents into it will give us more efficient
> >> > disk usage, because Cassandra is going to aggregate them into bigger
> >> > files (one file per column family, according to the wiki).
> >> >
> >> > But after we pushed a subset of the files into a single-node Cassandra
> >> > v0.7.0 instance, we noticed that the Cassandra data directory for the
> >> > keyspace contains 8.5 million very small files, most of them named
> >> >
> >> >     <SuperColumnFamilyName>-e-<nnnnn>.Filter.db
> >> >     <SuperColumnFamilyName>-e-<nnnnn>.Compacted.db
> >> >     <SuperColumnFamilyName>-e-<nnnnn>.Index.db
> >> >     <SuperColumnFamilyName>-e-<nnnnn>.Statistics.db
> >> >
> >> > Among these files, the Compacted.db files are always empty, the Filter
> >> > and Index files are under 100 bytes, and the Statistics files are
> >> > around 4k.
> >> >
> >> > What are these files? Why are there so many of them?  We originally
> >> > hoped that Cassandra would solve our problem with small files, but
> >> > now it doesn't seem to help -- we still end up with tons of small
> >> > files.  Is there any way to reduce or combine them?
> >> >
> >> > Thanks.
> >> >
> >> > -- Y.
> >>
> >
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>
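
To see how much of the data directory is just waiting for the cleanup Jeremiah describes, one option is to group the file names by SSTable generation and list the generations that carry a Compacted marker. The following is only a rough sketch, not anything shipped with Cassandra: the function names are made up for illustration, and the name pattern is an assumption based on the listing in the original post (it accepts either "-" or "." before the component name, with or without a ".db" suffix), so adjust it to whatever is actually on disk.

```python
import re
from collections import defaultdict

# ASSUMED name shape: <ColumnFamily>-<version>-<generation><sep><Component>[.db]
# where <sep> is "-" or ".". This is inferred from the thread, not from
# Cassandra's source; change the pattern if your file names differ.
SSTABLE_NAME = re.compile(
    r"^(?P<cf>.+)-(?P<version>[a-z]+)-(?P<gen>\d+)[-.](?P<component>[A-Za-z]+)(?:\.db)?$"
)


def group_sstable_files(filenames):
    """Map (column family, generation) to the set of component names seen."""
    groups = defaultdict(set)
    for name in filenames:
        match = SSTABLE_NAME.match(name)
        if match:  # silently skip names that do not look like SSTable parts
            key = (match.group("cf"), int(match.group("gen")))
            groups[key].add(match.group("component"))
    return groups


def pending_deletion(filenames):
    """Return sorted (column family, generation) pairs that have a Compacted marker."""
    groups = group_sstable_files(filenames)
    return sorted(key for key, parts in groups.items() if "Compacted" in parts)


# Hypothetical file names, for illustration only.
names = [
    "Docs-e-1-Data.db",
    "Docs-e-1-Compacted",
    "Docs-e-2-Data.db",
    "Docs-e-2-Index.db",
]
print(pending_deletion(names))  # [('Docs', 1)]
```

In practice the list of names could come from os.listdir() on the keyspace directory. Generations reported here have already been merged into newer files, so they should disappear after the next full GC or a node restart; the script only reports and deletes nothing.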
