incubator-cassandra-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Terje Marthinussen <tmarthinus...@gmail.com>
Subject Re: performance with a "large" number of supercolumns/columns
Date Thu, 08 Jul 2010 05:18:24 GMT
Hi,

yes, I have been looking a bit on this, but not as much as I wanted to.

I kind of forgot about the whole double call on first access (probably
something related to caching?) as even fixing that, the performance is not
good enough for what I need.

The hasNext() call causes interactions with both apache common-collections
and google-collection and simply speaking, the amount of code which ends up
being executed for the hasNext()/next() part of this loop is a bit
ridiculous.

I started to optimize some code in common-collections, got a 10-20%
improvement (for my specific test case), but as I started understanding both
the collection code and Cassandra code better  it started dawning on me
that:

1.  There is just too much code and too many "layers" involved when hasNext
is called. I suspect this requires a re-design and the
google/common-collections may have to be thrown out. This would seem to be a
pretty critical area of Cassandra code for performance, so I suspect the
cost of making  functionality specifically tuned for Cassandra should be
worth it.

2. From what I understand, Columns from all SS/MemTables are merged before
considering tombstones and applying the slice predicates. To reduce
deserialization and Column object creation, maybe some of this filtering and
column merging could/should be done before Columns objects are even created?


While this may make things somewhat hairy codewise, JVM performance/memory
consumption tend to not be happy in scenarios with heavy object creation,
and we get a lot of objects here.

3. It would seem like there might be good possibilities to do a fair bit of
optimization for the case where a complete set of columns are fetched
(slices with null for start/end predicates), but I am not currently sure how
much time it would save and if it would be worth the work and added
complexity to the code.

I may make an attempt at 1. but I don't feel entirely like I understand
enough of the Cassandra code yet to do any of the above at the moment so it
may take a little while.

Regards,
Terje

On Thu, Jul 8, 2010 at 8:07 AM, Jonathan Ellis <jbellis@gmail.com> wrote:

> Hi Terje,
>
> Sorry to not get to this sooner.  Are you still looking into this?
>
> On Tue, Jun 22, 2010 at 12:47 PM, Terje Marthinussen
> <tmarthinussen@gmail.com> wrote:
> > Hi,
> >
> > I was looking a bit on a case we have with columnfamily which has 439k
> > supercolumns, each supercolumn with ~3 columns each so a total of some
> 1.3
> > million objects in total.
> >
> > This takes about 9 second to read with thrift on first access, 4-5 second
> on
> > second access.
> >
> > I took a little closer look at this, I noticed that roughly half of this
> > time was spend in cassandra.
> >
> > I will look more at this, but I thought I would just ask people here as
> it
> > could be that someone already had good explanations...
> >
> > Most of the time is spent here
> >
> >    public void collectReducedColumns(IColumnContainer container,
> > Iterator<IColumn> reducedColumns, int gcBefore)
> >    {
> >        int liveColumns = 0;
> >        AbstractType comparator = container.getComparator();
> >
> >        while (reducedColumns.hasNext())
> >        {
> >            if (liveColumns >= count)
> >                break;
> >
> >            IColumn column = reducedColumns.next();
> >            if (logger.isDebugEnabled())
> >                logger.debug(String.format("collecting %s of %s: %s",
> >                                           liveColumns, count,
> > column.getString(comparator)));
> >
> >            if (finish.length > 0
> >                && ((!reversed && comparator.compare(column.name(),
> > finish) > 0))
> >                    || (reversed && comparator.compare(column.name(),
> > finish) < 0))
> >                break;
> >
> >            // only count live columns towards the `count` criteria
> >            if (!column.isMarkedForDelete()
> >                && (!container.isMarkedForDelete()
> >                    || (ClockRelationship.GREATER_THAN ==
> >
> column.mostRecentLiveChangeAt().compare(container.getMarkedForDeleteAt()))))
> >            {
> >                liveColumns++;
> >            }
> >
> >            // but we need to add all non-gc-able columns to the
> > result for read repair:
> >            if (QueryFilter.isRelevant(column, container, gcBefore))
> >                container.addColumn(column);
> >        }
> >    }
> >
> > Adding some time measuring print revealed a few interesting this.
> > 1. First time I request the row (I request the entire row in this case),
> > collectReducedColumns() is called twice. I have not had time to
> understand
> > why yet, but there is one difference between the calls. All columns are
> > returned both times, but the first call is done with MAXINT as "count"
> while
> > the second call has the maxcount actually requested by the client as
> > "count". The first call takes about 3.7 seconds to process, the second
> about
> > 1.7 secs. Whatever reason, I think we should be able to remove one of
> these
> > calls?
> >
> > 2. Almost half the time is spent in "container.addColumn". This is
> probably
> > no big surprise as there is a lot of code hidden "down there". I am
> however
> > very curious if it could not be significantly optimized, especially in
> the
> > case where you have predicates which cases all columns to be included.
> That
> > said, I have not manage to read enough cassandra code to understand
> > tombstones or all the other things which is going on in that part of the
> > code.
> >
> > 3. A bit more of a surprise, most of the remaining 50% of the time seems
> to
> > occur at while (reducedColumns.hasNext())
> > That is, save system.nanoTime() at the end of the loop and after hasNext
> and
> > sum up and it accounts for almost 50% of the total time. That seems quite
> > weird to me.
> >
> > I will dig more, but I was curious if someone had answers already.
> >
> > Best regards,
> > Terje
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message