incubator-cassandra-user mailing list archives

From Sylvain Lebresne <sylv...@datastax.com>
Subject Re: What is the fastest way to get data into Cassandra 2 from a Java application?
Date Wed, 11 Dec 2013 12:40:49 GMT
Then I suspect that this is an artifact of your test methodology. Prepared
statements *are* faster than non-prepared ones in general. They save some
parsing and some bytes on the wire. The savings will tend to be bigger for
bigger queries, and it's possible that for very small queries (like the one
you are testing) the performance difference is negligible, but seeing
non-prepared statements be significantly faster than prepared ones almost
surely means you're doing something wrong (of course, a bug in either the
driver or C* is always possible, and you should always make sure to test
recent versions, but I'm not aware of any such bug).

For instance, are you sure you are warming up the JVMs (client and drivers)
properly? 1000 iterations is *really small*; if you're not warming things
up properly, you're not measuring anything relevant. Also, are you including
the preparation of the query itself in the timing? Preparing a query is not
particularly fast, but it's meant to be done just once at the beginning of
the application lifetime. With only 1000 iterations, if you include the
preparation in the timing, it's entirely possible that it's eating a good
chunk of the whole time.
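
To make those two points concrete, here is a minimal sketch of a timing loop
that runs untimed warm-up iterations first and keeps one-time setup (such as
session.prepare) outside the measured region. The names InsertBenchmark and
timeNanos and the iteration counts are made up for illustration, and the
insert is a stand-in so the sketch runs without a cluster; you would swap in
your real session.execute(ps.bind(...)) call.

```java
import java.util.function.IntConsumer;

/** Illustrative timing harness: untimed warm-up first, then a timed run. */
final class InsertBenchmark {

    /**
     * Runs op for `warmup` untimed iterations, then returns the elapsed
     * nanoseconds for `measured` further iterations.
     */
    static long timeNanos(int warmup, int measured, IntConsumer op) {
        for (int i = 0; i < warmup; i++) {
            op.accept(i);                 // untimed: gives the JIT a chance to compile the hot path
        }
        long start = System.nanoTime();
        for (int i = 0; i < measured; i++) {
            op.accept(warmup + i);        // timed iterations continue the index sequence
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // One-time setup goes here, before any timing starts.
        // With the real driver this is where session.prepare(...) would be called.
        StringBuilder sink = new StringBuilder();

        // Stand-in for session.execute(ps.bind(...)); an assumption, not driver code.
        IntConsumer insert = i -> sink.append(i % 10);

        long nanos = timeNanos(20_000, 100_000, insert);
        System.out.printf("%d inserts in %.1f ms%n", 100_000, nanos / 1e6);
    }
}
```

Timing the prepared and non-prepared variants through the same harness, with
the same warm-up, should at least rule out the setup cost and cold-JVM
effects as the source of the difference.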

But other than prepared versus non-prepared, you won't get proper
performance unless you parallelize your inserts. Unlogged batches are one
way to do it (parallelizing is really all Cassandra does with an unlogged
batch). But as John Sanda mentioned, another option is to do the
parallelization client side, with executeAsync.
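
A rough sketch of what client-side parallelization could look like: issue
the inserts asynchronously, but cap the number in flight so the client does
not flood the cluster. BoundedAsyncInserts, runAll and the limit of 64 are
hypothetical, and the thread-pool task below only stands in for
session.executeAsync so the sketch runs on its own. The future type that
executeAsync returns depends on the driver version, so it may need adapting
to the CompletableFuture used here.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntFunction;

/** Illustrative helper: many async operations, bounded number outstanding. */
final class BoundedAsyncInserts {

    /**
     * Starts `count` async operations with at most `maxInFlight` pending at
     * once, waits for all of them, and returns how many failed.
     */
    static int runAll(int count, int maxInFlight,
                      IntFunction<CompletableFuture<?>> asyncInsert) {
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicInteger failures = new AtomicInteger();
        for (int i = 0; i < count; i++) {
            permits.acquireUninterruptibly();        // blocks while maxInFlight are pending
            asyncInsert.apply(i).whenComplete((result, error) -> {
                if (error != null) {
                    failures.incrementAndGet();      // count the failure, keep going
                }
                permits.release();                   // frees a slot for the next insert
            });
        }
        // Once every permit can be taken back, nothing is in flight any more.
        permits.acquireUninterruptibly(maxInFlight);
        permits.release(maxInFlight);
        return failures.get();
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        Map<Integer, String> store = new ConcurrentHashMap<>();

        // Hypothetical stand-in for session.executeAsync(ps.bind("" + i, "aa" + i)).
        IntFunction<CompletableFuture<?>> asyncInsert =
                i -> CompletableFuture.runAsync(() -> store.put(i, "aa" + i), pool);

        int failures = runAll(1000, 64, asyncInsert);
        pool.shutdown();
        System.out.println(store.size() + " rows written, " + failures + " failures");
    }
}
```

The right in-flight limit depends on the cluster and the row size, so treat
64 as a starting point to tune rather than a recommendation.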

--
Sylvain



On Wed, Dec 11, 2013 at 11:37 AM, David Tinker <david.tinker@gmail.com> wrote:

> Yes, that's what I found.
>
> This is faster:
>
> for (int i = 0; i < 1000; i++)
>     session.execute("INSERT INTO test.wibble (id, info) VALUES ('${"" + i}', '${"aa" + i}')")
>
> Than this:
>
> def ps = session.prepare("INSERT INTO test.wibble (id, info) VALUES (?, ?)")
> for (int i = 0; i < 1000; i++)
>     session.execute(ps.bind(["" + i, "aa" + i] as Object[]))
>
> This is the fastest option of all (hand rolled batch):
>
> StringBuilder b = new StringBuilder()
> b.append("BEGIN UNLOGGED BATCH\n")
> for (int i = 0; i < 1000; i++) {
>     b.append("INSERT INTO ").append(ks).append(".wibble (id, info) VALUES ('")
>             .append(i).append("','")
>             .append("aa").append(i).append("')\n")
> }
> b.append("APPLY BATCH\n")
> session.execute(b.toString())
>
>
> On Wed, Dec 11, 2013 at 10:56 AM, Sylvain Lebresne <sylvain@datastax.com>
> wrote:
> >
> >> This loop takes 2500ms or so on my test cluster:
> >>
> >> PreparedStatement ps = session.prepare(
> >>     "INSERT INTO perf_test.wibble (id, info) VALUES (?, ?)")
> >> for (int i = 0; i < 1000; i++)
> >>     session.execute(ps.bind("" + i, "aa" + i));
> >>
> >> The same loop with the parameters inline is about 1300ms. It gets
> >> worse if there are many parameters.
> >
> >
> > Do you mean that:
> >   for (int i = 0; i < 1000; i++)
> >       session.execute("INSERT INTO perf_test.wibble (id, info) VALUES ("
> >           + i + ", aa" + i + ")");
> > is twice as fast as using a prepared statement? And that the difference
> > is even greater if you add more columns than "id" and "info"?
> >
> > That would certainly be unexpected. Are you sure you're not re-preparing
> > the statement every time in the loop?
> >
> > --
> > Sylvain
> >
> >> I know I can use batching to
> >> insert all the rows at once but that's not the purpose of this test. I
> >> also tried using session.execute(cql, params) and it is faster but
> >> still doesn't match inline values.
> >>
> >> Composing CQL strings is certainly convenient and simple but is there
> >> a much faster way?
> >>
> >> Thanks
> >> David
> >>
> >> I have also posted this on Stackoverflow if anyone wants the points:
> >>
> >>
> http://stackoverflow.com/questions/20491090/what-is-the-fastest-way-to-get-data-into-cassandra-2-from-a-java-application
> >
> >
>
>
>
> --
> http://qdb.io/ Persistent Message Queues With Replay and #RabbitMQ
> Integration
>
