incubator-cassandra-user mailing list archives

From Marcelo Elias Del Valle <>
Subject io bound model
Date Tue, 26 Nov 2013 13:41:51 GMT
Hi everyone,

    I currently have a column family, InputCf, in production that stores one
data input per row. Every time I receive new data from the web, I insert a
row into this CF. In addition, I have another CF, InputCfIndex, in which the
row key is the date (yyyyMMdd) and each column name is the id of an InputCf
row, with no value.
    At the end of the day, I read all the rows inserted that day into InputCf
and process them. Reading the ids from InputCfIndex is fast, but reading from
InputCf uses a lot of IO, because I cannot know which machine in the
cluster holds each row. When I query Cassandra for all the rows inserted
today in InputCf, network IO utilization reaches 100%, with almost no
CPU or memory consumption.
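
    To make the access pattern concrete, here is a minimal sketch in plain
Python, with dicts standing in for the two column families. No Cassandra
client is involved, and the function names insert_input and read_day are
made up for illustration only:

```python
from collections import defaultdict
import uuid

# Dicts standing in for the two column families (illustration only).
input_cf = {}                        # row key: input id -> payload
input_cf_index = defaultdict(dict)   # row key: yyyyMMdd -> {input id: empty value}

def insert_input(day, payload):
    """Write one row to InputCf and one valueless column to InputCfIndex."""
    input_id = str(uuid.uuid4())
    input_cf[input_id] = payload
    input_cf_index[day][input_id] = b""
    return input_id

def read_day(day):
    """End-of-day job: one index row read, then one InputCf lookup per id."""
    ids = list(input_cf_index.get(day, {}))   # a single wide row in the index
    # In the real cluster, each of these lookups may land on a different node.
    return [input_cf[i] for i in ids]

insert_input("20131126", "first")
insert_input("20131126", "second")
insert_input("20131127", "other day")
print(read_day("20131126"))
```

The index read is a single row, while the second step is one lookup per
id, which is where the network IO goes in my case.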
    I was wondering if there is a way of querying a lot of messages at a
time, but multi_get orchestration happens in the client and, as the data is
distributed across the cluster, I am not sure it would help.
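
    For reference, batching the ids on the client side would look roughly
like the sketch below. The helper name chunked, the batch size of 3 and the
ids are all made up, and the multi_get-style call that would receive each
batch is not shown:

```python
def chunked(ids, size):
    """Yield successive lists of at most `size` ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

ids = [f"id-{n}" for n in range(7)]   # hypothetical ids read from the index row
batches = list(chunked(ids, 3))
print([len(batch) for batch in batches])  # [3, 3, 1]
```

This reduces the number of requests, but each batch can still contain ids
that live on different nodes, which is why I doubt it helps much.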
    So here is my question: any ideas on how to change my model so that I
can query several inputs at a time while consuming less network IO? I am
guessing there must be a way of optimizing it...

Best regards,
Marcelo Elias Del Valle - @mvallebr
