incubator-cassandra-user mailing list archives

From Adam Venturella <aventure...@gmail.com>
Subject Using WHERE IN with Wide Rows
Date Fri, 08 Mar 2013 13:49:18 GMT
TL;DR:
Is it possible to use WHERE IN on wide rows but only have it return the 1st
column of each of the rows in the IN()?

First, I am aware that WHERE IN (id1, id2, id3...N) is not the most
performant, and should not be used on large sets.

I'm also assuming there is little difference between this and issuing N
SELECTs from the requesting application. I'm guessing Cassandra may try to
perform some optimization on its end, parallelizing the requests to the
nodes if applicable? Otherwise, generally speaking, it's probably more or
less the same as issuing multiple SELECTs.

That said, I need to extract some data, and WHERE IN() is looking like the
best way to do it given that I have the row keys and just need the data.

I have a few thousand IDs and figure the best way to grab that info is in
blocks of 10 IDs so as not to abuse WHERE IN: IN (1...10), IN (11...20). Now,
maybe issuing hundreds of WHERE IN queries is itself abusive; my ignorance
shows here. Regardless, I still need to get some data out =)
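
To make the batching idea concrete, here is a rough sketch in Python of
splitting the keys into blocks of 10 and building one WHERE IN statement per
block. The table name `events`, the key column `id`, and the sample keys are
made up for illustration, and the `?` bind markers assume the statements
would be prepared and executed through whatever client is in use; nothing
here talks to Cassandra.

```python
from typing import Iterator, List, Sequence


def chunked(keys: Sequence[str], size: int = 10) -> Iterator[List[str]]:
    """Yield consecutive blocks of at most `size` keys."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(keys), size):
        yield list(keys[start:start + size])


def in_query(table: str, key_column: str, n_keys: int) -> str:
    """Build a SELECT ... WHERE key IN (?, ?, ...) string with n_keys markers."""
    placeholders = ", ".join(["?"] * n_keys)
    return f"SELECT * FROM {table} WHERE {key_column} IN ({placeholders})"


# Hypothetical data: 25 row keys, a table called `events` keyed by `id`.
row_keys = [f"id{i}" for i in range(1, 26)]

# One (statement, parameters) pair per block of up to 10 keys.
batches = [
    (in_query("events", "id", len(block)), block)
    for block in chunked(row_keys)
]
```

Each entry in `batches` pairs a statement with the keys to bind to it, so the
last block simply gets a shorter IN list.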

The next catch is that the rows identified by the keys are wide rows (time
series). Assuming each row is a minimum of 100 columns wide, issuing the
WHERE IN seems to pull back all of the columns for each row key specified,
as expected.

So, my question: is it possible to use WHERE IN on wide rows but only have
it return the 1st column of each of the rows in the IN()?

I can also just issue SELECTs per row key as well, but I thought I would
ask to see if there was something I was missing using WHERE IN.
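
For reference, the per-row-key fallback might look roughly like the sketch
below. It only builds the statements. It assumes a CQL3-style table in which
each column of the wide row comes back as its own result row, so that LIMIT 1
on a single-key query would return just the first one in clustering order;
that assumption, along with the table name `events` and key column `id`, is
mine and not something I have confirmed for this schema.

```python
from typing import List, Sequence, Tuple


def first_column_queries(
    table: str, key_column: str, keys: Sequence[str]
) -> List[Tuple[str, Tuple[str]]]:
    """Build one single-key SELECT with LIMIT 1 per row key."""
    # Assumption: LIMIT 1 on a single-key query yields the first column
    # of that wide row (first entry in clustering order).
    statement = f"SELECT * FROM {table} WHERE {key_column} = ? LIMIT 1"
    return [(statement, (key,)) for key in keys]


# Hypothetical table and keys, for illustration only.
queries = first_column_queries("events", "id", ["id1", "id2", "id3"])
```

Because every statement has the same text, it could be prepared once and
executed with a different key each time.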
