cassandra-user mailing list archives

From Mehak Mehta <meme...@cs.stonybrook.edu>
Subject Re: Timeout error in fetching million rows as results using clustering keys
Date Wed, 18 Mar 2015 08:24:13 GMT
We have a UI that needs this data for rendering, so the efficiency of
pulling this data matters a lot. It should be fetched within a minute.
Is there a way to achieve such efficiency?
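
As a rough back-of-the-envelope check on the one-minute target, the sketch below works out how many pages a million-row result needs at a given fetch size, and how much time each page may take on average if pages are fetched one after another. The class and method names are hypothetical, and the row count, fetch sizes, and the sequential-fetch assumption are illustrative; none of these numbers are measurements from this cluster.

```java
// Hypothetical helper: names and numbers are illustrative assumptions only.
class PageBudget {
    /** Number of pages needed to return totalRows at the given fetch size. */
    static long pagesNeeded(long totalRows, int fetchSize) {
        if (totalRows < 0 || fetchSize <= 0) {
            throw new IllegalArgumentException("totalRows must be >= 0 and fetchSize > 0");
        }
        // Ceiling division: a partially filled last page still costs a round trip.
        return (totalRows + fetchSize - 1) / fetchSize;
    }

    /** Average milliseconds available per page when pages are fetched sequentially. */
    static double millisPerPage(long totalRows, int fetchSize, long deadlineMillis) {
        long pages = pagesNeeded(totalRows, fetchSize);
        if (pages == 0) {
            return deadlineMillis; // nothing to fetch, the whole deadline is slack
        }
        return (double) deadlineMillis / pages;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;   // assumed result size
        long deadline = 60_000L;  // one minute, in milliseconds
        for (int fetchSize : new int[] {1000, 2000, 5000, 10000}) {
            System.out.printf("fetchSize=%d pages=%d budget=%.1f ms/page%n",
                fetchSize,
                pagesNeeded(rows, fetchSize),
                millisPerPage(rows, fetchSize, deadline));
        }
    }
}
```

Under these assumptions, a fetch size of 1000 means 1000 round trips for a million rows, which leaves about 60 ms per page to stay within a minute.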


On Wed, Mar 18, 2015 at 4:06 AM, Ali Akhtar <ali.rac200@gmail.com> wrote:

> Perhaps just fetch them in batches of 1000 or 2000? For 1m rows, it seems
> like the difference would only be a few minutes. Do you have to do this all
> the time, or only once in a while?
>
> On Wed, Mar 18, 2015 at 12:34 PM, Mehak Mehta <memehta@cs.stonybrook.edu>
> wrote:
>
>> Yes, it works for 1000 but not for more than that.
>> How can I fetch all the rows efficiently using this?
>>
>> On Wed, Mar 18, 2015 at 3:29 AM, Ali Akhtar <ali.rac200@gmail.com> wrote:
>>
>>> Have you tried a smaller fetch size, such as 2k to 5k?
>>>
>>> On Wed, Mar 18, 2015 at 12:22 PM, Mehak Mehta <memehta@cs.stonybrook.edu>
>>> wrote:
>>>
>>>> Hi Jens,
>>>>
>>>> I have tried a fetch size of 10000, but it still does not give any results.
>>>> My expectation was that Cassandra can handle a million rows easily.
>>>>
>>>> Is there any mistake in the way I am defining the keys or querying them?
>>>>
>>>> Thanks
>>>> Mehak
>>>>
>>>> On Wed, Mar 18, 2015 at 3:02 AM, Jens Rantil <jens.rantil@tink.se>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Try setting fetchsize before querying. Assuming you don't set it too
>>>>> high, and you don't have too many tombstones, that should do it.
>>>>>
>>>>> Cheers,
>>>>> Jens
>>>>>
>>>>> –
>>>>> Sent from Mailbox <https://www.dropbox.com/mailbox>
>>>>>
>>>>>
>>>>> On Wed, Mar 18, 2015 at 2:58 AM, Mehak Mehta <
>>>>> memehta@cs.stonybrook.edu> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have a requirement to fetch a million rows as the result of my query,
>>>>>> which is giving timeout errors.
>>>>>> I am fetching results by selecting on clustering columns, so why are the
>>>>>> queries taking so long? I can change the timeout settings, but I need
>>>>>> the data to be fetched faster as per my requirement.
>>>>>>
>>>>>> My table definition is:
>>>>>> CREATE TABLE images.results (uuid uuid, analysis_execution_id varchar,
>>>>>> analysis_execution_uuid uuid, x double, y double, loc varchar, w double,
>>>>>> h double, normalized varchar, type varchar, filehost varchar,
>>>>>> filename varchar, image_uuid uuid, image_uri varchar, image_caseid varchar,
>>>>>> image_mpp_x double, image_mpp_y double, image_width double,
>>>>>> image_height double, objective double, cancer_type varchar, Area float,
>>>>>> submit_date timestamp, points list<double>,
>>>>>> PRIMARY KEY ((image_caseid), Area, uuid));
>>>>>>
>>>>>> Here each row is uniquely identified by its uuid. But since my data is
>>>>>> generally queried by image_caseid, I have made it the partition key.
>>>>>> I am currently using the DataStax Java API to fetch the results, but the
>>>>>> query is taking a lot of time, resulting in timeout errors:
>>>>>>
>>>>>> Exception in thread "main"
>>>>>> com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
>>>>>> tried for query failed (tried: localhost/127.0.0.1:9042
>>>>>> (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for
>>>>>> server response))
>>>>>>  at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84)
>>>>>>  at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:289)
>>>>>>  at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:205)
>>>>>>  at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
>>>>>>  at QueryDB.queryArea(TestQuery.java:59)
>>>>>>  at TestQuery.main(TestQuery.java:35)
>>>>>> Caused by:
>>>>>> com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
>>>>>> tried for query failed (tried: localhost/127.0.0.1:9042
>>>>>> (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for
>>>>>> server response))
>>>>>>  at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:108)
>>>>>>  at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:179)
>>>>>>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>  at java.lang.Thread.run(Thread.java:744)
>>>>>>
>>>>>> Also, when I try the same query on the console, even with a limit of
>>>>>> 2000 rows:
>>>>>>
>>>>>> cqlsh:images> select count(*) from results where
>>>>>> image_caseid='TCGA-HN-A2NL-01Z-00-DX1' and Area<100 and Area>20 limit 2000;
>>>>>> errors={}, last_host=127.0.0.1
>>>>>>
>>>>>> Thanks and Regards,
>>>>>> Mehak
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
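
The batching idea suggested in the thread (fetch 1000 or 2000 rows at a time instead of everything at once) can be pictured with the in-memory sketch below. It is only a model: a sorted set stands in for one partition ordered by the clustering key (Area, uuid), and each batch resumes strictly after the last key already seen. The class, the `Key` type, and `demoPartition` are hypothetical names, nothing here calls the Cassandra driver, and Java's UUID ordering is used purely as a stand-in for the real clustering order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.UUID;

// In-memory stand-in for one partition; nothing here talks to Cassandra.
class BatchedScan {
    /** Hypothetical model of a row's clustering key (Area, uuid). */
    static final class Key {
        final float area;
        final UUID uuid;
        Key(float area, UUID uuid) { this.area = area; this.uuid = uuid; }
    }

    // Sort by area, then uuid. Java's UUID ordering is only a stand-in here.
    static final Comparator<Key> ORDER =
        Comparator.<Key>comparingDouble(k -> k.area).thenComparing(k -> k.uuid);

    /** Up to batchSize keys strictly after 'after' with minArea < area < maxArea. */
    static List<Key> fetchBatch(NavigableSet<Key> partition, Key after,
                                float minArea, float maxArea, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be > 0");
        List<Key> batch = new ArrayList<>();
        Iterable<Key> tail = (after == null) ? partition : partition.tailSet(after, false);
        for (Key k : tail) {
            if (k.area >= maxArea) break;     // sorted by area: nothing later can match
            if (k.area <= minArea) continue;  // not yet inside the requested range
            batch.add(k);
            if (batch.size() == batchSize) break;
        }
        return batch;
    }

    /** Collects every matching key by repeatedly asking for the next batch. */
    static List<Key> fetchAll(NavigableSet<Key> partition,
                              float minArea, float maxArea, int batchSize) {
        List<Key> all = new ArrayList<>();
        Key last = null;
        while (true) {
            List<Key> batch = fetchBatch(partition, last, minArea, maxArea, batchSize);
            all.addAll(batch);
            if (batch.size() < batchSize) break;  // a short batch means the range is exhausted
            last = batch.get(batch.size() - 1);   // resume after the last key seen
        }
        return all;
    }

    /** Builds n demo keys with areas 0, 10, 20, ... and deterministic uuids. */
    static NavigableSet<Key> demoPartition(int n) {
        NavigableSet<Key> partition = new TreeSet<>(ORDER);
        for (int i = 0; i < n; i++) {
            partition.add(new Key(i * 10f, new UUID(0L, i)));
        }
        return partition;
    }

    public static void main(String[] args) {
        List<Key> rows = fetchAll(demoPartition(20), 20f, 100f, 3);
        System.out.println("matching rows: " + rows.size());
    }
}
```

The caller only ever holds one batch of keys plus the last key seen, which is the property that makes small batches cheaper per request than a single million-row response. With a real cluster, the driver's fetch size setting mentioned earlier in the thread is the usual way to get this behavior without writing the loop by hand.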
