cassandra-user mailing list archives

From Mehak Mehta <meme...@cs.stonybrook.edu>
Subject Re: Timeout error in fetching million rows as results using clustering keys
Date Wed, 18 Mar 2015 10:12:24 GMT
Yes, I have a cluster of 10 nodes in total, but I am currently testing with
just one node.
Total data for all nodes will exceed 5 billion rows. But I may have more
memory on the other nodes.

On Wed, Mar 18, 2015 at 6:06 AM, Ali Akhtar <ali.rac200@gmail.com> wrote:

> 4 GB also seems small for the kind of load you are trying to handle
> (billions of rows).
>
> I would also try adding more nodes to the cluster.
>
> On Wed, Mar 18, 2015 at 2:53 PM, Ali Akhtar <ali.rac200@gmail.com> wrote:
>
>> Yeah, it may be that the process is being limited by swap. This page:
>>
>>
>> https://gist.github.com/aliakhtar/3649e412787034156cbb#file-cassandra-install-sh-L42
>>
>> Lines 42 - 48 list a few settings that you could try out for increasing /
>> reducing the memory limits (assuming you're on Linux).
>>
>> Also, are you using an SSD? If so, make sure the IO scheduler is noop or
>> deadline.
>>
>> On Wed, Mar 18, 2015 at 2:48 PM, Mehak Mehta <memehta@cs.stonybrook.edu>
>> wrote:
>>
>>> Currently the Cassandra Java process is taking 1% of CPU (8% in total is
>>> being used) and 14.3% of memory (out of 4 GB total memory).
>>> As you can see, there is not much load from other processes.
>>>
>>> Should I try changing the default memory parameters in the Cassandra settings?
>>>
>>> On Wed, Mar 18, 2015 at 5:33 AM, Ali Akhtar <ali.rac200@gmail.com>
>>> wrote:
>>>
>>>> What's your memory / CPU usage at? And how much ram + cpu do you have
>>>> on this server?
>>>>
>>>>
>>>>
>>>> On Wed, Mar 18, 2015 at 2:31 PM, Mehak Mehta <memehta@cs.stonybrook.edu
>>>> > wrote:
>>>>
>>>>> Currently there is only a single node, which I am calling directly, with
>>>>> around 150000 rows. The full data will be in the billions of rows per node.
>>>>> The code works only for fetch sizes of 100/200. Also, each consecutive
>>>>> fetch takes around 5-10 secs.
>>>>>
>>>>> I have a parallel script which is inserting data while I am
>>>>> reading it. When I stopped the script, it worked for 500/1000 but not more
>>>>> than that.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Mar 18, 2015 at 5:08 AM, Ali Akhtar <ali.rac200@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> If even 500-1000 isn't working, then your Cassandra node might not
>>>>>> be up.
>>>>>>
>>>>>> 1) Try running nodetool status from a shell on your Cassandra server, and
>>>>>> make sure the nodes are up.
>>>>>>
>>>>>> 2) Are you calling this on the same server where Cassandra is
>>>>>> running? It's trying to connect to localhost. If you're running it on a
>>>>>> different server, try passing in the direct IP of your Cassandra server.
>>>>>>
>>>>>> On Wed, Mar 18, 2015 at 2:05 PM, Mehak Mehta <
>>>>>> memehta@cs.stonybrook.edu> wrote:
>>>>>>
>>>>>>> The data won't change much, but the queries will be different.
>>>>>>> I am not working on the rendering tool myself, so I don't know many
>>>>>>> details about it.
>>>>>>>
>>>>>>> Also, as you suggested, I tried to fetch the data in pages of 500 or
>>>>>>> 1000 with the Java driver's automatic pagination.
>>>>>>> It fails when the number of records is high (around 100000) with the
>>>>>>> following error:
>>>>>>>
>>>>>>> Exception in thread "main"
>>>>>>> com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
>>>>>>> tried for query failed (tried: localhost/127.0.0.1:9042
>>>>>>> (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for
>>>>>>> server response))
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 18, 2015 at 4:47 AM, Ali Akhtar <ali.rac200@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> How often does the data change?
>>>>>>>>
>>>>>>>> I would still recommend caching of some kind, but without knowing
>>>>>>>> more details (how often the data is changing, what you're doing with the 1m
>>>>>>>> rows after getting them, etc.) I can't recommend a solution.
>>>>>>>>
>>>>>>>> I did see your other thread. I would also vote for Elasticsearch /
>>>>>>>> Solr; they are more suited to the kind of analytics you seem to be doing.
>>>>>>>> Cassandra is more for storing data; it isn't all that great for complex
>>>>>>>> queries / analytics.
>>>>>>>>
>>>>>>>> If you want to stick with Cassandra, you might have better luck if
>>>>>>>> you made your range columns part of the primary key, so something like
>>>>>>>> PRIMARY KEY(caseId, x, y)
>>>>>>>>
>>>>>>>> On Wed, Mar 18, 2015 at 1:41 PM, Mehak Mehta <
>>>>>>>> memehta@cs.stonybrook.edu> wrote:
>>>>>>>>
>>>>>>>>> The rendering tool renders a portion of a very large image. It may
>>>>>>>>> fetch different data each time from billions of rows.
>>>>>>>>> So I don't think I can cache such large results, since the same
>>>>>>>>> results will rarely be fetched again.
>>>>>>>>>
>>>>>>>>> Also, do you know how I can do 2D range queries using Cassandra?
>>>>>>>>> Some other users suggested that I use Solr.
>>>>>>>>> But is there any way I can achieve that without using any other
>>>>>>>>> technology?
>>>>>>>>>
>>>>>>>>> On Wed, Mar 18, 2015 at 4:33 AM, Ali Akhtar <ali.rac200@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Sorry, meant to say "that way when you have to render, you can
>>>>>>>>>> just display the latest cache."
>>>>>>>>>>
>>>>>>>>>> On Wed, Mar 18, 2015 at 1:30 PM, Ali Akhtar <ali.rac200@gmail.com
>>>>>>>>>> > wrote:
>>>>>>>>>>
>>>>>>>>>>> I would probably do this in a background thread and cache the
>>>>>>>>>>> results; that way, when you have to render, you can just cache the latest
>>>>>>>>>>> results.
>>>>>>>>>>>
>>>>>>>>>>> I don't know why Cassandra can't seem to fetch large
>>>>>>>>>>> batch sizes. I've also run into these timeouts, but reducing the batch size
>>>>>>>>>>> to 2k seemed to work for me.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Mar 18, 2015 at 1:24 PM, Mehak Mehta <
>>>>>>>>>>> memehta@cs.stonybrook.edu> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> We have a UI which needs this data for rendering,
>>>>>>>>>>>> so the efficiency of pulling this data matters a lot. It should be
>>>>>>>>>>>> fetched within a minute.
>>>>>>>>>>>> Is there a way to achieve such efficiency?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Mar 18, 2015 at 4:06 AM, Ali Akhtar <
>>>>>>>>>>>> ali.rac200@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Perhaps just fetch them in batches of 1000 or 2000? For 1m
>>>>>>>>>>>>> rows, it seems like the difference would only be a few minutes. Do you have
>>>>>>>>>>>>> to do this all the time, or only once in a while?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Mar 18, 2015 at 12:34 PM, Mehak Mehta <
>>>>>>>>>>>>> memehta@cs.stonybrook.edu> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, it works for 1000 but not more than that.
>>>>>>>>>>>>>> How can I fetch all the rows efficiently this way?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Mar 18, 2015 at 3:29 AM, Ali Akhtar <
>>>>>>>>>>>>>> ali.rac200@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Have you tried a smaller fetch size, such as 5k - 2k?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Mar 18, 2015 at 12:22 PM, Mehak Mehta <
>>>>>>>>>>>>>>> memehta@cs.stonybrook.edu> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Jens,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I have tried a fetch size of 10000, and it is still not giving
>>>>>>>>>>>>>>>> any results.
>>>>>>>>>>>>>>>> My expectation was that Cassandra can handle a million
>>>>>>>>>>>>>>>> rows easily.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Is there any mistake in the way I am defining the keys or
>>>>>>>>>>>>>>>> querying them?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>> Mehak
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Mar 18, 2015 at 3:02 AM, Jens Rantil <
>>>>>>>>>>>>>>>> jens.rantil@tink.se> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Try setting fetchsize before querying. Assuming you don't
>>>>>>>>>>>>>>>>> set it too high, and you don't have too many tombstones, that should do it.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>> Jens
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> –
>>>>>>>>>>>>>>>>> Sent from Mailbox <https://www.dropbox.com/mailbox>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, Mar 18, 2015 at 2:58 AM, Mehak Mehta <
>>>>>>>>>>>>>>>>> memehta@cs.stonybrook.edu> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I have a requirement to fetch a million rows as the result of my
>>>>>>>>>>>>>>>>>> query, which is giving timeout errors.
>>>>>>>>>>>>>>>>>> I am fetching results by selecting on clustering columns,
>>>>>>>>>>>>>>>>>> so why are the queries taking so long? I can change the timeout settings,
>>>>>>>>>>>>>>>>>> but I need the data to be fetched faster as per my requirement.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> My table definition is:
>>>>>>>>>>>>>>>>>> CREATE TABLE images.results (uuid uuid,
>>>>>>>>>>>>>>>>>> analysis_execution_id varchar, analysis_execution_uuid uuid, x double, y
>>>>>>>>>>>>>>>>>> double, loc varchar, w double, h double, normalized varchar, type varchar,
>>>>>>>>>>>>>>>>>> filehost varchar, filename varchar, image_uuid uuid, image_uri varchar,
>>>>>>>>>>>>>>>>>> image_caseid varchar, image_mpp_x double, image_mpp_y double, image_width
>>>>>>>>>>>>>>>>>> double, image_height double, objective double, cancer_type varchar, Area
>>>>>>>>>>>>>>>>>> float, submit_date timestamp, points list<double>, PRIMARY KEY
>>>>>>>>>>>>>>>>>> ((image_caseid),Area,uuid));
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Here each row is uniquely identified by its
>>>>>>>>>>>>>>>>>> uuid. But since my data is generally queried by image_caseid,
>>>>>>>>>>>>>>>>>> I have made that the partition key.
>>>>>>>>>>>>>>>>>> I am currently using the DataStax Java API to fetch the
>>>>>>>>>>>>>>>>>> results. But the query is taking a lot of time, resulting in timeout errors:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>  Exception in thread "main"
>>>>>>>>>>>>>>>>>> com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
>>>>>>>>>>>>>>>>>> tried for query failed (tried: localhost/127.0.0.1:9042
>>>>>>>>>>>>>>>>>> (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for
>>>>>>>>>>>>>>>>>> server response))
>>>>>>>>>>>>>>>>>>  at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84)
>>>>>>>>>>>>>>>>>>  at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:289)
>>>>>>>>>>>>>>>>>>  at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:205)
>>>>>>>>>>>>>>>>>>  at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
>>>>>>>>>>>>>>>>>>  at QueryDB.queryArea(TestQuery.java:59)
>>>>>>>>>>>>>>>>>>  at TestQuery.main(TestQuery.java:35)
>>>>>>>>>>>>>>>>>> Caused by:
>>>>>>>>>>>>>>>>>> com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
>>>>>>>>>>>>>>>>>> tried for query failed (tried: localhost/127.0.0.1:9042
>>>>>>>>>>>>>>>>>> (com.datastax.driver.core.exceptions.DriverException: Timed out waiting for
>>>>>>>>>>>>>>>>>> server response))
>>>>>>>>>>>>>>>>>>  at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:108)
>>>>>>>>>>>>>>>>>>  at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:179)
>>>>>>>>>>>>>>>>>>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>>>>>>>>>>>>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>>>>>>>>>>>>>  at java.lang.Thread.run(Thread.java:744)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Also, the same query fails on the console, even with
>>>>>>>>>>>>>>>>>> a limit of 2000 rows:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> cqlsh:images> select count(*) from results where
>>>>>>>>>>>>>>>>>> image_caseid='TCGA-HN-A2NL-01Z-00-DX1' and Area<100 and Area>20 limit 2000;
>>>>>>>>>>>>>>>>>> errors={}, last_host=127.0.0.1
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks and Regards,
>>>>>>>>>>>>>>>>>> Mehak
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
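
The suggestion earlier in the thread to put the range columns into the primary key, PRIMARY KEY(caseId, x, y), can be illustrated with a small model. Within one partition, Cassandra keeps rows sorted by the clustering columns in their declared order, so here by x first and then by y. The sketch below is plain Java with no driver involved; the Row class, the 10 x 10 grid of sample coordinates, and the query bounds are all made up for illustration. It models one partition as a sorted list and shows that a range on x selects one contiguous slice, while a range on y still has to be filtered out of that slice, which is roughly why a 2D range query does not come for free from this key.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ClusteringOrderSketch {
    // Hypothetical row holding only the two clustering columns of the suggested key.
    static final class Row {
        final double x;
        final double y;
        Row(double x, double y) { this.x = x; this.y = y; }
    }

    // Sort rows the way clustering columns (x, y) would order them in a partition.
    static List<Row> sortedPartition(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingDouble((Row r) -> r.x)
                              .thenComparingDouble(r -> r.y));
        return sorted;
    }

    // Rows with xMin <= x <= xMax form one contiguous run in the sorted list.
    static List<Row> sliceByX(List<Row> sorted, double xMin, double xMax) {
        int from = 0;
        while (from < sorted.size() && sorted.get(from).x < xMin) from++;
        int to = from;
        while (to < sorted.size() && sorted.get(to).x <= xMax) to++;
        return sorted.subList(from, to);
    }

    // Rows in the y range are scattered through the slice, so they are filtered.
    static List<Row> filterByY(List<Row> slice, double yMin, double yMax) {
        List<Row> out = new ArrayList<>();
        for (Row r : slice) {
            if (r.y >= yMin && r.y <= yMax) out.add(r);
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up data: a 10 x 10 grid of points in a single partition.
        List<Row> rows = new ArrayList<>();
        for (int x = 0; x < 10; x++) {
            for (int y = 0; y < 10; y++) {
                rows.add(new Row(x, y));
            }
        }
        List<Row> sorted = sortedPartition(rows);
        List<Row> slice = sliceByX(sorted, 2, 4);   // contiguous read
        List<Row> hits = filterByY(slice, 7, 8);    // extra filtering
        System.out.println("rows scanned: " + slice.size()); // prints 30
        System.out.println("rows matched: " + hits.size());  // prints 6
    }
}
```

In this toy data, 30 rows are read to return 6. On real data the gap depends on how wide the x range is relative to the y range, so this is only a way to reason about the key layout, not a measurement of Cassandra itself.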
