spark-user mailing list archives

From onmstester onmstester <>
Subject spark optimized pagination
Date Sun, 10 Jun 2018 05:12:44 GMT

I'm using Spark on top of Cassandra as the backend for the CRUD operations of a RESTful application.

Most of the REST APIs retrieve a huge amount of data from Cassandra and run a lot of aggregation
on it in Spark, which takes several seconds.

Problem: sometimes the result is a big list that makes the client's browser throw a "stop
script" warning, so we should paginate the result on the server side. But it would be very
annoying for the user to wait several seconds on each page while the Cassandra-Spark processing
runs again.

Current dummy solution: for now I am thinking about assigning a UUID to each request, which
would be sent back and forth between the server side and the client side. The first time a REST
API is invoked, the result would be saved in a temp table, and on subsequent similar requests
(requests for the next pages) the result would be fetched from the temp table (instead of the
usual flow of retrieving from Cassandra and aggregating in Spark, which takes some time). When
a memory limit is reached, the old results would be deleted.
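
To make the idea concrete, here is a minimal sketch of the flow in plain Python, not tied to any
Spark API. All names (ResultCache, handle_request, compute) are hypothetical: compute stands in
for the Cassandra read plus Spark aggregation, an in-memory dictionary stands in for the temp
table, and a maximum number of cached results stands in for the memory limit.

```python
import uuid
from collections import OrderedDict


class ResultCache:
    """Keeps recent query results in memory, keyed by a request token."""

    def __init__(self, max_entries=100):
        # Assumption: an entry count approximates the memory limit.
        self.max_entries = max_entries
        self._results = OrderedDict()  # token -> rows, least recently used first

    def put(self, rows):
        """Store a full result and return the UUID token that identifies it."""
        token = str(uuid.uuid4())
        self._results[token] = rows
        # Drop the least recently used results once the limit is exceeded.
        while len(self._results) > self.max_entries:
            self._results.popitem(last=False)
        return token

    def page(self, token, page, page_size):
        """Return one page of a cached result, or None if the token is unknown."""
        rows = self._results.get(token)
        if rows is None:
            return None  # expired or unknown token: the caller must recompute
        self._results.move_to_end(token)  # mark as recently used
        start = page * page_size
        return rows[start:start + page_size]


def handle_request(cache, compute, token=None, page=0, page_size=50):
    """Return (token, rows) for one page; compute() runs only on a cache miss."""
    if token is not None:
        rows = cache.page(token, page, page_size)
        if rows is not None:
            return token, rows
    token = cache.put(compute())
    return token, cache.page(token, page, page_size)
```

The first call passes no token, runs the expensive computation once, and returns a token to the
client. Later page requests send that token back and are served from the cache. If the result
has been evicted in the meantime, the computation runs again and a new token is issued.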

Is there a clean, built-in caching strategy in Spark to handle such scenarios?

