Mailing-List: user@spark.apache.org
Date: Sun, 10 Jun 2018 09:42:44 +0430
From: onmstester onmstester <onmstester@zoho.com>
To: "user" <user@spark.apache.org>
Subject: spark optimized pagination

Hi,

I'm using Spark on top of Cassandra as the backend for CRUD in a RESTful application. Most of the REST APIs retrieve a huge amount of data from Cassandra and do a lot of aggregation on it in Spark, which takes a few seconds.

Problem: sometimes the output is a list so big that it makes the client browser throw a "stop script" warning, so we have to paginate the result on the server side. But it would be annoying for the user to wait a few seconds on every page for the Cassandra + Spark processing.

Current dummy solution: for now I was thinking of assigning a UUID to each request, which would be sent back and forth between the server side and the client side. The first time a REST API is invoked, the result would be saved in a temp table, and subsequent similar requests (requests for the next pages) would fetch the result from the temp table (instead of the common flow of retrieving from Cassandra + aggregating in Spark, which takes some time). When the memory limit is reached, the oldest results would be deleted.
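To make the dummy solution concrete, here is a minimal sketch of the UUID-keyed, LRU-evicting result cache described above, in plain Python. The class and method names (`PagedResultCache`, `put`, `get_page`) are illustrative, not from any Spark or Cassandra API; the cached list stands in for the output of the Spark aggregation:

```python
import uuid
from collections import OrderedDict

class PagedResultCache:
    """Cache full query results keyed by a per-request UUID and serve
    them page by page, evicting the oldest entry when the cache is full."""

    def __init__(self, max_entries=100, page_size=50):
        self.max_entries = max_entries
        self.page_size = page_size
        self._cache = OrderedDict()  # request_id -> full result list

    def put(self, rows):
        """Store a freshly computed result; return the UUID the client
        sends back when asking for further pages."""
        request_id = str(uuid.uuid4())
        self._cache[request_id] = rows
        if len(self._cache) > self.max_entries:
            self._cache.popitem(last=False)  # drop the least recently used entry
        return request_id

    def get_page(self, request_id, page):
        """Return one page of a cached result, or None if it was evicted
        (the caller then recomputes via Cassandra + Spark and calls put again)."""
        rows = self._cache.get(request_id)
        if rows is None:
            return None
        self._cache.move_to_end(request_id)  # mark as recently used
        start = page * self.page_size
        return rows[start:start + self.page_size]

# First request: run the expensive Spark aggregation once, cache the result,
# then serve subsequent pages from the cache without touching Spark again.
cache = PagedResultCache(max_entries=2, page_size=3)
rid = cache.put(list(range(10)))  # stand-in for the Spark result
print(cache.get_page(rid, 0))    # [0, 1, 2]
print(cache.get_page(rid, 1))    # [3, 4, 5]
```

On the Spark side the "temp table" piece would presumably be a persisted DataFrame or a temp view, but the eviction and UUID bookkeeping still has to live in the web tier as sketched here.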
Is there any built-in, clean caching strategy in Spark to handle such scenarios?

Sent using Zoho Mail