From: Erick Erickson
Date: Wed, 12 Apr 2017 20:37:13 -0700
Subject: Re: Long GC pauses while reading Solr docs using Cursor approach
To: solr-user

You're missing the point of my comment. Since they already are
docValues, you can use the /export functionality to get the results
back as a _stream_ and avoid all of the overhead of the aggregator
node doing a merge sort and all of that.

You'll have to do this from SolrJ, but see CloudSolrStream. You can
see examples of its usage in StreamingTest.java.

This should
1> complete much, much faster. The design goal is 400K rows/second but YMMV
2> use vastly less memory on your Solr instances.
3> only require _one_ query

Best,
Erick

On Wed, Apr 12, 2017 at 7:36 PM, Shawn Heisey wrote:
> On 4/12/2017 5:19 PM, Chetas Joshi wrote:
>> I am getting back 100K results per page.
>> The fields have docValues enabled and I am getting sorted results based
>> on "id" and 2 more fields (String: 32 Bytes and Long: 8 Bytes).
>>
>> I have a solr Cloud of 80 nodes. There will be one shard that will get
>> top 100K docs from each shard and apply merge sort. So, the max memory
>> usage of any shard could be 40 bytes * 100K * 80 = 320 MB. Why would
>> heap memory usage shoot up from 8 GB to 17 GB?
>
> From what I understand, Java overhead for a String object is 56 bytes
> above the actual byte size of the string itself. And each character in
> the string will be two bytes -- Java uses UTF-16 for character
> representation internally. If I'm right about these numbers, it means
> that each of those id values will take 120 bytes -- and that doesn't
> include the size of the actual response (xml, json, etc).
>
> I don't know what the overhead for a long is, but you can be sure that
> it's going to take more than eight bytes total memory usage for each one.
>
> Then there is overhead for all the Lucene memory structures required to
> execute the query and gather results, plus Solr memory structures to
> keep track of everything. I have absolutely no idea how much memory
> Lucene and Solr use to accomplish a query, but it's not going to be
> small when you have 200 million documents per shard.
>
> Speaking of Solr memory requirements, under normal query circumstances
> the aggregating node is going to receive at least 100K results from
> *every* shard in the collection, which it will condense down to the
> final result with 100K entries. The behavior during a cursor-based
> request may be more memory-efficient than what I have described, but I
> am unsure whether that is the case.
>
> If the cursor behavior is not more efficient, then each entry in those
> results will contain the uniqueKey value and the score. That's going to
> be many megabytes for every shard. If there are 80 shards, it would
> probably be over a gigabyte for one request.
>
> Thanks,
> Shawn
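
As a rough check on the numbers quoted above, here is a minimal Java
sketch of the arithmetic. It compares the original 40-bytes-per-row
estimate with Shawn's per-String figure (56 bytes of object overhead
plus 2 bytes per character). The class and method names are made up
for illustration, the 32-byte string is assumed to be 32 characters,
and the 56-byte overhead is the figure from the thread rather than a
measurement. The overhead of the long, the response itself, and the
Lucene/Solr structures is deliberately left out, so the real number
would be higher.

```java
public class SortValueMemoryEstimate {
    // Figures taken from the thread: 100K rows per shard, 80 shards.
    static final long ROWS_PER_SHARD = 100_000L;
    static final long SHARDS = 80L;

    // Shawn's estimate: 56 bytes of String overhead plus 2 bytes per
    // UTF-16 character. This is an assumption, not a measured value.
    static long stringBytes(int chars) {
        return 56L + 2L * chars;
    }

    // Total bytes the aggregating node holds if every shard returns
    // ROWS_PER_SHARD rows of the given size.
    static long totalBytes(long bytesPerRow) {
        return bytesPerRow * ROWS_PER_SHARD * SHARDS;
    }

    public static void main(String[] args) {
        // Original estimate: 32-byte string + 8-byte long, no overhead.
        long naive = totalBytes(32 + 8);
        // String alone, with object overhead: 56 + 2 * 32 = 120 bytes.
        long withOverhead = totalBytes(stringBytes(32));

        System.out.println("naive estimate:       " + naive / 1_000_000 + " MB");        // 320 MB
        System.out.println("with String overhead: " + withOverhead / 1_000_000 + " MB"); // 960 MB
    }
}
```

Even counting only the String values, the estimate roughly triples,
which is in line with the "over a gigabyte for one request" guess once
the other per-entry costs are added.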