From: Zheng Lin Edwin Yeo
Date: Sun, 7 May 2017 08:49:19 +0800
Subject: Re: Slow indexing speed when collection size is large
To: solr-user@lucene.apache.org

Hi Shawn,

For my rich document handling, I'm using the Extracting Request Handler,
and it requires OCR.

However, the slow indexing speed I'm currently experiencing is with
indexing done directly from the Sybase database. I fetch about 1000
records at a time from Sybase and store them in a CachedRowSet to be
indexed. The query to the Sybase database is quite fast, and most of the
time is spent processing the CachedRowSet.

Here are the answers to the other questions:

On a single Solr server, how much total memory is installed?
A) 384 GB

What is the total amount of memory reserved for Solr heaps on that server?
A) 22 GB

What is the total on-disk size of all the Solr indexes on that server?
-- Multiple replicas must be included if they are present on one machine.
A) 5 TB

From the core (shard replica) perspective, how many documents are on
that server?
-- Multiple replicas must be included here too.
A) About 200 million documents across both replicas; each replica holds
about 100 million. Currently, both replicas are on the same server, but
on different disks.

Is there software other than the Solr server process(es) running on that
server?
A) A virtual machine with a Sybase database is running on the server.

Are you making queries at the same time you're indexing?
A) Only occasionally. Most of the time, no queries are made.

Regards,
Edwin

On 6 May 2017 at 20:41, Shawn Heisey wrote:

> On 5/1/2017 10:17 AM, Zheng Lin Edwin Yeo wrote:
> > I'm using Solrj for the indexing, not using curl. Normally I bundle
> > about 1000 documents for each POST.
> > There's more than 300GB of RAM for
> > that server, and I do not use any sharding at the moment.
>
> Looking over your email history on the list, I was able to determine
> some information, but not everything I was wondering about. I have some
> questions.
>
> Are you still using the Extracting Request Handler for your rich
> document handling, or have you moved Tika processing outside Solr?
> If it's outside Solr, is it on different machines?
> Are your rich documents still requiring OCR?
>
> Other questions:
>
> On a single Solr server, how much total memory is installed?
> What is the total amount of memory reserved for Solr heaps on that server?
> What is the total on-disk size of all the Solr indexes on that server?
> -- Multiple replicas must be included if they are present on one machine.
> From the core (shard replica) perspective, how many documents are on
> that server?
> -- Multiple replicas must be included here too.
> Is there software other than the Solr server process(es) running on that
> server?
> Are you making queries at the same time you're indexing?
>
> Thanks,
> Shawn