Date: Tue, 15 Mar 2016 23:22:53 +0530
Subject: Re: solr 4.7 MultiFields and MultiDocValues slow
From: Rahul Kumar
To: general@lucene.apache.org

Yeah, I already followed this approach. Can you tell me why it gives a speed-up?
As I understand it, when iterating through the top-level reader, each lookup
first does a binary search to find the right segment reader, which is then
called to fetch the norm data. By iterating through the segment readers
directly, we save that binary search. Also, is there any other way to access
the norm values?

Rahul Kumar
*Software Engineer- I (Search)*
*M*: +91 9023542950 *EXT: *14226
362-363, ASF CENTRE, UDYOG VIHAR, PHASE - IV, GURGAON 122 016, INDIA

On Tue, Mar 15, 2016 at 7:26 PM, David Smiley wrote:

> Basically, ideally you can do what you need to do by first iterating over
> the LeafReaders and working with each there. If you can do that, then you
> don't need SlowCompositeReaderWrapper and the overhead it introduces via
> its Multi* classes. Very few tasks require SCRW. Dumping the index to a
> JSON format shouldn't require SCRW.
> ~ David
>
> On Mon, Mar 7, 2016 at 4:49 AM Rahul Kumar wrote:
>
> > Hello everyone,
> > I am using Solr 4.7.2. I am somewhat new to Solr and want to dump Solr
> > indexes to JSON format. To check for deleted docs I have used
> > *Bits liveDocs = MultiFields.getLiveDocs(reader);*
> > I also want to get field boosts for all documents, and for that I have used
> > *NumericDocValues ndv = MultiDocValues.getNormValues(reader, field.name());*
> >
> > *The documentation of these methods states that they are both quite
> > expensive and slow* as they merge individual sub-segment readers. The doc
> > recommends writing these implementations yourself. Can someone please
> > explain why my implementation would be faster, given that I will also have
> > to merge the segment readers because I want the info for all documents?
> > Or can anyone suggest an optimal way to implement these methods? Any help
> > is highly appreciated.
> >
> > --
> > Thanks and Regards
> > Rahul Jha
>
> --
> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> http://www.solrenterprisesearchserver.com
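
The binary-search point raised in this thread can be illustrated with a
minimal sketch in plain Java. It is a toy model, not Lucene code: Segment,
docStarts, subIndex, dumpViaTopLevel and dumpPerSegment are hypothetical names
made up for the example, and the real Multi* classes may add other overhead
that is not modelled here. In Lucene 4.x the per-segment counterparts would be
reader.leaves(), AtomicReaderContext.docBase, AtomicReader.getLiveDocs() and
AtomicReader.getNormValues(field) (AtomicReader is the 4.x name for what 5.x
calls LeafReader).

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: all names here are hypothetical and this is not Lucene's implementation.
final class SegmentLookupSketch {

    /** Stand-in for one segment: a norm value and a live flag per local docID. */
    static final class Segment {
        final long[] norms;
        final boolean[] live;

        Segment(long[] norms, boolean[] live) {
            this.norms = norms;
            this.live = live;
        }

        int maxDoc() {
            return norms.length;
        }
    }

    /** starts[i] is the global docID of the first doc in segment i; the last entry is the total. */
    static int[] docStarts(List<Segment> segments) {
        int[] starts = new int[segments.size() + 1];
        for (int i = 0; i < segments.size(); i++) {
            starts[i + 1] = starts[i] + segments.get(i).maxDoc();
        }
        return starts;
    }

    /** Binary search for the last segment whose start is <= docId. */
    static int subIndex(int docId, int[] starts) {
        int lo = 0;
        int hi = starts.length - 2;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (starts[mid] <= docId) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    /** Top-level style: every single document lookup pays for a binary search. */
    static List<String> dumpViaTopLevel(List<Segment> segments) {
        int[] starts = docStarts(segments);
        int total = starts[segments.size()];
        List<String> out = new ArrayList<>();
        for (int docId = 0; docId < total; docId++) {
            int i = subIndex(docId, starts);
            Segment s = segments.get(i);
            int local = docId - starts[i];
            if (s.live[local]) {
                out.add(docId + ":" + s.norms[local]);
            }
        }
        return out;
    }

    /** Per-segment style: walk each segment once and add a running docBase. */
    static List<String> dumpPerSegment(List<Segment> segments) {
        List<String> out = new ArrayList<>();
        int docBase = 0;
        for (Segment s : segments) {
            for (int local = 0; local < s.maxDoc(); local++) {
                if (s.live[local]) {
                    out.add((docBase + local) + ":" + s.norms[local]);
                }
            }
            docBase += s.maxDoc();
        }
        return out;
    }

    /** Two small segments; the second doc of the first segment is marked deleted. */
    static List<Segment> sample() {
        List<Segment> segments = new ArrayList<>();
        segments.add(new Segment(new long[] {10, 11, 12}, new boolean[] {true, false, true}));
        segments.add(new Segment(new long[] {20, 21}, new boolean[] {true, true}));
        return segments;
    }

    public static void main(String[] args) {
        List<Segment> segments = sample();
        System.out.println(dumpViaTopLevel(segments));
        System.out.println(dumpPerSegment(segments));
    }
}
```

Both methods produce the same dump with global docIDs. The per-segment loop
still covers every document, but it never has to work out which segment a
docID belongs to, which is the saving described above.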