From: Robust Links
To: java-user@lucene.apache.org
Subject: Re: custom collector
Date: Wed, 29 Apr 2015 12:05:45 -0400

Hi Erick,

The index I am searching is a Lucene index. I am trying to perform some operations over ALL the documents in that index. I can rebuild the index as a Solr index and then use the export functionality. Up to now I've been using the Lucene index searcher with a custom collector. Would the code below be correct if I want to continue on the Lucene path?

thank you Erick

public class DocIDCollector extends SimpleCollector {

  HashBiMap idSet = HashBiMap.create();
  private Scorer scorer;
  private NumericDocValues ids;

  public boolean acceptsDocsOutOfOrder() {
    return true;
  }

  public void setScorer(Scorer scorer) {
    this.scorer = scorer;
  }

  public void doSetNextReader(LeafReaderContext reader) throws IOException {
    ids = DocValues.getNumeric(reader.reader(), "id");
  }

  public void collect(int doc) throws IOException {
    long wid = ids.get(doc);
    idSet.put(doc, wid);
  }

  public void reset() {
    idSet.clear();
  }

  public HashBiMap getWikiIds() {
    return idSet;
  }
}

On Wed, Apr 29, 2015 at 11:32 AM, Erick Erickson wrote:
> Hmmm, it's not clear to me whether you're using Solr or not, but if
> you are have you considered using the export functionality? This is
> already built to stream large result sets back to the client. And
> lately (5.1), you can combine that with "streaming aggregation" to do
> some pretty cool stuff.
>
> Not sure it applies in your situation as you didn't state the use-case
> but thought I'd at least mention it.
>
> Best,
> Erick
>
> On Wed, Apr 29, 2015 at 7:41 AM, Robust Links wrote:
> > Hi
> >
> > I need help porting my lucene code from 4 to 5. In particular, I need to
> > customize a collector (to collect all doc Ids in the index - which can be
> > >30MM docs..). Below is how I achieved this in lucene 4. Is there some
> > guidelines how to do this in lucene 5, specially on semantics changes of
> > AtomicReaderContext (which seems deprecated) and the new LeafReaderContext?
> >
> > thank you in advance
> >
> > public class CustomCollector extends Collector {
> >
> >   private HashSet data = new HashSet();
> >   private Scorer scorer;
> >   private int docBase;
> >   private BinaryDocValues dataList;
> >
> >   public boolean acceptsDocsOutOfOrder() {
> >     return true;
> >   }
> >
> >   public void setScorer(Scorer scorer) {
> >     this.scorer = scorer;
> >   }
> >
> >   public void setNextReader(AtomicReaderContext ctx) throws IOException {
> >     this.docBase = ctx.docBase;
> >     dataList = FieldCache.DEFAULT.getTerms(ctx.reader(), "title", false);
> >   }
> >
> >   public void collect(int doc) throws IOException {
> >     BytesRef t = new BytesRef();
> >     dataList(doc);
> >     if (t.bytes != BytesRef.EMPTY_BYTES && t.bytes != BytesRef.EMPTY_BYTES) {
> >       data((t.utf8ToString()));
> >     }
> >   }
> >
> >   public void reset() {
> >     data.clear();
> >     dataList = null;
> >   }
> >
> >   public HashSet getData() {
> >     return data;
> >   }
> > }
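
A note for readers following the thread: the document number passed to collect(int doc) is relative to the current segment, which is why the Lucene 4 collector quoted above keeps ctx.docBase. The sketch below is plain Java, not Lucene code. Segment, collectLocal and collectGlobal are hypothetical names made up for illustration, and a plain HashMap stands in for the map used in the thread. It only shows, under those assumptions, what can happen when a map is keyed by the segment-local number instead of docBase + doc.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java illustration only; no Lucene classes are used. */
final class SegmentDocIdSketch {

    /** Hypothetical stand-in for one index segment: its docBase and a per-document "id" value. */
    static final class Segment {
        final int docBase;
        final long[] ids;

        Segment(int docBase, long[] ids) {
            this.docBase = docBase;
            this.ids = ids;
        }
    }

    /** Keys by the segment-local doc number, so later segments overwrite earlier entries. */
    static Map<Integer, Long> collectLocal(Segment[] segments) {
        Map<Integer, Long> out = new HashMap<>();
        for (Segment s : segments) {
            for (int doc = 0; doc < s.ids.length; doc++) {
                out.put(doc, s.ids[doc]);
            }
        }
        return out;
    }

    /** Keys by docBase + doc, which is unique across segments. */
    static Map<Integer, Long> collectGlobal(Segment[] segments) {
        Map<Integer, Long> out = new HashMap<>();
        for (Segment s : segments) {
            for (int doc = 0; doc < s.ids.length; doc++) {
                out.put(s.docBase + doc, s.ids[doc]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Segment[] segments = {
            new Segment(0, new long[] {100L, 101L}),
            new Segment(2, new long[] {102L, 103L, 104L})
        };
        System.out.println("local keys:  " + collectLocal(segments).size());
        System.out.println("global keys: " + collectGlobal(segments).size());
    }
}
```

With the two made-up segments above, the local-key map ends up with fewer entries than there are documents, while the global-key map keeps one entry per document. Whether this matters for the collector in the thread depends on how the resulting map is used afterwards.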