Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
Received-SPF: error (athena.apache.org: local policy)
MIME-Version: 1.0
In-Reply-To: 
 <CAN4YXvfDnaJrJT-jL0bsOcHU2Q+YwDWOQMF-FwKa+AEoqxg5kw@mail.gmail.com>
References: 
 <CAG1=5hE=y3ggV-Egf5s5Tvhr34_vxD4H+86MzrWkt41GuyN3_Q@mail.gmail.com>
	<CAN4YXvfDnaJrJT-jL0bsOcHU2Q+YwDWOQMF-FwKa+AEoqxg5kw@mail.gmail.com>
Date: Wed, 29 Apr 2015 12:05:45 -0400
Message-ID: 
 <CAG1=5hH+ZOs1QR+6Fjq_72Kaje++xiC0f4u_2=EaY5nCs9y5BA@mail.gmail.com>
Subject: Re: custom collector
From: Robust Links <peyman@robustlinks.com>
To: java-user@lucene.apache.org
Content-Type: multipart/alternative; boundary=001a113d5256dbb9f40514df269d

--001a113d5256dbb9f40514df269d
Content-Type: text/plain; charset=UTF-8

Hi Erick,

The index I am searching is a plain Lucene index, and I am trying to perform
some operations over ALL the documents in it. I could rebuild it as a Solr
index and then use the export functionality, but up to now I have been using
the Lucene IndexSearcher with a custom collector. Would the code below be
correct if I want to continue down the Lucene path?

Thank you, Erick

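    import java.io.IOException;

    import com.google.common.collect.HashBiMap;
    import org.apache.lucene.index.DocValues;
    import org.apache.lucene.index.LeafReaderContext;
    import org.apache.lucene.index.NumericDocValues;
    import org.apache.lucene.search.SimpleCollector;

    public class DocIDCollector extends SimpleCollector {

        // Global Lucene doc ID -> value of the numeric "id" doc-values field.
        // Note: BiMap.put throws IllegalArgumentException if a value is
        // already bound to another key, so "id" values must be unique.
        private final HashBiMap<Integer, Long> idSet = HashBiMap.create();

        private NumericDocValues ids;
        private int docBase;

        // acceptsDocsOutOfOrder() no longer exists in Lucene 5, and the
        // Scorer is not kept because scores are never read.
        @Override
        public boolean needsScores() { // part of Collector since Lucene 5.1
            return false;
        }

        @Override
        protected void doSetNextReader(LeafReaderContext context)
                throws IOException {
            docBase = context.docBase;
            ids = DocValues.getNumeric(context.reader(), "id");
        }

        @Override
        public void collect(int doc) throws IOException {
            // doc is relative to the current segment; add docBase so that
            // keys from different segments do not overwrite each other.
            idSet.put(docBase + doc, ids.get(doc));
        }

        public void reset() {
            idSet.clear();
        }

        public HashBiMap<Integer, Long> getWikiIds() {
            return idSet;
        }
    }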
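
One point worth checking in a collector like the one above: the doc number
passed to collect() is relative to the current segment, so a key that should
identify a document across the whole index needs the segment's docBase added.
The sketch below illustrates only that arithmetic in plain Java, with no
Lucene dependency. Segment, collectLocal, collectGlobal and sampleSegments
are hypothetical names made up for this illustration, not Lucene API.

```java
import java.util.HashMap;
import java.util.Map;

public class DocBaseSketch {

    /** Hypothetical stand-in for one index segment (not a Lucene class). */
    public static final class Segment {
        final int docBase;     // global number of this segment's first document
        final long[] idValues; // idValues[doc] plays the role of ids.get(doc)

        public Segment(int docBase, long[] idValues) {
            this.docBase = docBase;
            this.idValues = idValues;
        }
    }

    /** Keys by the segment-relative number only: later segments overwrite earlier ones. */
    public static Map<Integer, Long> collectLocal(Segment[] segments) {
        Map<Integer, Long> out = new HashMap<>();
        for (Segment seg : segments) {
            for (int doc = 0; doc < seg.idValues.length; doc++) {
                out.put(doc, seg.idValues[doc]);
            }
        }
        return out;
    }

    /** Keys by docBase + doc, which is unique across all segments. */
    public static Map<Integer, Long> collectGlobal(Segment[] segments) {
        Map<Integer, Long> out = new HashMap<>();
        for (Segment seg : segments) {
            for (int doc = 0; doc < seg.idValues.length; doc++) {
                out.put(seg.docBase + doc, seg.idValues[doc]);
            }
        }
        return out;
    }

    /** Two made-up segments holding three and two documents. */
    public static Segment[] sampleSegments() {
        return new Segment[] {
            new Segment(0, new long[] {100L, 101L, 102L}),
            new Segment(3, new long[] {200L, 201L})
        };
    }

    public static void main(String[] args) {
        Segment[] segments = sampleSegments();
        System.out.println("local keys:  " + collectLocal(segments).size());
        System.out.println("global keys: " + collectGlobal(segments).size());
    }
}
```

With the five sample documents, the local variant ends up with 3 entries
because the second segment overwrites keys 0 and 1, while the global variant
keeps all 5.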

On Wed, Apr 29, 2015 at 11:32 AM, Erick Erickson <erickerickson@gmail.com>
wrote:

> Hmmm, it's not clear to me whether you're using Solr or not, but if
> you are, have you considered using the export functionality? This is
> already built to stream large result sets back to the client. And
> lately (5.1), you can combine that with "streaming aggregation" to do
> some pretty cool stuff.
>
> Not sure it applies in your situation as you didn't state the use-case
> but thought I'd at least mention it.
>
> Best,
> Erick
>
> On Wed, Apr 29, 2015 at 7:41 AM, Robust Links <peyman@robustlinks.com>
> wrote:
> > Hi
> >
> > I need help porting my Lucene code from 4 to 5. In particular, I need to
> > customize a collector (to collect all doc IDs in the index, which can be
> > more than 30MM docs). Below is how I achieved this in Lucene 4. Are there
> > any guidelines on how to do this in Lucene 5, especially on the semantic
> > changes from AtomicReaderContext (which seems deprecated) to the new
> > LeafReaderContext?
> >
> > thank you in advance
> >
> >
> > public class CustomCollector extends Collector {
> >
> >   private HashSet<String> data = new HashSet<String>();
> >   private Scorer scorer;
> >   private int docBase;
> >   private BinaryDocValues dataList;
> >
> >   public boolean acceptsDocsOutOfOrder() {
> >     return true;
> >   }
> >
> >   public void setScorer(Scorer scorer) {
> >     this.scorer = scorer;
> >   }
> >
> >   public void setNextReader(AtomicReaderContext ctx) throws IOException {
> >     this.docBase = ctx.docBase;
> >     dataList = FieldCache.DEFAULT.getTerms(ctx.reader(), "title", false);
> >   }
> >
> >   public void collect(int doc) throws IOException {
> >     BytesRef t = new BytesRef();
> >     dataList.get(doc, t); // fill t with the "title" bytes for this doc
> >     if (t.bytes != BytesRef.EMPTY_BYTES) {
> >       data.add(t.utf8ToString());
> >     }
> >   }
> >
> >   public void reset() {
> >     data.clear();
> >     dataList = null;
> >   }
> >
> >   public HashSet<String> getData() {
> >     return data;
> >   }
> > }

--001a113d5256dbb9f40514df269d--