Subject: Re: Scanning through inverted index
From: Michael Berkovsky
To: java-user@lucene.apache.org
Date: Wed, 27 Nov 2013 08:45:57 -0800

The goal is to construct the iterator

    Iterator: term -> [doc1, doc2, ...]

It would run through the entire Lucene index. The index contains more than 100 million documents.

Thanks,
mb

On Wed, Nov 27, 2013 at 5:47 AM, Erick Erickson wrote:

> Probably should explain what your end goal here is.
> Reconstructing the entire document? Just finding out
> what documents a few words belong to?
>
> The former will be painful and lossy, Luke does that
> for instance.
>
> FWIW,
> Erick
>
>
> On Mon, Nov 25, 2013 at 11:54 AM, Michael Berkovsky <
> michael.berkovsky@gmail.com> wrote:
>
> > Hello!
> >
> > I wonder if there is a fast way to scan through the entire inverted index
> > to collect words and documents they belong to.
> >
> > Thanks,
> > mb
> >
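
To make the shape of the requested iterator (term -> [doc1, doc2, ...]) concrete, here is a minimal sketch in plain Java. It does not use Lucene at all: `ToyIndex`, `addDocument`, and `scan` are hypothetical names invented for illustration, and the index is a small in-memory sorted map standing in for a real term dictionary with postings.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class InvertedIndexScan {

    /** Toy in-memory inverted index (hypothetical, not a Lucene class). */
    static final class ToyIndex {
        // Sorted map so that a scan visits terms in lexicographic order.
        private final TreeMap<String, List<Integer>> postings = new TreeMap<>();

        /** Assumes documents are added in increasing docId order. */
        void addDocument(int docId, String text) {
            for (String token : text.toLowerCase().split("\\W+")) {
                if (token.isEmpty()) {
                    continue; // split can yield an empty leading token
                }
                List<Integer> docs = postings.computeIfAbsent(token, t -> new ArrayList<>());
                // Record each document at most once per term.
                if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                    docs.add(docId);
                }
            }
        }

        /** The iterator from the message: each entry is term -> [doc1, doc2, ...]. */
        Iterator<Map.Entry<String, List<Integer>>> scan() {
            return postings.entrySet().iterator();
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDocument(0, "fast index scan");
        index.addDocument(1, "inverted index");
        index.addDocument(2, "fast inverted index scan");

        // Walk every term once and print the documents it occurs in.
        Iterator<Map.Entry<String, List<Integer>>> it = index.scan();
        while (it.hasNext()) {
            Map.Entry<String, List<Integer>> entry = it.next();
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Against a real Lucene 4.x index, the analogous walk would presumably go field by field through a `TermsEnum` and enumerate the documents of each term, rather than through a map like this one. With more than 100 million documents, it is probably better to stream the document ids of each term than to materialize a full list per term as the sketch does.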