From: Trejkaz
Date: Mon, 3 Apr 2017 11:17:47 +1000
Subject: Is there some sensible way to do giant BooleanQuery or similar lazily?
To: Lucene Users Mailing List

Hi all.

We have this one kind of query where you essentially specify a text file which contains the actual query to search for. The catch is that the text file can be large.

Our custom query currently computes the set of matching docs up-front, and then when queries come in for one LeafReader, the larger doc ID set is sliced so that the sub-slice for that leaf is returned. Which is confusing, and seems backwards.

As an alternative, we could override rewrite(IndexReader) and return a gigantic boolean query. Problems being:

1) A gigantic BooleanQuery takes up a lot more memory than a list of query strings.

2) Lucene devs often say that gigantic boolean queries are bad, maybe for reason #1, or maybe for another reason which nobody understands

So in place of this, is there some kind of alternative?

For instance, is there some query type where I can provide an iterator of sub-queries, so that they don't all have to be in memory at once? The code to get each sub-query is always relatively straight-forward and easy to understand.
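
To make the "iterator of sub-queries" idea concrete, here is a minimal sketch in plain Java. It is not Lucene API and not a confirmed query type: LazyUnion, matchLeaf, queryLines and leafMatcher are all hypothetical names, and leafMatcher is only a stand-in for "build the sub-query for this line and run it against one leaf". The point is just that a single line and its matches are held at a time, while the union is accumulated in a per-leaf bit set.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical helper, not part of Lucene: unions sub-query matches one at a time. */
final class LazyUnion {

    /**
     * Collects the leaf-local doc IDs matched by any sub-query.
     *
     * @param queryLines  opens a fresh iterator over the query lines (could be backed by a file)
     * @param leafMatcher assumed stand-in for building and running one sub-query on one leaf;
     *                    returns leaf-local doc IDs
     * @param maxDoc      number of docs in the leaf, used to size the bit set
     */
    static BitSet matchLeaf(Supplier<Iterator<String>> queryLines,
                            Function<String, int[]> leafMatcher,
                            int maxDoc) {
        BitSet hits = new BitSet(maxDoc);
        Iterator<String> it = queryLines.get();
        while (it.hasNext()) {
            // Only the current line and its matches are live at this point.
            for (int doc : leafMatcher.apply(it.next())) {
                hits.set(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("apple", "banana");
        // Toy matcher: "apple" hits docs 0 and 2, anything else hits doc 1.
        Function<String, int[]> matcher =
                line -> line.equals("apple") ? new int[] {0, 2} : new int[] {1};
        // One call like this would be made per leaf.
        System.out.println(matchLeaf(lines::iterator, matcher, 4));
    }
}
```

Note that in this sketch every leaf asks for a fresh iterator, so each line is read and turned into a sub-query again for every leaf.
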
I guess the snag is that sometimes the line of text is natural language which gets run through an analyser, so we'd potentially be re-analysing the text once per leaf reader? :/

This would replace about 1/3 of the remaining places where we have to compute the doc ID set up-front.

TX

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org