Date: Thu, 28 Mar 2013 15:42:50 -0700 (PDT)
From: Chris Hostetter
To: solr-user@lucene.apache.org
Subject: Re: Batch Search Query

: Now, what happens is a user will upload say a word document to us. We then
: parse it and process it into segments. It very well could be 5000 segments
: or even more in that word document. Each one of those ~5000 segments needs
: to be searched for similar segments in solr. I'm not quite sure how I will
: do the query (whether proximate or something else). The point though, is to
: get back similar results for each segment.

You've described your black box (an index of small textual documents) and
you've described your input (a large document that will be broken down
into N=~5000 small textual snippets), but you haven't really clarified what
your desired output should be...

* N textual documents from your index, where each doc is the 1 'best'
  match to 1 of the N textual input snippets.
* Some fixed number Y textual documents from your index representing the
  "best of the best" matches against your textual input snippets (ie: if
  one input snippet is a "really good" match for multiple indexed docs,
  return all of those "really good" matches, but don't return any matches
  from other snippets if the only matches are "poor".)
* Some variable number Y textual documents from your index representing
  the "best of the best" matches against your textual input snippets,
  based on some minimum threshold of matching criteria.
* etc...

Forget for a moment that we are talking about solr at all -- describe some
hypothetical data, some hypothetical query examples, and some hypothetical
results you would like to get back (or not get back) from each of those
query examples (ideally in pseudo-code) and let's see if that doesn't help
suggest an implementation strategy.

-Hoss
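
For illustration only, the three output shapes listed above could be sketched roughly as follows. Everything here is a hypothetical assumption: the toy word-overlap score() function, the sample index and snippets, and the function names. None of it is a Solr API or a recommended scoring method; it only shows how the shapes differ.

```python
# Hypothetical sketch: score(), the data, and all names are illustrative
# assumptions, not anything provided by Solr.

def score(snippet, doc):
    """Toy similarity: Jaccard overlap of lowercase word sets."""
    a, b = set(snippet.lower().split()), set(doc.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def all_matches(snippets, index):
    """Every (score, snippet, doc) triple, highest score first."""
    triples = [(score(s, d), s, d) for s in snippets for d in index]
    return sorted(triples, key=lambda t: t[0], reverse=True)

def best_per_snippet(snippets, index):
    """Shape 1: the single best doc for each input snippet (N results)."""
    return {s: max(index, key=lambda d: score(s, d)) for s in snippets}

def top_y_overall(snippets, index, y):
    """Shape 2: a fixed number Y of the best matches across all snippets."""
    return all_matches(snippets, index)[:y]

def above_threshold(snippets, index, min_score):
    """Shape 3: a variable number of matches that meet a minimum score."""
    return [t for t in all_matches(snippets, index) if t[0] >= min_score]

# Hypothetical data standing in for the index and the parsed segments.
index = ["the quick brown fox", "a quick brown dog", "lorem ipsum dolor"]
snippets = ["quick brown fox", "ipsum"]

print(best_per_snippet(snippets, index))
print(top_y_overall(snippets, index, 2))
print(above_threshold(snippets, index, 0.5))
```

With this sample data, shape 2 with Y=2 returns two docs that both come from the first snippet and nothing for the second, whereas shape 1 always returns one doc per snippet. That difference is the kind of thing worth pinning down before picking a query strategy.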