From: Erick Erickson
To: java-user@lucene.apache.org
Reply-To: java-user@lucene.apache.org
Date: Thu, 16 Jul 2009 20:04:10 -0400
Subject: Re: Unable to do exact search with Lucene.
Message-ID: <359a92830907161704ga257498j320714c1af9c8e3a@mail.gmail.com>
References: <867513fe0907160537k6e842921gfe67e7df9645599@mail.gmail.com>

The first thing I'd do is get a copy of Luke and look in my index to see
exactly what's there. Nothing in your e-mails indicates that you *should*
get any hits, although I admit that not getting jakarta lucene in 50M pages
seems unlikely.

But Ian's suggestion that you start with a smaller index is spot on.

Best
Erick

On Thu, Jul 16, 2009 at 8:42 AM, prashant ullegaddi
<prashullegaddi@gmail.com> wrote:

> 50 million HTML pages (part of clueweb09 dataset for TREC) were indexed
> using Hadoop into 56 indexes. 56 indexes were merged into a single index.
> Analyzer is the StandardAnalyzer.
>
> On Thu, Jul 16, 2009 at 6:07 PM, Anshum wrote:
>
> > Hi Prashant,
> >
> > What did you index? how did you index? what analyzer did you use? without
> > all of these, perhaps it'd be difficult to figure out the issue.
> >
> > --
> > Anshum Gupta
> > Naukri Labs!
> > http://ai-cafe.blogspot.com
> >
> > The facts expressed here belong to everybody, the opinions to me. The
> > distinction is yours to draw............
> >
> > On Thu, Jul 16, 2009 at 6:04 PM, prashant ullegaddi
> > <prashullegaddi@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I tried searching:
> > > "Apache Jakarta"~10
> > >
> > > Nothing was returned. What might be wrong?
> > >
> > > Regards,
> > > Prashant.