From: Jacques Uber
Date: Sat, 22 Apr 2017 12:47:22 -0700
Subject: Re: How to get document effectively. or FieldCache example
To: java-user@lucene.apache.org

Have you considered indexing chapters as documents? Using your example, you
would have three documents corresponding to your three chapters: A, B, and D.
Once you have that structure, the query "pain AND head" returns only chapters
A and B.

Using the information gained from this new chapter index, you could then use
your existing index to do "pain AND head AND (chapter:A OR chapter:B)".

On Fri, Apr 21, 2017 at 10:40 PM, neeraj shah wrote:

> Hello,
> Let me explain my case:
> - Suppose I am searching for the words ("pain" (in same chapter) "head"). This
> is my query.
> Now what I need to do is first search for "pain" and then search for "head"
> separately; then I need the common file names of both search results.
> Now the criteria is, suppose:
>
> FileA - Chapter A - has word only "*pain*"
> FileB - Chapter B - has word both "*head*" and "*pain*"
> FileC - Chapter A - has word only "*head*"
> FileD - Chapter D - has only word "*head*"
> FileE - Chapter A - has only word "*pain*"
>
> Now the result should be:
> FileA - Chapter A - has word only "*pain*"
> FileB - Chapter B - has word both "*head*" and "*pain*"
> FileC - Chapter A - has word only "*head*"
> FileE - Chapter A - has only word "*pain*"
>
> FileD - Chapter D - has only word "*head*" will not appear in the search
> result because the name "Chapter D" is not the same as that of the other
> chapters, which have both search words.
> In short, I have to show only those chapters, from any book but with the same
> chapter name, which have both search words or at least one search word. But
> the chapter name should be the same.
>
> Above is my requirement; that is why I was parsing all hits for "pain" and
> "head" separately, then collecting the common "title" or chapter name from
> both results, or the results which have at least one search word and the same
> chapter name.
> In my results the word "pain" alone has "5 lacs" results and the word "head"
> has "60K" results.
>
> Please suggest another approach if you have one in mind.
>
> Thanks,
> Neeraj
>
> On Sat, Apr 22, 2017 at 12:20 AM, Chris Hostetter <hossman_lucene@fucit.org>
> wrote:
>
> > : then which one is right tool for text searching in files. please can you
> > : suggest me?
> >
> > So far all you've done is show us your *indexing* code, and said that
> > after you do a search, calling searcher.doc(docid) on 500,000 documents is
> > slow.
> >
> > But you still haven't described the use case you are trying to solve -- ie:
> > *WHY* do you want these 500,000 results from your search? Once you get
> > those Documents back, *WHAT* are you going to do with them?
> >
> > If you show us some code, and talk us through your goal, then we can help
> > you -- otherwise all we can do is warn you that the specific
> > searcher.doc(docid) API isn't designed to be efficient at that large a
> > scale. Other APIs in Lucene are designed to be efficient at large scale,
> > but we don't really know what to suggest w/o knowing what you're trying to
> > do...
> >
> > https://people.apache.org/~hossman/#xyproblem
> > XY Problem
> >
> > Your question appears to be an "XY Problem" ... that is: you are dealing
> > with "X", you are assuming "Y" will help you, and you are asking about "Y"
> > without giving more details about the "X" so that we can understand the
> > full issue. Perhaps the best solution doesn't involve "Y" at all?
> > See Also: http://www.perlmonks.org/index.pl?node_id=542341
> >
> >
> > PS: please, Please PLEASE upgrade to Lucene 6.x.
> > 3.6 is more than 5 years
> > old, and completely unsupported -- any advice you are given on this list
> > is likely to refer to APIs that are completely different from the version
> > of Lucene you are working with.
> >
> >
> > :
> > : On Fri, Apr 21, 2017 at 2:01 PM, Adrien Grand wrote:
> > :
> > : > Lucene is not designed for retrieving that many results. What are you
> > : > doing with those 5 lacs documents? I suspect this is too much to display,
> > : > so you probably perform some computations on them? If so, maybe you could
> > : > move them to Lucene using e.g. facets? If that does not work, I'm afraid
> > : > that Lucene is not the right tool for your problem.
> > : >
> > : > On Fri, Apr 21, 2017 at 08:56, neeraj shah wrote:
> > : >
> > : > > Yes, I am fetching around 5 lacs results from the index searcher.
> > : > > Also, I am indexing each line of each file because while searching I
> > : > > need all the lines of a file which has a matched term.
> > : > > Please tell me whether I am doing it right.
> > : > > {code}
> > : > >
> > : > > InputStream is = new BufferedInputStream(new FileInputStream(file));
> > : > > BufferedReader bufr = new BufferedReader(new InputStreamReader(is));
> > : > > String inputLine = "";
> > : > >
> > : > > while ((inputLine = bufr.readLine()) != null) {
> > : > >     Document doc = new Document();
> > : > >     doc.add(new Field("contents", inputLine, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
> > : > >     doc.add(new Field("title", section, Field.Store.YES, Field.Index.NOT_ANALYZED));
> > : > >     String newRem = new String(rem);
> > : > >
> > : > >     doc.add(new Field("fieldsort", newRem, Field.Store.YES, Field.Index.ANALYZED));
> > : > >     doc.add(new Field("fieldsort2", rem.toLowerCase().replaceAll("-", "").replaceAll(" ", ""), Field.Store.YES, Field.Index.ANALYZED));
> > : > >
> > : > >     doc.add(new Field("field1", Author, Field.Store.YES, Field.Index.NOT_ANALYZED));
> > : > >     doc.add(new Field("field2", Book, Field.Store.YES, Field.Index.NOT_ANALYZED));
> > : > >     doc.add(new Field("field3", sec, Field.Store.YES, Field.Index.NOT_ANALYZED));
> > : > >
> > : > >     writer.addDocument(doc);
> > : > > }
> > : > > is.close();
> > : > >
> > : > > {/code}
> > : > >
> > : > > On Thu, Apr 20, 2017 at 5:57 PM, Adrien Grand wrote:
> > : > >
> > : > > > IndexSearcher.doc is the right way to retrieve documents. If this is
> > : > > > slowing things down for you, I'm wondering whether you might be
> > : > > > fetching too many results?
> > : > > >
> > : > > > On Thu, Apr 20, 2017 at 14:16, neeraj shah <neerajshah84@gmail.com>
> > : > > > wrote:
> > : > > >
> > : > > > > Hello Everyone,
> > : > > > >
> > : > > > > I am using Lucene 3.6. I have to index around 60k documents. After
> > : > > > > performing the search, when I try to retrieve documents from the
> > : > > > > searcher using searcher.doc(docid), it slows down the search.
> > : > > > > Is there any other way to get the document?
> > : > > > >
> > : > > > > Also, can anyone give me an end-to-end example of a working
> > : > > > > FieldCache? While implementing the cache I have:
> > : > > > >
> > : > > > > int[] fieldIds = FieldCache.DEFAULT.getInts(indexMultiReader, "id");
> > : > > > >
> > : > > > > Now I don't know how to further use the fieldIds for improving
> > : > > > > search. Please give me an end-to-end example.
> > : > > > >
> > : > > > > Thanks
> > : > > > > Neeraj
> >
> > -Hoss
> > http://www.lucidworks.com/
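
For what it's worth, the chapter-first idea from the top of this reply can be sketched in plain Java, without any Lucene calls. This is only an illustration under stated assumptions: the class `ChapterFilter`, the `FileEntry` type, and the method `filesInMatchingChapters` are hypothetical names made up for this sketch, and the in-memory maps stand in for the two indexes. Phase 1 treats each chapter as one "document" (the union of the words in its files) and keeps chapters containing all search terms; phase 2 keeps the files of those chapters that contain at least one term, which is what the expected result in the quoted message lists. A strict "pain AND head" in phase 2 would keep only FileB.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Lucene.
class ChapterFilter {

    // One indexed file: its name, its chapter title, and the words it contains.
    static final class FileEntry {
        final String name;
        final String chapter;
        final Set<String> words;

        FileEntry(String name, String chapter, String... words) {
            this.name = name;
            this.chapter = chapter;
            this.words = new HashSet<>(Arrays.asList(words));
        }
    }

    static List<String> filesInMatchingChapters(List<FileEntry> files, Set<String> terms) {
        // Phase 1: build one "chapter document" per chapter title by
        // merging the words of every file that carries that title.
        Map<String, Set<String>> chapterWords = new HashMap<>();
        for (FileEntry f : files) {
            chapterWords.computeIfAbsent(f.chapter, k -> new HashSet<>()).addAll(f.words);
        }

        // Keep only chapters that contain ALL search terms ("pain AND head").
        Set<String> matchingChapters = new HashSet<>();
        for (Map.Entry<String, Set<String>> e : chapterWords.entrySet()) {
            if (e.getValue().containsAll(terms)) {
                matchingChapters.add(e.getKey());
            }
        }

        // Phase 2: from the per-file data, keep files that belong to a
        // matching chapter and contain at least one of the search terms.
        List<String> result = new ArrayList<>();
        for (FileEntry f : files) {
            if (matchingChapters.contains(f.chapter) && !Collections.disjoint(f.words, terms)) {
                result.add(f.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The five files from the quoted example.
        List<FileEntry> files = Arrays.asList(
                new FileEntry("FileA", "A", "pain"),
                new FileEntry("FileB", "B", "head", "pain"),
                new FileEntry("FileC", "A", "head"),
                new FileEntry("FileD", "D", "head"),
                new FileEntry("FileE", "A", "pain"));
        Set<String> terms = new HashSet<>(Arrays.asList("pain", "head"));

        // prints [FileA, FileB, FileC, FileE]
        System.out.println(filesInMatchingChapters(files, terms));
    }
}
```

With real indexes, phase 1 would return a handful of chapter names rather than hundreds of thousands of stored documents, so the expensive searcher.doc(docid) loop over every hit is avoided.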