Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
MIME-Version: 1.0
In-Reply-To: <CACC=26OMgagKoH1=O6Nsu_kj8O9V1bBGXQk6GhmFwEso6oC+Kg@mail.gmail.com>
References: <CACC=26MEpoK=wWC9nGc=u-OKNwGE36DYr84piGrtEgbi4kCWMg@mail.gmail.com>
 <CAPsWd+PhLPhcR-j85v4NrfW3sW-H=hOd6ZZO+Cm4J2y7o-rgww@mail.gmail.com>
 <CACC=26N1Zo5Svd0c+b9wHK-qGvVe5DUL315MSbEEaDodTy+hYA@mail.gmail.com>
 <CAPsWd+Mh43Er0AQbBp5fea09DebRGA4JDhNE2p+SoniYFj2P+g@mail.gmail.com>
 <CACC=26N7BPjYVibpdUnQPGr9f9AjGtYbAs1PRNsM9HcS9kKAmg@mail.gmail.com>
 <alpine.DEB.2.11.1704211142470.15059@tray> <CACC=26OMgagKoH1=O6Nsu_kj8O9V1bBGXQk6GhmFwEso6oC+Kg@mail.gmail.com>
From: Jacques Uber <uberj@miradortech.com>
Date: Sat, 22 Apr 2017 12:47:22 -0700
Message-ID: <CAH+fdBcNYcjBr8jUkjf4_2ysTsouTkRdPGFsB6aabfkDT0WDuw@mail.gmail.com>
Subject: Re: How to get document effectively. or FieldCache example
To: java-user@lucene.apache.org
Content-Type: multipart/alternative; boundary=001a1145b0d489d3b5054dc6a474
archived-at: Sat, 22 Apr 2017 19:47:43 -0000

Have you considered indexing chapters as documents? Using your example, you
would have three documents corresponding to your three chapters: A, B, and
D. Once you have that structure, the query "pain AND head" returns only
chapters A and B. Using the information gained from this new chapter index,
you could then use your existing index to do "(pain OR head) AND (chapter:A
OR chapter:B)", so that files containing only one of the two words are kept.
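To make the two-pass idea concrete, here is a plain-Java model of it that runs on the FileA to FileE example from your message below. It does not use Lucene. The hypothetical `FileDoc` class stands in for one file-level document plus its chapter name (your "title" field). Pass 1 mimics the chapter index: merge the words of every file in a chapter and keep the chapters that contain both words. Pass 2 keeps files that contain at least one of the words and belong to one of those chapters, because your expected result includes FileA, FileC and FileE, which have only one word each.

```java
import java.util.*;

// Plain-Java model of the two-pass "chapter index" idea; no Lucene involved.
class ChapterFilter {

    // Hypothetical stand-in for one file-level document: name, chapter ("title"), terms.
    static final class FileDoc {
        final String file;
        final String chapter;
        final Set<String> terms;

        FileDoc(String file, String chapter, String... terms) {
            this.file = file;
            this.chapter = chapter;
            this.terms = new HashSet<>(Arrays.asList(terms));
        }
    }

    // Pass 1: build one "chapter document" per chapter name by merging the terms of
    // all its files, then keep chapters containing both terms (like "t1 AND t2").
    static Set<String> chaptersWithBoth(List<FileDoc> docs, String t1, String t2) {
        Map<String, Set<String>> chapterTerms = new HashMap<>();
        for (FileDoc d : docs) {
            chapterTerms.computeIfAbsent(d.chapter, k -> new HashSet<>()).addAll(d.terms);
        }
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : chapterTerms.entrySet()) {
            if (e.getValue().contains(t1) && e.getValue().contains(t2)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Pass 2: files containing at least one term, restricted to the chapters from pass 1.
    static List<String> matchingFiles(List<FileDoc> docs, String t1, String t2) {
        Set<String> chapters = chaptersWithBoth(docs, t1, t2);
        List<String> files = new ArrayList<>();
        for (FileDoc d : docs) {
            boolean hasTerm = d.terms.contains(t1) || d.terms.contains(t2);
            if (hasTerm && chapters.contains(d.chapter)) {
                files.add(d.file);
            }
        }
        return files;
    }

    public static void main(String[] args) {
        List<FileDoc> docs = List.of(
                new FileDoc("FileA", "Chapter A", "pain"),
                new FileDoc("FileB", "Chapter B", "head", "pain"),
                new FileDoc("FileC", "Chapter A", "head"),
                new FileDoc("FileD", "Chapter D", "head"),
                new FileDoc("FileE", "Chapter A", "pain"));
        System.out.println(chaptersWithBoth(docs, "pain", "head")); // [Chapter A, Chapter B]
        System.out.println(matchingFiles(docs, "pain", "head"));    // [FileA, FileB, FileC, FileE]
    }
}
```

In Lucene, pass 1 would be the "pain AND head" search over the new chapter index and pass 2 a search over your existing index restricted to the chapter names it returns; how that performs at 5 lakh hits would need testing.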

On Fri, Apr 21, 2017 at 10:40 PM, neeraj shah <neerajshah84@gmail.com>
wrote:

> Hello,
> Let me explain my case:
> - Suppose I am searching for the words ("pain" (in same chapter) "head").
> This is my query.
> Now what I need to do is first search "pain", then search "head"
> separately, and then find the file names common to both search results.
> Now, suppose the criteria are:
>
> FileA - Chapter A - has only the word "*pain*"
> FileB - Chapter B - has both words "*head*" and "*pain*"
> FileC - Chapter A - has only the word "*head*"
> FileD - Chapter D - has only the word "*head*"
> FileE - Chapter A - has only the word "*pain*"
>
> Then the result should be:
> FileA - Chapter A - has only the word "*pain*"
> FileB - Chapter B - has both words "*head*" and "*pain*"
> FileC - Chapter A - has only the word "*head*"
> FileE - Chapter A - has only the word "*pain*"
>
> FileD (Chapter D, only the word "*head*") will not appear in the search
> result, because "Chapter D" does not match the name of any chapter that
> has both search words.
> In short, I have to show chapters from any book that have both search
> words or at least one search word, but the chapter name must be the same.
>
> That is my requirement, which is why I was parsing all hits for "pain" and
> "head" separately and then collecting the "title" (chapter name) values
> common to both results, or the results which have at least one search word
> and the same chapter name.
> In my index, "pain" alone has 5 lakh (500,000) results and "head" has 60K
> results.
>
> Please suggest another approach if you have one in mind.
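Your current approach (collect the chapter of every hit for each word, then intersect) does not actually need stored fields. Below is a minimal plain-Java sketch of it, assuming a `String[] titleByDoc` array indexed by Lucene doc ID. In Lucene 3.x, `FieldCache.DEFAULT.getStrings(reader, "title")` should give such an array for the un-tokenized "title" field, but that is an assumption on my part; the code only models it with plain arrays, and `CommonChapters`, `chaptersOf` and `result` are hypothetical names.

```java
import java.util.*;

// Sketch only: hit lists are plain int[] doc IDs, and titleByDoc stands in for a
// per-document "title" array (hypothetically loaded once via FieldCache in Lucene 3.x).
class CommonChapters {

    // Distinct chapter names among all hits for one word.
    static Set<String> chaptersOf(int[] hitDocs, String[] titleByDoc) {
        Set<String> chapters = new HashSet<>();
        for (int doc : hitDocs) {
            chapters.add(titleByDoc[doc]); // array read instead of searcher.doc(doc)
        }
        return chapters;
    }

    // Doc IDs from either hit list whose chapter occurs in both lists' chapter sets.
    static SortedSet<Integer> result(int[] hitsA, int[] hitsB, String[] titleByDoc) {
        Set<String> common = chaptersOf(hitsA, titleByDoc);
        common.retainAll(chaptersOf(hitsB, titleByDoc));
        SortedSet<Integer> docs = new TreeSet<>();
        for (int[] hits : new int[][] {hitsA, hitsB}) {
            for (int doc : hits) {
                if (common.contains(titleByDoc[doc])) {
                    docs.add(doc);
                }
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        // Doc IDs 0..4 play FileA..FileE from the example above.
        String[] titleByDoc = {"Chapter A", "Chapter B", "Chapter A", "Chapter D", "Chapter A"};
        int[] painHits = {0, 1, 4}; // FileA, FileB, FileE
        int[] headHits = {1, 2, 3}; // FileB, FileC, FileD
        System.out.println(result(painHits, headHits, titleByDoc)); // [0, 1, 2, 4]
    }
}
```

With one document per line, the hits are lines rather than files, but the logic is the same because every line document carries its "title".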
>
> Thanks,
> Neeraj
>
> On Sat, Apr 22, 2017 at 12:20 AM, Chris Hostetter <hossman_lucene@fucit.org> wrote:
>
> >
> > : Then which one is the right tool for text searching in files? Can you
> > : please suggest one?
> >
> > So far all you've done is show us your *indexing* code, and say that
> > after you do a search, calling searcher.doc(docid) on 500,000 documents is
> > slow.
> >
> > But you still haven't described the use case you are trying to solve --
> > i.e.: *WHY* do you want these 500,000 results from your search? Once you
> > get those Documents back, *WHAT* are you going to do with them?
> >
> > If you show us some code, and talk us through your goal, then we can help
> > you -- otherwise all we can do is warn you that the specific
> > searcher.doc(docid) API isn't designed to be efficient at that large a
> > scale.  Other APIs in Lucene are designed to be efficient at large scale,
> > but we don't really know what to suggest w/o knowing what you're trying
> > to do...
> >
> > https://people.apache.org/~hossman/#xyproblem
> > XY Problem
> >
> > Your question appears to be an "XY Problem" ... that is: you are dealing
> > with "X", you are assuming "Y" will help you, and you are asking about "Y"
> > without giving more details about the "X" so that we can understand the
> > full issue.  Perhaps the best solution doesn't involve "Y" at all?
> > See Also: http://www.perlmonks.org/index.pl?node_id=542341
> >
> >
> > PS: please, Please PLEASE upgrade to Lucene 6.x.  3.6 is more than 5 years
> > old, and completely unsupported -- any advice you are given on this list
> > is likely to refer to APIs that are completely different from the version
> > of Lucene you are working with.
> >
> >
> > :
> > :
> > : On Fri, Apr 21, 2017 at 2:01 PM, Adrien Grand <jpountz@gmail.com> wrote:
> > :
> > : > Lucene is not designed for retrieving that many results. What are you
> > : > doing with those 5 lakh (500,000) documents? I suspect this is too much
> > : > to display, so you probably perform some computations on them? If so,
> > : > maybe you could move those computations into Lucene using e.g. facets?
> > : > If that does not work, I'm afraid that Lucene is not the right tool for
> > : > your problem.
> > : >
> > : > On Fri, Apr 21, 2017 at 08:56, neeraj shah <neerajshah84@gmail.com> wrote:
> > : >
> > : > > Yes, I am fetching around 5 lakh (500,000) results from the index searcher.
> > : > > Also, I am indexing each line of each file, because while searching I
> > : > > need all the lines of a file that contain the matched term.
> > : > > Please tell me whether I am doing it right.
> > : > > {code}
> > : > >
> > : > > // file, section, rem, Author, Book, sec and writer are defined elsewhere
> > : > > BufferedReader bufr = new BufferedReader(new InputStreamReader(
> > : > >         new BufferedInputStream(new FileInputStream(file))));
> > : > > try {
> > : > >     String inputLine;
> > : > >     // one Lucene document per line of the file
> > : > >     while ((inputLine = bufr.readLine()) != null) {
> > : > >         Document doc = new Document();
> > : > >         doc.add(new Field("contents", inputLine, Field.Store.YES,
> > : > >                 Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
> > : > >         doc.add(new Field("title", section, Field.Store.YES, Field.Index.NOT_ANALYZED));
> > : > >         doc.add(new Field("fieldsort", rem, Field.Store.YES, Field.Index.ANALYZED));
> > : > >         doc.add(new Field("fieldsort2",
> > : > >                 rem.toLowerCase().replaceAll("-", "").replaceAll(" ", ""),
> > : > >                 Field.Store.YES, Field.Index.ANALYZED));
> > : > >         doc.add(new Field("field1", Author, Field.Store.YES, Field.Index.NOT_ANALYZED));
> > : > >         doc.add(new Field("field2", Book, Field.Store.YES, Field.Index.NOT_ANALYZED));
> > : > >         doc.add(new Field("field3", sec, Field.Store.YES, Field.Index.NOT_ANALYZED));
> > : > >         writer.addDocument(doc);
> > : > >     }
> > : > > } finally {
> > : > >     bufr.close(); // also closes the wrapped FileInputStream, even on error
> > : > > }
> > : > >
> > : > > {/code}
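With one document per line, every matching line is a separate hit, so the hit count grows with the number of matching lines rather than the number of files. One alternative, offered as an untested assumption rather than something proposed in this thread: index one document per file (or per chapter) with the text stored, and recover the matching lines only for the files you actually display. The sketch below does that last step in plain Java. `MatchingLines` and `linesContaining` are hypothetical names, and the lowercase split on non-letters is a simplification that will not match what a Lucene analyzer (stemming, stop words) does; in Lucene itself the highlighter module is the usual tool for this.

```java
import java.util.*;

// Hypothetical helper: return the lines of a stored file text that contain a term.
// Tokenization is a simple lowercase split on non-letter characters.
class MatchingLines {

    static List<String> linesContaining(String storedText, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        for (String line : storedText.split("\\R")) { // \R = any line break (Java 8+)
            for (String token : line.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
                if (token.equals(needle)) {
                    matches.add(line);
                    break; // one entry per line, even if the term repeats
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "Headache and pain.\nNo match here.\nPain in the head.";
        System.out.println(linesContaining(text, "pain"));
        // [Headache and pain., Pain in the head.]
    }
}
```

Because matching is on whole tokens, searching "head" returns only the third line, not "Headache".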
> > : > >
> > : > > On Thu, Apr 20, 2017 at 5:57 PM, Adrien Grand <jpountz@gmail.com> wrote:
> > : > >
> > : > > > IndexSearcher.doc is the right way to retrieve documents. If this is
> > : > > > slowing things down for you, I'm wondering whether you might be
> > : > > > fetching too many results.
> > : > > >
> > : > > > On Thu, Apr 20, 2017 at 14:16, neeraj shah <neerajshah84@gmail.com> wrote:
> > : > > >
> > : > > > > Hello Everyone,
> > : > > > >
> > : > > > > I am using Lucene 3.6. I have to index around 60k documents. After
> > : > > > > performing the search, when I try to retrieve documents from the
> > : > > > > searcher using searcher.doc(docid), it slows down the search.
> > : > > > > Is there any other way to get the document?
> > : > > > >
> > : > > > > Also, could anyone give me an end-to-end example of a working FieldCache?
> > : > > > > While implementing the cache I have:
> > : > > > >
> > : > > > > int[] fieldIds = FieldCache.DEFAULT.getInts(indexMultiReader, "id");
> > : > > > >
> > : > > > > Now I don't know how to use fieldIds to improve the search.
> > : > > > > Please give me an end-to-end example.
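On the FieldCache question: as far as I know, in Lucene 3.x the `int[]` returned by `FieldCache.DEFAULT.getInts(indexMultiReader, "id")` has one slot per document of that reader, so for a hit with doc ID `d`, `fieldIds[d]` is that document's "id" value, read from memory instead of from stored fields. This assumes "id" is indexed as a single un-tokenized integer per document and that the doc IDs come from a searcher over the same reader. The plain-Java sketch below models only that lookup step; `FieldCacheLookup` and `idsForHits` are hypothetical names, and the Lucene calls appear only in comments.

```java
import java.util.*;

// Models the FieldCache lookup with plain arrays. In Lucene 3.x code, fieldIds would be
// FieldCache.DEFAULT.getInts(indexMultiReader, "id") and hitDocs the .doc values of
// topDocs.scoreDocs from a searcher over the same reader.
class FieldCacheLookup {

    static int[] idsForHits(int[] hitDocs, int[] fieldIds) {
        int[] ids = new int[hitDocs.length];
        for (int i = 0; i < hitDocs.length; i++) {
            ids[i] = fieldIds[hitDocs[i]]; // no searcher.doc(...) call, just an array read
        }
        return ids;
    }

    public static void main(String[] args) {
        int[] fieldIds = {1001, 1002, 1003, 1004}; // one slot per doc ID 0..3
        int[] hitDocs = {3, 0};                     // doc IDs of two hits, in score order
        System.out.println(Arrays.toString(idsForHits(hitDocs, fieldIds))); // [1004, 1001]
    }
}
```

The array is built once per reader and reused across searches, which is where the saving over per-hit `searcher.doc` calls comes from.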
> > : > > > >
> > : > > > > Thanks
> > : > > > > Neeraj
> > : > > > >
> > : > > >
> > : > >
> > : >
> > :
> >
> > -Hoss
> > http://www.lucidworks.com/
> >
> >
> >
>
