lucene-java-user mailing list archives

From Adrien Grand <jpou...@gmail.com>
Subject Re: Searcher Performance
Date Fri, 17 Feb 2017 16:37:24 GMT
Some minimal information about the fields is loaded into memory when you
open the index reader, such as the list of fields and how they are
indexed.

However, the vast majority of the data is read from disk lazily: we do not
warm the filesystem cache or anything like that by default, and we do not
use direct I/O either. So if you run a term query, only the pages that
contain information about that particular field and value will be loaded
into the cache.
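To make the lazy-loading behaviour concrete, here is a toy model in plain Java. It is not Lucene code, and the page layout is made up for illustration: each field's data is assumed to sit on its own set of pages, a query faults in only the pages of the field it reads, and those pages stay cached afterwards. Under those simplified assumptions, a first query on one field reads only that field's pages, repeating it reads nothing, and a query on a different field starts cold again.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Toy model (hypothetical, not Lucene's file layout): each field's data lives
 * on its own pages, and a query only pulls in the pages of the field it reads.
 */
public class LazyPageCacheModel {
    // Hypothetical layout: field name -> page numbers holding that field's data.
    private final Map<String, List<Integer>> pagesByField = new HashMap<>();
    // Pages currently resident in the simulated OS page cache.
    private final Set<Integer> cachedPages = new HashSet<>();

    public void addField(String field, List<Integer> pages) {
        pagesByField.put(field, pages);
    }

    /** Runs a query on one field and returns how many pages had to come from "disk". */
    public int query(String field) {
        int misses = 0;
        for (int page : pagesByField.getOrDefault(field, List.of())) {
            // Set.add returns true only when the page was not cached yet, i.e. a miss.
            if (cachedPages.add(page)) {
                misses++;
            }
        }
        return misses;
    }

    public static void main(String[] args) {
        LazyPageCacheModel index = new LazyPageCacheModel();
        index.addField("name", List.of(0, 1, 2));
        index.addField("animal", List.of(3, 4, 5));
        System.out.println(index.query("name"));   // cold: only the "name" pages are read
        System.out.println(index.query("name"));   // warm: nothing read from disk
        System.out.println(index.query("animal")); // different field: cold again
    }
}
```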

If you want to warm the filesystem cache explicitly, which can be a good
idea if you have plenty of filesystem cache for your index (i.e. the
unused memory of the system is larger than the index), you can look into
using MMapDirectory.setPreload.
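In the Lucene 6.x line, the switch is `MMapDirectory.setPreload(true)`, set on the directory before opening the `DirectoryReader`. The JDK-only sketch below approximates the same idea by hand for a single index directory, under the assumption that mapping each file read-only and calling `MappedByteBuffer.load()` is a fair stand-in for what preloading does. The class name `IndexPreloader` and the default path are hypothetical.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class IndexPreloader {
    // Map at most 1 GiB per buffer: FileChannel.map cannot map more than Integer.MAX_VALUE bytes at once.
    private static final long CHUNK = 1L << 30;

    /** Maps every regular file in dir read-only, asks the OS to load its pages, and returns the bytes covered. */
    public static long preload(Path dir) throws IOException {
        long total = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip sub-directories and other non-regular entries
                }
                try (FileChannel fc = FileChannel.open(file, StandardOpenOption.READ)) {
                    long size = fc.size();
                    for (long pos = 0; pos < size; pos += CHUNK) {
                        MappedByteBuffer buf =
                                fc.map(FileChannel.MapMode.READ_ONLY, pos, Math.min(CHUNK, size - pos));
                        buf.load(); // best effort: makes the mapped pages resident in memory
                    }
                    total += size;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; pass the real index directory as the first argument.
        Path indexDir = Paths.get(args.length > 0 ? args[0] : "/path/to/index");
        System.out.println("Preloaded " + preload(indexDir) + " bytes");
    }
}
```

As the message says, this only pays off when free memory is larger than the index; otherwise the preloaded pages can be evicted again before any query needs them.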

On Fri, Feb 17, 2017 at 15:13, Chitra R <chithu.r111@gmail.com> wrote:

> Hey, thank you so much. I got it.
>
> I have:
>
>    - 10 lakh (1 million) docs and 30 fields in my index,
>    - a new searcher opened at the initial search, and
>    - none of my current index in the filesystem cache yet.
>
> At the initial search, I search only one of the 30 fields in my index.
>
> My question is:
>
> *At the initial search, will only the required pages (the OS pages of the
> Lucene index files) for that single field be loaded into the filesystem
> cache, or will the data for all the fields be loaded from disk into the
> filesystem cache?*
>
>
> Regards,
> Chitra
>
> On Fri, Feb 17, 2017 at 7:05 PM, Adrien Grand <jpountz@gmail.com> wrote:
>
> > Regarding whether the filesystem cache helps, you could look at whether
> > there is any disk activity while your queries are running.
> >
> > When everything is in the filesystem cache, the latency of search requests
> > for simple queries (term queries and combinations of them through boolean
> > queries) usually depends mostly on the total number of matches, since
> > Lucene needs to call the collector on every match.
> >
> > On Fri, Feb 17, 2017 at 10:09, Chitra R <chithu.r111@gmail.com> wrote:
> >
> > > Hi,
> > >      While working with Searcher.search, I have noticed differences in
> > > search performance. I have 10 lakh (1 million) documents and 30 fields
> > > in my index. I performed three searches with different queries, one
> > > after another. At search time, I used MMapDirectory and the index was
> > > already open.
> > >
> > > *Case 1:*
> > >
> > >    - For the first search, I ran the query
> > >      new TermQuery(new Term("name", "Chitra")), which matches 1 lakh
> > >      (100,000) documents. Time taken: nearly 50-60 ms.
> > >    - For the second search, I ran the query
> > >      new TermQuery(new Term("animal", "lion")), which also matches
> > >      1 lakh documents. Time taken: nearly 50-60 ms.
> > >    - For the third search, I ran the query
> > >      new TermQuery(new Term("bird", "peacock")), which also matches
> > >      1 lakh documents. Time taken: nearly 50-60 ms.
> > >
> > > In this case, why does searcher.search take the same time for
> > > different queries?
> > >
> > > *Case 2:*
> > >
> > > When I ran the same query twice, the second Searcher.search call took
> > > less time than the first, because of the OS cache.
> > >
> > > *Based on the above observations:*
> > >
> > > During the initial search, only the required portions of the index files
> > > are loaded into the I/O cache. For the next search, if the required
> > > portion is not present in the OS cache, will it take time to read those
> > > files from disk? If so, is that the reason searcher.search takes nearly
> > > the same time for different queries?
> > >
> > >
> > > Regards,
> > > Chitra
> > >
> >
>
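Adrien's second point, that with a warm cache the latency of a simple term query mostly tracks the number of matches because the collector is called once per match, also fits case 1: all three queries match 1 lakh documents. The toy inverted index below (plain Java, not Lucene's data structures; `CollectorCostModel`, `index` and `search` are hypothetical names) shows that three different terms with the same number of hits cost the same number of collector calls.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntConsumer;

/**
 * Toy inverted index (hypothetical, not Lucene's implementation): each
 * field:value term maps to an array of matching doc ids, and a search calls
 * the collector once per matching doc.
 */
public class CollectorCostModel {
    private final Map<String, int[]> postings = new HashMap<>();

    public void index(String field, String value, int[] docIds) {
        postings.put(field + ":" + value, docIds);
    }

    /** Feeds every matching doc id to the collector, so the work grows with the number of matches. */
    public void search(String field, String value, IntConsumer collector) {
        for (int doc : postings.getOrDefault(field + ":" + value, new int[0])) {
            collector.accept(doc);
        }
    }

    public static void main(String[] args) {
        CollectorCostModel idx = new CollectorCostModel();
        int hits = 100_000; // 1 lakh matches per term, as in case 1
        int[] docs = new int[hits];
        for (int i = 0; i < hits; i++) {
            docs[i] = i * 10; // spread the matches over a 10 lakh doc id space
        }
        idx.index("name", "Chitra", docs);
        idx.index("animal", "lion", docs);
        idx.index("bird", "peacock", docs);

        String[][] queries = {{"name", "Chitra"}, {"animal", "lion"}, {"bird", "peacock"}};
        for (String[] q : queries) {
            long[] calls = {0}; // counts collector invocations for this query
            idx.search(q[0], q[1], doc -> calls[0]++);
            System.out.println(q[0] + ":" + q[1] + " -> collector called " + calls[0] + " times");
        }
    }
}
```

In real Lucene the per-match cost also includes decoding postings and scoring, but the shape is the same: once the relevant pages are cached, equal hit counts mean roughly equal work.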
