lucene-dev mailing list archives

From "Ning Li (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1035) Optional Buffer Pool to Improve Search Performance
Date Fri, 26 Oct 2007 21:54:51 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538112 ]

Ning Li commented on LUCENE-1035:
---------------------------------

> I'll change to "OR" queries and see what happens.

  Query set with average 590K results, retrieving docids for the first 5K
  Buffer Pool Size    Hit Ratio    Queries per second
     0                 N/A             1.9
     16M               53%             1.9
     32M               68%             2.0
     64M               90%             2.3
     128M/256M/512M    99%             2.3

As Yonik pointed out, the bottleneck in the previous "AND" tests was the system call that moves
data from the file system cache to userspace. In these "OR" tests, far fewer such calls are
made, so the speedup is less significant. I wish I could get a real query workload for
this dataset.

> Actually, phrase queries would be really interesting too since they hit the term positions.

Phrase queries are rare and term distribution is highly skewed according to the following
study on the Excite query log:
Spink, Amanda and Xu, Jack L. (2000)   "Selected results from a large study of Web searching:
the Excite study".  Information Research, 6(1) Available at: http://InformationR.net/ir/6-1/paper90.html

"4. Phase Searching: Phrases (terms enclosed by quotation marks) were seldom, while only 1
in 16 queries contained a phrase - but correctly used.
5. Search Terms: Distribution: Jansen, et al., (2000) report the distribution of the frequency
of use of terms in queries as highly skewed."

I didn't find a good one on the AOL query log. In any case, this buffer pool is not intended
for general-purpose use. I mentioned RAMDirectory earlier. This is more like an alternative to
RAMDirectory (that's why it's per directory): you want persistent storage for the index, yet
the index is small enough that you want RAMDirectory-like search performance. In addition, the
entire index doesn't have to fit into memory, as long as the most frequently queried part does.
Hopefully, this benefits a subset of Lucene use cases.
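
To make the idea concrete, here is a minimal sketch of a fixed-size block cache with hit-ratio
accounting. It is not the code in LUCENE-1035.patch: the class name LruBufferPool, the LRU
eviction policy, the block-number keys, and the loader callback (standing in for a read from
the underlying file) are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongFunction;

/** Hypothetical sketch of a block cache; NOT the LUCENE-1035 patch. */
final class LruBufferPool {
    private final int capacityBlocks;
    // Assumption: the loader stands in for reading one block from the index file.
    private final LongFunction<byte[]> loader;
    private final LinkedHashMap<Long, byte[]> blocks;
    private long hits;
    private long misses;

    LruBufferPool(int capacityBlocks, LongFunction<byte[]> loader) {
        if (capacityBlocks <= 0) {
            throw new IllegalArgumentException("capacityBlocks must be positive");
        }
        this.capacityBlocks = capacityBlocks;
        this.loader = loader;
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.blocks = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                // Evict the least recently used block once the pool is over capacity.
                return size() > LruBufferPool.this.capacityBlocks;
            }
        };
    }

    /** Returns the block from the pool, loading it on a miss. */
    synchronized byte[] getBlock(long blockNumber) {
        byte[] data = blocks.get(blockNumber);
        if (data != null) {
            hits++;
            return data;
        }
        misses++;
        data = loader.apply(blockNumber);
        blocks.put(blockNumber, data);
        return data;
    }

    synchronized long hits() { return hits; }

    synchronized long misses() { return misses; }

    /** Fraction of getBlock calls served from the pool; 0.0 before any access. */
    synchronized double hitRatio() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }

    public static void main(String[] args) {
        // Two-block pool; the loader fabricates a one-byte block for the demo.
        LruBufferPool pool = new LruBufferPool(2, n -> new byte[] {(byte) n});
        long[] accesses = {0, 1, 0, 2, 0, 1}; // block 0 is the "hot" block
        for (long block : accesses) {
            pool.getBlock(block);
        }
        System.out.printf("hits=%d misses=%d hitRatio=%.2f%n",
                pool.hits(), pool.misses(), pool.hitRatio());
    }
}
```

With a skewed access pattern, the hot blocks stay resident even though the pool is smaller than
the data, which is the case where a pool like this would be expected to help.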

> did you compare it against MMAP? I

The index I experimented on didn't fit in memory...


> Optional Buffer Pool to Improve Search Performance
> --------------------------------------------------
>
>                 Key: LUCENE-1035
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1035
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Ning Li
>         Attachments: LUCENE-1035.patch
>
>
> Index in RAMDirectory provides better performance over that in FSDirectory.
> But many indexes cannot fit in memory or applications cannot afford to
> spend that much memory on index. On the other hand, because of locality,
> a reasonably sized buffer pool may provide good improvement over FSDirectory.
> This issue aims at providing such an optional buffer pool layer. In cases
> where it fits, i.e. a reasonable hit ratio can be achieved, it should provide
> a good improvement over FSDirectory.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

