lucene-dev mailing list archives

From "Ning Li (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1035) Optional Buffer Pool to Improve Search Performance
Date Fri, 26 Oct 2007 15:51:51 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537995 ]

Ning Li commented on LUCENE-1035:
---------------------------------

> most lucene usecases store much more than just the document id... that would really affect
> locality.

In the experiments, I was simulating the (Google) paradigm where you retrieve just the docids
and go to document servers for everything else. If stored fields almost always hurt locality,
we can make the buffer pool sit only on the data/files for which we expect good locality (say,
posting lists) and not on the others.
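
A minimal sketch of that idea, under stated assumptions: the pool is consulted only for files
whose extension marks them as posting data. The class name PoolPolicy and the method shouldPool
are hypothetical and are not taken from the attached patch. The extensions assume a
non-compound index in the file format of this era, where .frq and .prx hold postings and
.fdt/.fdx hold stored fields.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical policy: decide per file whether reads should go through the buffer pool. */
class PoolPolicy {
    // Assumption: posting lists live in .frq (frequencies) and .prx (positions).
    private static final Set<String> POOLED_EXTENSIONS =
        new HashSet<String>(Arrays.asList("frq", "prx"));

    /** Returns true if the named index file is expected to show good locality. */
    static boolean shouldPool(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension (e.g. "segments"): bypass the pool
        }
        return POOLED_EXTENSIONS.contains(fileName.substring(dot + 1));
    }
}
```

With this policy, shouldPool("_0.frq") is true while shouldPool("_0.fdt") is false, so stored
field reads would go straight to the underlying directory and leave the pool to the postings.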

> It seems like a simple LRU cache could really be blown out of the water by certain types
> of queries (retrieve a lot of stored fields, or do an expanding term query) that would force
> out all previously cached hotspots. Most OS level caching has protection against this (multi-level
> LRU or whatever). But if our user-level LRU cache fails, we've also messed up the OS level
> cache since we've been hiding page hits from it.

That's a good point. We can improve the algorithm while, hopefully, keeping it simple and general.
This buffer pool is not a one-size-fits-all solution, but it should benefit a number of use
cases. That's why I call it "optional". :)
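
To make the concern concrete, here is a sketch that contrasts a plain LRU with a two-level LRU
in the spirit of the "multi-level LRU" mentioned in the quote. It is only one possible
improvement and is not what the attached patch does. The class TwoLevelLru, its methods and
the capacities are hypothetical, and values are assumed to be non-null.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical two-level LRU: new blocks are probationary, a second hit promotes them. */
class TwoLevelLru<K, V> {
    private final int probationCapacity;
    private final int protectedCapacity;
    // accessOrder=true: iteration starts at the least recently used entry.
    private final LinkedHashMap<K, V> probation = new LinkedHashMap<K, V>(16, 0.75f, true);
    private final LinkedHashMap<K, V> protectedLevel = new LinkedHashMap<K, V>(16, 0.75f, true);

    TwoLevelLru(int probationCapacity, int protectedCapacity) {
        this.probationCapacity = probationCapacity;
        this.protectedCapacity = protectedCapacity;
    }

    /** Looks up a block; a hit in the probation level promotes it to the protected level. */
    V get(K key) {
        V value = protectedLevel.get(key); // also refreshes recency
        if (value != null) {
            return value;
        }
        value = probation.remove(key);
        if (value != null) {
            protectedLevel.put(key, value);
            if (protectedLevel.size() > protectedCapacity) {
                // Demote the least recently used protected block instead of dropping it.
                K eldestKey = protectedLevel.keySet().iterator().next();
                V eldestValue = protectedLevel.remove(eldestKey);
                insertProbation(eldestKey, eldestValue);
            }
        }
        return value;
    }

    /** Inserts a block; blocks seen for the first time only ever enter the probation level. */
    void put(K key, V value) {
        if (protectedLevel.containsKey(key)) {
            protectedLevel.put(key, value);
        } else {
            insertProbation(key, value);
        }
    }

    boolean contains(K key) {
        return protectedLevel.containsKey(key) || probation.containsKey(key);
    }

    private void insertProbation(K key, V value) {
        probation.put(key, value);
        if (probation.size() > probationCapacity) {
            K eldestKey = probation.keySet().iterator().next();
            probation.remove(eldestKey); // evict the least recently used probationary block
        }
    }

    /** Plain single-level LRU for comparison. */
    static <A, B> Map<A, B> plainLru(final int capacity) {
        return new LinkedHashMap<A, B>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<A, B> eldest) {
                return size() > capacity;
            }
        };
    }
}
```

If a block is inserted and hit once, and a long run of one-time blocks follows, the plain LRU
of the same total size loses the hot block, while the two-level version keeps it because the
one-time blocks only churn through the probation level. This does nothing about the other
half of the concern, that hits served from a user-level pool are invisible to the OS cache.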

> I'd like to see single term queries, "OR" queries, and queries across multiple fields
> (also a common usecase) that match more documents tested also.

I'll change to "OR" queries and see what happens. The dataset is enwiki with four fields:
docid, date (optional), title and body. Most terms are from title and body.


> Optional Buffer Pool to Improve Search Performance
> --------------------------------------------------
>
>                 Key: LUCENE-1035
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1035
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Ning Li
>         Attachments: LUCENE-1035.patch
>
>
> An index in RAMDirectory performs better than the same index in FSDirectory.
> But many indexes cannot fit in memory, or applications cannot afford to
> spend that much memory on the index. On the other hand, because of locality,
> a reasonably sized buffer pool may provide a good improvement over FSDirectory.
> This issue aims at providing such an optional buffer pool layer. In cases
> where it fits, i.e. where a reasonable hit ratio can be achieved, it should
> provide a good improvement over FSDirectory.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
