lucene-java-user mailing list archives

From "Erick Erickson"
Subject Re: combined filesystem and web search
Date Tue, 11 Jul 2006 13:48:27 GMT
I can answer a few of these. If you haven't yet, do yourself a favor and
pick up the book "Lucene in Action". It's written to the 1.4 code base; the
examples compile but give deprecation warnings against the 1.9 code base and
need a few more tweaks for the 2.0 code base.

Also, download a copy of Luke. It's invaluable for looking at an index to
answer questions like "why isn't this query returning what I thought it
would?"

That said, see below, and understand that I have exactly one 500K document
(growing to 1M over the next few months) project under my belt with Lucene,
so my opinions are those of a relative novice....

On 7/11/06, Tomi NA wrote:
> I plan to make lucene (and nutch) a key element in an intranet
> solution, but I only know about lucene what I've read in the last
> couple of days.
> Here's what I'd like opinions about.
> I would like to build a single point of access to data on intranet web
> pages and LAN shared documents.
> I've looked into the file:// vs. http:// issue and it seems that it
> doesn't make sense securitywise to use file:// so I'd make the shares
> accessible through a web server and *then* index them.
> However, if I leverage the web inbuilt lucene search, I'd build two
> "indices" (please pardon the lack of knowledge of lucene terminology).
> Can I search over two indices? Do I have to merge them? How would I do
> the former or the latter?

I'll let folks who understand security answer this one <G>.

> Is there a good explanation somewhere how to set up incremental
> indexing, rather than e.g. building the whole index over nightly?

See Lucene in Action (LIA). The short answer is yes. In the simplest form,
you can just add new data to a currently-existing index. You probably want
to make a copy just in case your power goes out or something.
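
The "make a copy first" precaution could be sketched roughly as follows. This
is only an illustration in plain Java using java.nio.file, so it assumes a
much newer JDK than the ones Lucene 1.4 through 2.0 ran on. The names
IndexBackup and copyDirectory are made up for this example and are not part
of Lucene, and the sketch assumes nothing is writing to the index while the
copy runs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

// Hypothetical helper (not a Lucene class): copies an index directory
// so the original can be restored if an incremental update is interrupted.
final class IndexBackup {

    // Recursively copies every file and subdirectory of source into target.
    static void copyDirectory(Path source, Path target) throws IOException {
        // Files.walk visits a directory before its contents, so each
        // destination directory exists before files are copied into it.
        try (Stream<Path> paths = Files.walk(source)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                Path dest = target.resolve(source.relativize(p).toString());
                if (Files.isDirectory(p)) {
                    Files.createDirectories(dest);
                } else {
                    Files.copy(p, dest, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```

You would call something like IndexBackup.copyDirectory(Paths.get("index"),
Paths.get("index.bak")) before opening the index for additions; both
directory names are placeholders.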

Then there's IndexMergeTool, which I haven't used but which looks interesting.

Then there's the possibility of using a MultiSearcher, but I'll leave it to
the experts to talk about how that combines results....
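
For intuition only, here is one naive way hits from two indices could be
combined: pool them and keep the best-scoring ones. This is not a description
of what MultiSearcher does internally. MergedSearch, Hit, and merge are
hypothetical names, and the sketch assumes scores from the two indices are
directly comparable, which is not guaranteed in general.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy illustration (not Lucene code): merge ranked hits from two indices.
final class MergedSearch {

    // One hit: the index it came from, a document id local to that index,
    // and the score that index assigned to it.
    static final class Hit {
        final String index;
        final int doc;
        final float score;

        Hit(String index, int doc, float score) {
            this.index = index;
            this.doc = doc;
            this.score = score;
        }
    }

    // Pools both hit lists, sorts by descending score, and returns at most
    // topN hits. Assumes scores are comparable across the two indices.
    static List<Hit> merge(List<Hit> a, List<Hit> b, int topN) {
        List<Hit> all = new ArrayList<>(a);
        all.addAll(b);
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(topN, all.size())));
    }
}
```

Keeping the source index on each hit matters because document ids are only
meaningful within the index that produced them.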

> Will lucene produce usable results, knowing that there'll be no rich
> interlinking between a major part of the content (shared docs) to base
> ranking upon? I know lucene uses a number of criteria to rank
> documents upon: I'm asking about real-world experiences and subjective
> grading quality impressions.
> Does anyone have similar experiences with intranet enterprise data
> searching?
> Cheers,
> Tomi