lucene-lucene-net-user mailing list archives

From Anders Lybecker <...@miracleas.dk>
Subject Re: Lucene .Net for real-time full-text search
Date Fri, 11 Jun 2010 15:24:42 GMT
Hi Lidia,

First of all, you need to parse and index the files before you're able to
search. It's fast - no worries.

Depending on the size of the documents, indexing normally takes less than a
second per document. After that, you can search them using the NRT (Near
Real-Time) capabilities.

You could consider keeping the Lucene index in RAM – which will speed up the
indexing and search process considerably. (You might still consider storing
it on disk as well.)
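
To make the "index first, then search" idea concrete, here is a minimal sketch of a RAM-only inverted index. It is a toy illustration in Python and not Lucene's API: the class `InMemoryIndex`, its methods, and the sample report texts are all hypothetical names made up for this example. A real Lucene index adds analyzers, scoring, and persistence on top of the same basic structure.

```python
import re
from collections import defaultdict


class InMemoryIndex:
    """Toy inverted index held entirely in RAM (illustration only, not Lucene)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.documents = {}               # doc id -> original text

    @staticmethod
    def tokenize(text):
        # Lowercase the text and extract runs of word characters.
        return re.findall(r"\w+", text.lower())

    def add(self, doc_id, text):
        # Indexing step: a document is only searchable after this has run.
        self.documents[doc_id] = text
        for term in self.tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        # Return the ids of documents that contain every query term (AND).
        terms = self.tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


# Hypothetical sample data, standing in for parsed report text.
index = InMemoryIndex()
index.add("report-1", "Sales in the first quarter of 2010 rose")
index.add("report-2", "Operating income for the full year 2009")
hits = index.search("sales 2010")
```

Because the postings live in a dictionary in memory, both adding a document and looking up a term avoid disk access entirely, which is where the speed-up comes from.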

I have built a handful of solutions with Lucene.Net – one of them has
currently indexed more than 800,000 documents (~45 million pages), and
searching still takes less than a second :-)

I recommend the Lucene in Action book from Manning (
http://www.manning.com/hatcher2/) – the code samples are written in Java,
but everything applies equally to .Net.

If it has to be a Microsoft product, you could use SQL Server FullText or
FAST. I have used and still use SQL Server FullText search; it lacks
functionality but is still fast (the 2008 version is ~20% faster than
2005). FAST is the alternative, but it is expensive.

:-)
Anders Lybecker
+45 25 277 147

On Fri, Jun 11, 2010 at 4:02 PM, Lidia Rozhentsova <
Lidia.Rozhentsova@direkt.se> wrote:

> Hi!
>
> My name is Lidia. Currently I’m looking for a search engine to develop an
> application for the Swedish financial news maker Direkt.se.
>
> My goal is to find a search engine that allows a real-time full-text
> search. Briefly, a business process that requires such a solution is:
>
>    1. Different companies announce that they will publish particular
>    financial information at a particular date and time. This information
>    usually consists of company name, financial period, and financial
>    indicator (sales, gross margin, operating income).
>    2. At that date and time we receive an HTML file with the financial
>    report (I attached an example of such a file).
>    3. In the received file we have to find the information that was
>    described in the first step. For example, what sales the company had
>    in the first quarter of 2010.
>
> We can have up to 100-200 files at one time and we have to find the
> information that we’re interested in ASAP, since time is extremely
> critical for the news maker company. So, we don’t have time for indexing
> files.
>
> I’ve read that Lucene, starting from version 2.9, supports near real-time
> search, but I’m not sure how fast it will work with the task I’ve
> described. Also, my company is interested in Microsoft technologies, which
> is why I’m writing to the .Net community.
>
> Could you please clarify for me whether Lucene is capable of supporting
> the task I described, or give me a link where I can read about it?
>
> Thank you very much for your assistance!
>
> Best regards,
>
> *Lidia Rozhentsova*
> Developer
> Nyhetsbyrån Direkt
> Norrlandsgatan 15
> 111 43 Stockholm
>
> Phone +46 (0)8 519 179 00
> Direct +46 (0)8 519 179 05
> Mobile +46 (0)76 062 50 45
> www.direkt.se
> lidia.rozhentsova@direkt.se
>
> This e-mail and the information it contains may be privileged and/or
> confidential. It is for the intended addressee(s) only. The unauthorised
> use, disclosure or copying of this e-mail, or any information it contains,
> is prohibited. If you are not an intended recipient, please contact the
> sender and delete the material from your computer.
