Hi Lidia,

First of all, you need to parse and index the files before you're able to search. It's fast - no worries.

Depending on the size of the documents, indexing normally takes less than a second, and after that you can search using the NRT (Near Real-Time) capabilities.

You could consider keeping the Lucene index in RAM - that will speed up both indexing and searching considerably (you might still consider storing it on disk as well).
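To make the "index first, then search" idea concrete, here is a minimal sketch in Python. It is not Lucene and does not use the Lucene.Net API - it is only a toy in-memory inverted index built on the standard library, and every name in it (html_to_text, tokenize, InMemoryIndex, the report ids and the sample HTML) is made up for illustration. It assumes plain AND matching on lowercased words, with no ranking, stemming or phrase queries.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document and drops the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    """Parse step: reduce an HTML report to its plain text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def tokenize(text):
    """Split text into lowercased word tokens."""
    return re.findall(r"\w+", text.lower())


class InMemoryIndex:
    """Toy inverted index held entirely in RAM: term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, html):
        """Index step: record which document each term occurs in."""
        for term in tokenize(html_to_text(html)):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return the ids of documents containing every term of the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


# Hypothetical sample reports, standing in for the received HTML files.
index = InMemoryIndex()
index.add("report-1", "<html><body><p>Sales in the first quarter of 2010</p></body></html>")
index.add("report-2", "<html><body><p>Operating income for 2009</p></body></html>")

hits = index.search("sales 2010")
```

A document becomes searchable as soon as add() returns, which is the same idea as NRT search on a much smaller scale. A real Lucene index adds analyzers, scoring and persistence on top of this.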

I have built a handful of solutions with Lucene.Net - one of them currently has more than 800,000 documents (~45 million pages) indexed, and searching still takes less than a second :-)

I recommend the Lucene in Action book from Manning (http://www.manning.com/hatcher2/) - the code samples are written in Java, but everything applies equally to .Net.

If it has to be a Microsoft product, you could use SQL Server FullText or Fast. I have used, and still use, SQL Server FullText search; it lacks functionality, but it is fast (the 2008 version is ~20% faster than 2005). Fast is the alternative, but it is expensive.

:-)
Anders Lybecker
+45 25 277 147

On Fri, Jun 11, 2010 at 4:02 PM, Lidia Rozhentsova <Lidia.Rozhentsova@direkt.se> wrote:

Hi!

My name is Lidia. I'm currently looking for a search engine to develop an application for the Swedish financial news maker Direkt.se.

My goal is to find a search engine that allows real-time full-text search. Briefly, the business process that requires such a solution is:

  1. Different companies announce that they will publish particular financial information at a particular date and time. This information usually consists of the company name, the financial period and the financial indicator (sales, gross margin, operating income)
  2. At that date and time we receive an HTML file with the financial report (I have attached an example of such a file)
  3. In the received file we have to find the information that was described in the first step. For example, what sales the company had in the first quarter of 2010

We can have up to 100-200 files at one time, and we have to find the information that we're interested in ASAP, since time is extremely critical for a news maker company. So, we don't have time for indexing files.

I've read that Lucene supports near real-time search starting from version 2.9, but I'm not sure how fast it will work for the task I've described. Also, my company is interested in Microsoft technologies, which is why I'm writing to the .Net community.

Could you please clarify whether Lucene is capable of supporting the task I described, or give me a link where I can read about it?

Thank you very much for your assistance!

Best regards,

Lidia Rozhentsova

Developer

Nyhetsbyrån Direkt

Norrlandsgatan 15

111 43 Stockholm

Phone

+46 (0)8 519 179 00

Direct

+46 (0)8 519 179 05

www.direkt.se

Mobile

+46 (0)76 062 50 45

lidia.rozhentsova@direkt.se
