lucene-java-user mailing list archives

From "Hauck, William B."
Subject RE: google mini? who needs it when Lucene is there
Date Thu, 03 Feb 2005 21:08:13 GMT

I disagree that the Google Mini is useless.  $5000 is quite inexpensive
for a commercial search engine.  I know of search engines where the cost
is practically 20 cents per document.  Heck, a decent server capable of
running a heavily loaded search engine costs $3000.  Also, don't forget
you get two years of hardware replacements in case of failure and
software updates for that $5000.
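
For a rough sense of scale, the per-document cost implied by these figures can be worked out directly. This is only a back-of-the-envelope sketch: the $5000 price, the 50,000 document limit, and the 20-cent comparison come from this thread, while the function name is made up for illustration and the result assumes the appliance is filled to its limit.

```python
def cost_per_document(price_dollars: float, max_documents: int) -> float:
    """Return the price per indexed document, in dollars."""
    if max_documents <= 0:
        raise ValueError("max_documents must be positive")
    return price_dollars / max_documents


# Figures from this thread: a $5000 appliance limited to 50,000 documents.
mini = cost_per_document(5000, 50_000)

# Comparison figure from this thread: roughly 20 cents per document.
other = 0.20

print(f"Google Mini at full capacity: ${mini:.2f} per document")
print(f"Relative to a 20-cent engine: {mini / other:.1f}x")
```

The per-document figure rises if fewer than 50,000 documents are actually indexed, so the comparison only holds near the limit.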

Lucene is a great concept, a great effort, and a great indexer.  But it
doesn't have a built-in spider; you need to provide that or use add-on
code or a system like Nutch which, reportedly, only parses HTML.  You
can't process PDFs, MS Office-type docs, and others without still more
code.  At this point, for a small company, probably with an over-worked
IT staff, Lucene would be just too time consuming to use.
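
To give a feel for the glue code this implies, here is a minimal sketch of the HTML-only part of the job: pulling indexable text and outgoing links from a single page. It is an illustration, not Lucene or Nutch code. It uses Python's standard `html.parser` module, the names `PageExtractor` and `extract` are hypothetical, and fetching pages, following links, and handling PDFs or Office documents would all need still more code.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collect visible text and link targets from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []        # href values a spider could follow next
        self.text_parts = []   # text fragments to hand to an indexer
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore script/style bodies and whitespace-only fragments.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract(html: str):
    """Return (text, links) for one page of HTML."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.links


page = ('<html><head><style>b{}</style></head><body>'
        '<p>Hello <a href="/about">about us</a></p></body></html>')
text, links = extract(page)
print(text)
print(links)
```

The extracted text is what would be passed to the indexer, and the collected links are what a spider would queue for the next fetch.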

The other thing to consider is the use of the search engine.  Is it for
a company's public website?  Is it to index every document that the
company produces?  Will it index databases?  Email?  Lotus Notes apps?
For a simple website search engine, Google Mini and other appliances are
hard to beat.  As for the 50,000 document limit:  50,000 documents is a
lot for a small company website.



-----Original Message-----
From: jian chen
Sent: Thursday, January 27, 2005 11:06 PM
To: Lucene Users List
Subject: Re: google mini? who needs it when Lucene is there

Overall, even if Google Mini offers a lot of cool features compared to a
bare-bones Lucene project, what good is it with the 50,000-document limit?
It is useless with that limit. That is just their way of trying to turn
it into another cash cow.


On Thu, 27 Jan 2005 17:45:03 -0800 (PST), Otis Gospodnetic wrote:
> 500 times the original data?  Not true! :)
> Otis
> --- "Xiaohong Yang (Sharon)" wrote:
> > Hi,
> >
> > I agree that Google mini is quite expensive.  It might be similar to
> > the desktop version in quality.  Does anyone know Google's ratio of
> > index to text?  Is it true that Lucene's index is about 500 times the
> > original text size (not including image size)?  I don't have one
> > installed, so I cannot measure.
> >
> > Best,
> >
> > Sharon
> >
> > jian chen wrote:
> > Hi,
> >
> > I was searching using google and just found that there was a new 
> > feature called "google mini". Initially I thought it was another 
> > free service for small companies. Then I realized that it costs 
> > quite some money ($4,995) for the hardware and software. (I guess 
> > the proprietary software costs a whole lot more than actual 
> > hardware.)
> >
> > The "nice" feature is that you can only index up to 50,000
> > documents at this price. If you need to index more, sorry, send in
> > the check...
> >
> > It seems to me that any small biz will be ripped off if they install
> > this google mini thing, compared to using Lucene to implement an
> > easy-to-use search application, which could search whatever number of
> > documents you could imagine.
> >
> > I hope the lucene project could get exposed more to the enterprise 
> > so that people know that they have not only cheaper but more 
> > importantly, BETTER alternatives.
> >
> > Jian
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail:
> > For additional commands, e-mail:
> >
> >
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:


