lucene-general mailing list archives

From "Will Martin" <wmartin...@gmail.com>
Subject RE: How can I make better project than Lucene?
Date Sat, 15 Nov 2014 14:17:25 GMT
Comments inline:

===

-----Original Message-----
From: Siva Thumma [mailto:sivatumma@gmail.com] 
Sent: Saturday, November 15, 2014 8:06 AM
To: general@lucene.apache.org
Subject: Re: How can I make better project than Lucene?

To build such a big product, one would obviously attribute the license.

Sent from iPhone

> On 15-Nov-2014, at 5:12 pm, Will Martin <wmartinusa@gmail.com> wrote:
> 
> Btw: SwSong should not steal code, which implies an existing license whose terms he is
> willing to break. Not a good first step.    ;-)
> 
> will
> 
> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: Saturday, November 15, 2014 6:22 AM
> To: general@lucene.apache.org
> Subject: Re: How can I make better project than Lucene?
> 
> Actually I think competing projects are very healthy for open source development.
> 
> There are many things you could explore to "contrast" with Lucene, e.g. write your new
> search engine in Go not Java: Java has many problems, maybe Go fixes them.  Go also has a
> low-latency garbage collector in development ... and Java's GC options still can't scale to
> the heap sizes that are practical now.

:::> wmartin: If there is a problem with GC for a domain, then the JDK team should be contacted
or our index design may need to be revisited.


> 
> Lucene has many limitations, so your competing engine could focus on them.  E.g. the
> "schemalessness" of Lucene has become a big problem, and near impossible to fix at this point,
> and prevents new important features like LUCENE-5879 from being possible, so you could give
> your engine a "gentle" schema from the start.

:::> I'm always amazed when I find references to field names rather than enums or ids.
A schema should, and in Lucene often does, result in an automaton or maybe an FST, so why isn't
the schema implemented as such? Too slow?




> 
> The Lucene Filter/Query situation is a mess: one should extend the other.
> 

:::> Um, doesn't JetBrains IntelliJ refactor? Is it too dumb?


> Lucene has weak support for proximity queries (SpanQuery is slow and does not get much
> attention).
> 

:::> I wrote proximity for CPL (DataTimes, DOW-JONES, LoC, AOL). Give me more information.


> Lucene is showing its age, missing some compelling features like a built-in transaction
> log, "core" support for numerics (they are sort of hacked on top), optimistic concurrency
> support (sequence ids, versions, something), distributed support (near real time replication,
> etc.), multi-tenancy, an example server implementation, so the search servers on top of Lucene
> have had to fill these gaps.  Maybe you could make your engine distributed from the start
> (Go is a great match for that, from what little I know).
> 
> All 3 highlighter options have problems.
> 

:::> Well, since Postings uses a plain-jane tune of BM25raw and uses intermediaries to read
posts, it's not surprising. The question is whether the first thing that should be done is to
profile the damn things. The DAPO (DAGO) benchmark framework has Lucene search and indexing.
Maybe an extension to the search collector there.





> The analysis chain (attributes) is overly complex.
> 
> In your competing engine you can borrow/copy/steal from Lucene's good parts to get started...
> 
> 
> Mike McCandless
> 
> http://blog.mikemccandless.com
> 
> 
>> On Fri, Nov 14, 2014 at 8:43 PM, swsong_dev <swsong_dev@websqrd.com> wrote:
>> I’m developing a search engine, Fastcatsearch.
>> http://github.com/fastcatsearch/fastcatsearch
>> 
>> Lucene is a widely known and famous project and I cannot beat Lucene for now.
>> 
>> But is there any chance to beat Lucene?
>> 
>> Anything like features, performance.
>> 
>> Please let me know what to do to make a better product than Lucene.
>> 
>> Thank you.
> 

