lucene-general mailing list archives

From Erol Akarsu <eaka...@gmail.com>
Subject Re: How can I make better project than Lucene?
Date Tue, 18 Nov 2014 18:40:22 GMT
Erik,

Can you please answer my question on the Solr list?
I would appreciate your help.

I have several field types and would like to assign the correct boosts so
that I get results in the right order.
Here is a summary of what I have:
1- Product Title - text field, Boost = 160
2- Product Description - text field, Boost = 80
3- Number of clicks - Integer field, having value [1 TO 1000], Boost = 40
4- Product Features - text field, Boost = 20
5- AmountPurchased - Float field, Boost = 10
6- Product Properties - text field, Boost = 5

A user will run a search such as q="foo bar", and we expect Solr to return
results ranked according to the boost values assigned above. qf and pf make
it easy to assign boosts to the text fields, but I am having difficulty
mixing text fields with numeric ones. For example, I want a product with
Number of clicks = 20 to be listed higher than one with 10 clicks, after 1)
and 2) have been taken into account.

I guess Solr will reorder the search results based on the boost values of
the text fields, but I also want a product with 10 clicks to rank higher than
one with 5 clicks. As a result, any product that has clicks would rank higher
than products whose features merely include the search keywords.
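
One possible way to express the boosts above is Solr's eDisMax parser: qf and
pf carry the text-field weights, and bf adds a function of each numeric field
to the score. The sketch below only builds the request parameters. The field
names (title, description, features, properties, clicks, amount_purchased)
are assumptions, since the message gives only display labels, and the
log(sum(field,1)) damping is an illustrative choice, not something stated in
the thread. Note that an additive boost only biases the score; it does not
guarantee a strict ordering between the tiers.

```python
from urllib.parse import urlencode

# Hypothetical field names mapped to the boosts listed in the message.
TEXT_BOOSTS = {
    "title": 160,
    "description": 80,
    "features": 20,
    "properties": 5,
}
NUMERIC_BOOSTS = {
    "clicks": 40,
    "amount_purchased": 10,
}


def build_edismax_params(user_query):
    """Build eDisMax request parameters for the boosts above."""
    # qf/pf take space-separated "field^boost" entries.
    weighted = " ".join(f"{field}^{boost}" for field, boost in TEXT_BOOSTS.items())
    # bf adds the value of each function query to the score. log(sum(f,1))
    # dampens large values so raw click counts do not swamp the text match.
    bf = " ".join(
        f"log(sum({field},1))^{boost}" for field, boost in NUMERIC_BOOSTS.items()
    )
    return {
        "defType": "edismax",
        "q": user_query,
        "qf": weighted,
        "pf": weighted,
        "bf": bf,
    }


if __name__ == "__main__":
    # Prints the URL-encoded query string for a /select request.
    print(urlencode(build_edismax_params("foo bar")))
```

If a strict ordering by clicks within a tier is required, sorting on the
numeric field or a multiplicative boost may be a better fit than bf; which
one works best depends on the data and would need testing.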

I hope I have explained this clearly.

Can you please guide me on how to solve this issue?

Regards

Erol Akarsu

On Sat, Nov 15, 2014 at 7:31 PM, Erik Hatcher <erik.hatcher@gmail.com>
wrote:

> I’m curious to see these benchmarks run on the latest Solr version, as the
> numbers you quoted were over two years ago.  Also, it’d be useful to see
> the indexing and searching benchmark code to make sure it takes advantage
> of best practices.   I’ve indexed 10M docs into Solr in only a few
> minutes.  500K, say in CSV format, for basic e-commerce product data, would
> likely take a minute or so.  The searching differences you present seem
> fairly negligible - 100ms is the blink of an eye, so anything under is
> considered quite acceptable by the largest e-commerce vendors in the
> world.  Along with that, perhaps an even more important benchmark is
> relevancy or in some way measure how good the search results are.
>
> As Mike put so well, competition is a great thing so by all means I
> encourage you to carry on with your endeavor.  Sounds like you’ve built
> some powerful stuff and have extensive experience.  +1
>
>         Erik
>
>
>
> On Nov 15, 2014, at 6:23 PM, swsong_dev <swsong_dev@websqrd.com> wrote:
>
> > Thank you for your sincere reply, Mr. McCandless.
> >
> > When I posted an email to the mailing list, I was afraid of not getting a
> meaningful reply, but I’m now so glad I might have found a way.
> >
> > I agree that a new search engine in Go would be competitive. I think we
> all need a next generation search engine core redesigned from the start.
> >
> > And, I understand Lucene’s limitations you mentioned. They are good
> points to get started.
> >
> > I used a search engine for the first 4 years and have developed search
> engines from the ground up for the last 6 years, and I got the feedback
> “It’s faster than Solr in indexing and searching”.
> (http://ddakker.tistory.com/248)
> >
> > ===Result===
> > Data size : 529,188
> > Fastcat indexing time : 1m 26s
> > Solr3.5 indexing time : 5m 30s
> > Fastcat searching time : 48ms
> > Solr3.5 searching time : 73ms
> >
> > It was applied to Korea’s greatest shopping service (http://danawa.com/)
> a month ago, to my delight.
> >
> > But my goal has been making a globally-used open source search engine.
> >
> > As you suggested, now I want to make a whole-new search engine in Go.
> >
> > I have made my first search engine alone, but I would not make a new
> search engine alone. I want to make it with global developers together.
> >
> > If you plan to make a new search engine in Go, or know someone around
> you, could you help me gathering members for a new search engine, and guide
> us technically(feature requirement, efficient design)?
> >
> > Or if there is already a new search engine project in Go, could you let
> me know?
> >
> > In Korea, no one develops a search engine except people who work at a
> search engine solution company, and even they are very few and do not spend
> time to an open source project.
> >
> > In my case, I founded a tiny venture company 4 years ago to make time to
> develop an open source search engine.
> >
> > I want to be involved in a next-generation search engine project. I
> would be happy to make a new search engine itself.
> >
> > Your little help could be great for me.
> >
> > Thank you.
> >
> > Sang Song
> >
> >
> >> On Nov 15, 2014, at 8:22 PM, Michael McCandless <lucene@mikemccandless.com>
> wrote:
> >>
> >> Actually I think competing projects is very healthy for open source
> development.
> >>
> >> There are many things you could explore to "contrast" with Lucene,
> >> e.g. write your new search engine in Go not Java: Java has many
> >> problems, maybe Go fixes them.  Go also has a low-latency garbage
> >> collector in development ... and Java's GC options still can't scale
> >> to the heap sizes that are practical now.
> >>
> >> Lucene has many limitations, so your competing engine could focus on
> >> them.  E.g. the "schemalessness" of Lucene has become a big problem,
> >> and near impossible to fix at this point, and prevents new important
> >> features like LUCENE-5879 from being possible, so you could give your
> >> engine a "gentle" schema from the start.
> >>
> >> The Lucene Filter/Query situation is a mess: one should extend the
> other.
> >>
> >> Lucene has weak support for proximity queries (SpanQuery is slow and
> >> does not get much attention).
> >>
> >> Lucene is showing its age, missing some compelling features like a
> >> builtin transaction log, "core" support for numerics (they are sort of
> >> hacked on top), optimistic concurrency support (sequence ids,
> >> versions, something), distributed support (near real time replication,
> >> etc.), multi-tenancy, an example server implementation, so the search
> >> servers on top of Lucene have had to fill these gaps.  Maybe you could
> >> make your engine distributed from the start (Go is a great match for
> >> that, from what little I know).
> >>
> >> All 3 highlighter options have problems.
> >>
> >> The analysis chain (attributes) is overly complex.
> >>
> >> In your competing engine you can borrow/copy/steal from Lucene's good
> >> parts to get started...
> >>
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >>
> >>
> >> On Fri, Nov 14, 2014 at 8:43 PM, swsong_dev <swsong_dev@websqrd.com>
> wrote:
> >>> I’m developing a search engine, Fastcatsearch.
> http://github.com/fastcatsearch/fastcatsearch
> >>>
> >>> Lucene is a widely known and famous project, and I cannot beat Lucene for
> now.
> >>>
> >>> But is there any chance to beat Lucene?
> >>>
> >>> Anything like features, performance.
> >>>
> >>> Please let me know what to do to make a better product than Lucene.
> >>>
> >>> Thank you.
> >
>
>
