lucene-solr-user mailing list archives

From Furkan KAMACI <furkankam...@gmail.com>
Subject Re: Pointing to Hbase for Docuements or Directly Saving Documents at Hbase
Date Tue, 16 Apr 2013 21:32:10 GMT
Hi Otis and Jack;

I have done some research on highlighting and debugged the code. I see that
highlights are query dependent and are not stored. Why does Solr use Lucene
for storing text, e.g. the content of a web page? Is there any comparison of
storing text in HBase or another database versus storing it in Lucene?

Also, has anybody on the Solr user list used something other than Lucene to
store the text of documents?
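
To illustrate why highlights are query dependent, here is a minimal sketch in plain Python. It is not Solr's highlighter; the function name `highlight` and its parameters (`window`, `pre`, `post`) are hypothetical and exist only for this illustration. The same stored text yields a different snippet for each query, so a snippet cannot be computed before the query is known.

```python
import re

def highlight(text, query, window=30, pre="<em>", post="</em>"):
    """Return a snippet around the first matching query term, or None.

    Hypothetical helper for illustration only; Solr's real highlighters
    work on analyzed terms and are far more elaborate.
    """
    terms = [re.escape(t) for t in query.lower().split()]
    if not terms:
        return None
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    match = pattern.search(text)
    if match is None:
        return None
    # Cut a window of context around the first match.
    start = max(0, match.start() - window)
    end = min(len(text), match.end() + window)
    # Wrap every query term inside the window with the markers.
    return pattern.sub(lambda m: pre + m.group(0) + post, text[start:end])

doc = "Solr uses Lucene to index and store the content of a web page."
snippet_a = highlight(doc, "lucene")
snippet_b = highlight(doc, "web page")
```

The two snippets differ even though `doc` is identical, which is why the snippet has to be generated at query time from text that is available somewhere, either stored in the index or fetched from another store.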

2013/4/11 Otis Gospodnetic <otis.gospodnetic@gmail.com>

> Source code is your best bet.  Wiki has info about how to use it, but
> not how highlighting is implemented.  But you don't need to understand
> the implementation details to understand that they are dynamic,
> computed specifically for each query for each matching document, so
> you cannot store them anywhere ahead of time.
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
>
>
>
>
> On Thu, Apr 11, 2013 at 11:22 AM, Furkan KAMACI <furkankamaci@gmail.com>
> wrote:
> > Hi Otis;
> >
> > It seems that I should read more about highlights. Is there anywhere that
> > explains in detail how highlights are generated in Solr?
> >
> > 2013/4/11 Otis Gospodnetic <otis.gospodnetic@gmail.com>
> >
> >> Hi,
> >>
> >> You can't store highlights ahead of time because they are query
> >> dependent.  You could store documents in HBase and use Solr just for
> >> indexing.  Is that what you want to do?  If so, a custom
> >> SearchComponent executed after QueryComponent could fetch data from an
> >> external store like HBase.  I'm not sure if I'd recommend that.
> >>
> >> Otis
> >> --
> >> Solr & ElasticSearch Support
> >> http://sematext.com/
> >>
> >>
> >>
> >>
> >>
> >> On Thu, Apr 11, 2013 at 10:01 AM, Furkan KAMACI <furkankamaci@gmail.com>
> >> wrote:
> >> > Actually, I don't intend to store documents in Solr. I want to store
> >> > just highlights (snippets) in HBase and retrieve them from HBase when
> >> > needed.
> >> > What do you think about separating just the highlights from Solr and
> >> > storing them in HBase with SolrCloud? By the way, I would welcome an
> >> > explanation of where in the process and how highlights are generated
> >> > in Solr.
> >> >
> >> >
> >> > 2013/4/9 Otis Gospodnetic <otis.gospodnetic@gmail.com>
> >> >
> >> >> You may also be interested in looking at things like solrbase (on
> >> >> GitHub).
> >> >>
> >> >> Otis
> >> >> --
> >> >> Solr & ElasticSearch Support
> >> >> http://sematext.com/
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> On Sat, Apr 6, 2013 at 6:01 PM, Furkan KAMACI <furkankamaci@gmail.com>
> >> >> wrote:
> >> >> > Hi;
> >> >> >
> >> >> > First of all, I should mention that I am new to Solr and am doing
> >> >> > research on it. I plan to crawl some websites with Nutch and then
> >> >> > index them with Solr. (Nutch 2.1, Solr-SolrCloud 4.2)
> >> >> >
> >> >> > I wonder about something. I have a cloud of machines that crawls
> >> >> > websites and stores those documents. Then I send those documents to
> >> >> > SolrCloud. Solr indexes them, generates the indexes, and saves them.
> >> >> > I know from Information Retrieval theory that it *may* not be
> >> >> > efficient to store indexes in a NoSQL database (they are something
> >> >> > like linked lists, and if you store them in that kind of database you
> >> >> > *may* end up with a sparse representation. By the way, there may be
> >> >> > some solutions for this; if you can explain them, you are welcome to.)
> >> >> >
> >> >> > However, Solr stores some documents too (i.e. highlights), so some of
> >> >> > my documents will be duplicated. Given that I will have many
> >> >> > documents, those duplicated documents may cause a problem for me. So
> >> >> > is there any way to avoid storing those documents in Solr and point
> >> >> > to them in HBase (where I save my crawled documents), or, instead of
> >> >> > pointing, to store them directly in HBase (is that efficient or not)?
> >> >>
> >>
>
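
The arrangement discussed in this thread (Solr used only for indexing, with document text kept in an external store such as HBase and fetched after the query runs) can be sketched in plain Python. This is an in-memory illustration under stated assumptions, not Solr or HBase code: `ExternalStore`, `IndexOnly`, and `search_and_fetch` are hypothetical names, and a real setup would do the fetch inside a custom SearchComponent as Otis describes.

```python
from collections import defaultdict

class ExternalStore:
    """Hypothetical stand-in for a key-value store such as HBase."""
    def __init__(self):
        self._rows = {}

    def put(self, doc_id, text):
        self._rows[doc_id] = text

    def get(self, doc_id):
        return self._rows.get(doc_id)

class IndexOnly:
    """Tiny inverted index: keeps term -> doc ids, never the document text."""
    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in text.lower().split():
            self._postings[term].add(doc_id)

    def search(self, term):
        return sorted(self._postings.get(term.lower(), ()))

def search_and_fetch(index, store, term):
    """Run the query against the index, then fetch each hit's text by id."""
    return [(doc_id, store.get(doc_id)) for doc_id in index.search(term)]

store = ExternalStore()
index = IndexOnly()
for doc_id, text in [("d1", "Nutch crawls web pages"),
                     ("d2", "Solr indexes crawled pages")]:
    store.put(doc_id, text)   # full text lives only in the external store
    index.add(doc_id, text)   # the index keeps terms and ids only

hits = search_and_fetch(index, store, "pages")
```

The text is held once, in the external store, which avoids the duplication raised in the original question. The cost is an extra lookup per hit at query time, and anything that needs the text (such as highlighting) has to run after that fetch.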
