lucene-solr-user mailing list archives

From Furkan KAMACI <furkankam...@gmail.com>
Subject Re: Pointing to Hbase for Documents or Directly Saving Documents at Hbase
Date Tue, 16 Apr 2013 22:31:02 GMT
Thanks again for your answer. If there is any document with such comparisons,
I would like to read it.

By the way, is there any advantage to using Lucene instead of something
else? For example:

Lucene is natively supported by Solr, so if I use anything else, might I
face compatibility or communication problems?


2013/4/17 Otis Gospodnetic <otis.gospodnetic@gmail.com>

> People do use other data stores to retrieve data sometimes. e.g. Mongo
> is popular for that.  Like I hinted in another email, I wouldn't
> necessarily recommend this for common cases.  Don't do it unless you
> really know you need it.  Otherwise, just store in Solr.
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
> On Tue, Apr 16, 2013 at 5:32 PM, Furkan KAMACI <furkankamaci@gmail.com>
> wrote:
> > Hi Otis and Jack;
> >
> > I have done some research on highlighting and debugged the code. I see
> > that highlights are query-dependent and not stored. Why does Solr use
> > Lucene for storing text, i.e. the content of a web page? Is there any
> > comparison of storing text in HBase or another database versus in Lucene?
> >
> > Also, has anybody on this Solr user list used anything other than Lucene
> > to store the text of documents?
> >
> > 2013/4/11 Otis Gospodnetic <otis.gospodnetic@gmail.com>
> >
> >> Source code is your best bet.  Wiki has info about how to use it, but
> >> not how highlighting is implemented.  But you don't need to understand
> >> the implementation details to understand that they are dynamic,
> >> computed specifically for each query for each matching document, so
> >> you cannot store them anywhere ahead of time.
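To make the point about query dependence concrete, here is a toy snippet generator in Java. It is not Solr's highlighter; the names ToyHighlighter and snippet are hypothetical, made up for illustration. It takes the stored text plus the query terms and returns a short window starting at the first hit, with hits wrapped in <em>. Because the query is an input, the same stored text yields a different snippet for each query, which is why snippets cannot be computed and stored ahead of time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy, query-dependent snippet generator (hypothetical; NOT Solr's highlighter).
// The query terms are an input, so the snippet cannot be precomputed per document.
final class ToyHighlighter {
    private ToyHighlighter() {}

    // Returns up to maxTokens tokens starting at the first token matching a query
    // term, with every matching token wrapped in <em>...</em>; "" if nothing matches.
    // queryTerms are expected in lower case.
    static String snippet(String text, Set<String> queryTerms, int maxTokens) {
        String[] tokens = text.split("\\s+");
        int start = -1;
        for (int i = 0; i < tokens.length; i++) {
            if (queryTerms.contains(normalize(tokens[i]))) {
                start = i;
                break;
            }
        }
        if (start < 0) {
            return ""; // this document has no hit for this query
        }
        List<String> out = new ArrayList<>();
        int end = Math.min(tokens.length, start + maxTokens);
        for (int i = start; i < end; i++) {
            String token = tokens[i];
            out.add(queryTerms.contains(normalize(token)) ? "<em>" + token + "</em>" : token);
        }
        return String.join(" ", out);
    }

    // Lower-cases and trims leading/trailing punctuation so "HBase," matches "hbase".
    private static String normalize(String token) {
        return token.toLowerCase(Locale.ROOT).replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
    }
}
```

For example, with the text "Solr indexes documents. HBase stores the raw crawled pages." and a three-token window, the query term hbase gives "<em>HBase</em> stores the", while solr gives "<em>Solr</em> indexes documents.".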
> >>
> >> Otis
> >>
> >> On Thu, Apr 11, 2013 at 11:22 AM, Furkan KAMACI <furkankamaci@gmail.com>
> >> wrote:
> >> > Hi Otis;
> >> >
> >> > It seems that I should read more about highlighting. Is there anything
> >> > that explains in detail how highlights are generated in Solr?
> >> >
> >> > 2013/4/11 Otis Gospodnetic <otis.gospodnetic@gmail.com>
> >> >
> >> >> Hi,
> >> >>
> >> >> You can't store highlights ahead of time because they are query
> >> >> dependent.  You could store documents in HBase and use Solr just for
> >> >> indexing.  Is that what you want to do?  If so, a custom
> >> >> SearchComponent executed after QueryComponent could fetch data from
> >> >> external store like HBase.  I'm not sure if I'd recommend that.
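A minimal sketch of the "index in Solr, store in HBase" idea, under stated assumptions: the Solr query has already returned a ranked list of unique keys, and those keys are also the row keys in the external store. DocumentStore, InMemoryStore and ExternalFetch are hypothetical names; the in-memory map stands in for HBase so the example runs without a cluster, and no Solr or HBase classes are used. In an actual plugin this lookup would live in a custom SearchComponent executed after QueryComponent, as described above.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical abstraction over an external document store such as HBase.
// Assumption: the store's row key equals the Solr document's unique key.
interface DocumentStore {
    Optional<String> get(String id);
}

// In-memory stand-in for HBase so the sketch runs without a cluster.
final class InMemoryStore implements DocumentStore {
    private final Map<String, String> rows = new HashMap<>();

    void put(String id, String body) {
        rows.put(id, body);
    }

    @Override
    public Optional<String> get(String id) {
        return Optional.ofNullable(rows.get(id));
    }
}

// Mimics the step a custom component running after the query step would perform:
// take the ranked IDs returned by the index and fetch each body from the store.
final class ExternalFetch {
    private ExternalFetch() {}

    static Map<String, String> attachBodies(List<String> rankedIds, DocumentStore store) {
        Map<String, String> result = new LinkedHashMap<>(); // keeps the ranking order
        for (String id : rankedIds) {
            // One lookup per hit; IDs missing from the store are skipped.
            store.get(id).ifPresent(body -> result.put(id, body));
        }
        return result;
    }
}
```

In this sketch every hit costs one lookup in the external store, so response time depends on that store as well as on Solr; that extra moving part is one reason to simply store the text in Solr unless there is a clear need not to.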
> >> >>
> >> >> Otis
> >> >>
> >> >> On Thu, Apr 11, 2013 at 10:01 AM, Furkan KAMACI <furkankamaci@gmail.com>
> >> >> wrote:
> >> >> wrote:
> >> >> > Actually, I don't plan to store documents in Solr. I want to store
> >> >> > just the highlights (snippets) in HBase and retrieve them from HBase
> >> >> > when needed.
> >> >> > What do you think about separating just the highlights from Solr and
> >> >> > storing them in HBase with SolrCloud? By the way, if you could explain
> >> >> > at which stage and how highlights are generated in Solr, that would be
> >> >> > welcome.
> >> >> >
> >> >> >
> >> >> > 2013/4/9 Otis Gospodnetic <otis.gospodnetic@gmail.com>
> >> >> >
> >> >> >> You may also be interested in looking at things like solrbase (on
> >> >> >> Github).
> >> >> >>
> >> >> >> Otis
> >> >> >>
> >> >> >> On Sat, Apr 6, 2013 at 6:01 PM, Furkan KAMACI <furkankamaci@gmail.com>
> >> >> >> wrote:
> >> >> >> > Hi;
> >> >> >> >
> >> >> >> > First of all, I should mention that I am new to Solr and am doing
> >> >> >> > some research on it. What I am trying to do is crawl some websites
> >> >> >> > with Nutch and then index them with Solr (Nutch 2.1, Solr-SolrCloud
> >> >> >> > 4.2).
> >> >> >> >
> >> >> >> > I wonder about something. I have a cloud of machines that crawls
> >> >> >> > websites and stores the documents. Then I send those documents into
> >> >> >> > SolrCloud. Solr indexes the documents, generates indexes and saves
> >> >> >> > them. I know from Information Retrieval theory that it *may* not be
> >> >> >> > efficient to store indexes in a NoSQL database (they are something
> >> >> >> > like linked lists, and if you store them in such a database you *may*
> >> >> >> > end up with a sparse representation; by the way, there may be some
> >> >> >> > solutions for this. If you can explain them, you are welcome.)
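The "something like linked lists" in the question are postings lists: for each term, the sorted IDs of the documents containing it. The toy inverted index below (hypothetical class name ToyInvertedIndex; plain Java collections, not Lucene's compressed, segment-based on-disk format) shows that structure and a typical AND query, which walks two sorted lists in a single linear merge.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Toy inverted index (hypothetical; not Lucene's on-disk format):
// each term maps to an ascending list of document IDs, its "postings list".
final class ToyInvertedIndex {
    private final Map<String, List<Integer>> postings = new TreeMap<>();

    // Documents must be added in increasing docId order so every list stays sorted.
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) {
                continue;
            }
            List<Integer> list = postings.computeIfAbsent(term, k -> new ArrayList<>());
            // Record each document at most once per term.
            if (list.isEmpty() || list.get(list.size() - 1) != docId) {
                list.add(docId);
            }
        }
    }

    // AND query: intersect two sorted postings lists with a single linear merge.
    List<Integer> and(String termA, String termB) {
        List<Integer> a = postings.getOrDefault(termA.toLowerCase(Locale.ROOT), Collections.emptyList());
        List<Integer> b = postings.getOrDefault(termB.toLowerCase(Locale.ROOT), Collections.emptyList());
        List<Integer> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            int x = a.get(i);
            int y = b.get(j);
            if (x == y) {
                out.add(x);
                i++;
                j++;
            } else if (x < y) {
                i++;
            } else {
                j++;
            }
        }
        return out;
    }
}
```

Query evaluation here depends on reading each postings list in order and merging it, so a general-purpose NoSQL store would likely need to keep each list contiguous and sorted to support the same access pattern efficiently.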
> >> >> >> >
> >> >> >> > However, Solr stores some documents too (i.e. highlights), so some
> >> >> >> > of my documents will be duplicated. Since I will have many
> >> >> >> > documents, that duplication may cause a problem for me. So is there
> >> >> >> > any way to avoid storing those documents in Solr and instead point
> >> >> >> > to them in HBase (where I save my crawled documents), or, instead of
> >> >> >> > pointing, to store them directly in HBase (is that efficient or
> >> >> >> > not)?
> >> >> >>
> >> >>
> >>
>
