lucene-solr-user mailing list archives

From Furkan KAMACI <furkankam...@gmail.com>
Subject Re: Pointing to Hbase for Documents or Directly Saving Documents at Hbase
Date Thu, 11 Apr 2013 14:01:32 GMT
Actually, I don't plan to store the documents in Solr. I want to store just
the highlights (snippets) in HBase and retrieve them from HBase when
needed.
What do you think about separating just the highlights from Solr and storing
them in HBase when running SolrCloud? Also, I would welcome an explanation of
at which stage, and how, Solr generates highlights.
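
The separation being asked about can be sketched roughly as follows. This is only an illustration under stated assumptions, not how Solr or HBase actually work internally: plain Python dictionaries stand in for the Solr index and the HBase table, and the names doc_store, build_index, snippet and search are hypothetical rather than part of either project's API. The index keeps only row ids, and the text is fetched from the external store at query time to build the snippet.

```python
import re

# Stand-in for an HBase table keyed by row id (assumption: a plain dict here).
doc_store = {
    "doc1": "Solr builds an inverted index over crawled pages.",
    "doc2": "HBase keeps the raw crawled pages from Nutch.",
}


def build_index(store):
    """Map each lowercased term to the set of row ids that contain it.

    Only ids are kept in the index; the document text stays in the store.
    """
    index = {}
    for row_id, text in store.items():
        for term in re.findall(r"\w+", text.lower()):
            index.setdefault(term, set()).add(row_id)
    return index


def snippet(text, term, width=20):
    """Return a window of text around the first match, with the match marked."""
    match = re.search(re.escape(term), text, flags=re.IGNORECASE)
    if match is None:
        return None
    start = max(0, match.start() - width)
    end = min(len(text), match.end() + width)
    return (
        text[start:match.start()]
        + "<em>" + match.group(0) + "</em>"
        + text[match.end():end]
    )


def search(index, store, term):
    """Find matching ids in the index, then fetch text from the store for snippets."""
    ids = sorted(index.get(term.lower(), set()))
    return {row_id: snippet(store[row_id], term) for row_id in ids}


index = build_index(doc_store)
results = search(index, doc_store, "crawled")
for row_id, highlighted in results.items():
    print(row_id, highlighted)
```

The trade-off in this layout is one extra lookup in the external store per hit at query time, in exchange for not keeping a second copy of the text next to the index.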


2013/4/9 Otis Gospodnetic <otis.gospodnetic@gmail.com>

> You may also be interested in looking at things like solrbase (on Github).
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
>
>
>
>
> On Sat, Apr 6, 2013 at 6:01 PM, Furkan KAMACI <furkankamaci@gmail.com>
> wrote:
> > Hi,
> >
> > First of all, I should mention that I am new to Solr and am researching
> > it. My plan is to crawl some websites with Nutch and then index them
> > with Solr. (Nutch 2.1, Solr-SolrCloud 4.2)
> >
> > I am wondering about something. I have a cloud of machines that crawls
> > websites and stores the documents. I then send those documents to
> > SolrCloud. Solr indexes them, generating and saving its indexes. I know
> > from Information Retrieval theory that it *may* not be efficient to store
> > indexes in a NoSQL database (they are something like linked lists, and if
> > you store them in that kind of database you *may* end up with a sparse
> > representation. There may be solutions for this, by the way; if you can
> > explain them, you are welcome to.)
> >
> > However, Solr stores some document content too (i.e. for highlights), so
> > some of my documents will be duplicated. Since I will have many
> > documents, those duplicates may become a problem for me. Is there
> > any way to avoid storing the documents in Solr and instead point to them
> > in HBase (where I save my crawled documents)? Or, rather than pointing,
> > to store them directly in HBase (and is that efficient or not)?
>
