Mailing-List: contact solr-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: solr-user@lucene.apache.org
Received-SPF: pass (athena.apache.org: domain of furkankamaci@gmail.com
 designates 209.85.214.178 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <1366181109976-4056599.post@n3.nabble.com>
References: 
 <CAHsXEHUOuUG9jxbpc2t3xWhvJXoZe+kfZF_E0Wjz=SQnT5tceQ@mail.gmail.com>
	<CANNBgPKKv8C21bR9sTZcsAUu2o8nsFERgDD3U1mV=yPCSR7n2w@mail.gmail.com>
	<CAHsXEHX3q_gqH2NOx=z8Anwxn-es5OidTebcLtpY73SkrCqhvw@mail.gmail.com>
	<CANNBgPKrQ0TQC-6GDvZ0J79UsVNY=_rJ106f6p8maxYWxvgkfA@mail.gmail.com>
	<CAHsXEHW5tDMU-TXWoGN-R5CJ1UJ=LrQkvyaeXe4+peQtujxJog@mail.gmail.com>
	<CANNBgPKdfTdTMSdH2O-b2Lj8ZpJTku3_aiEEV9p0kyyj4-95aQ@mail.gmail.com>
	<CAHsXEHWU4tBwqm1JOTgrobB6YzJ7TTp45_24SB=oitVRykuQJA@mail.gmail.com>
	<CANNBgPLcoP1WjsyzyznjcBgnc7QyRkZGdVgCnSQduc8Z+HQBqQ@mail.gmail.com>
	<CAHsXEHXYrHdem-8zBCx5821XZ=noRbeTa2H6yjcMj-ZTY=RCuA@mail.gmail.com>
	<CANNBgPKqn_xp0T1tJuuzULcptFOoFczmAxTJ2kv9AH-UqptT-w@mail.gmail.com>
	<1366181109976-4056599.post@n3.nabble.com>
Date: Mon, 22 Apr 2013 02:12:57 +0300
Message-ID: 
 <CAHsXEHX2+z1hDtdJRNnWa_gf8bEK_ctnO8o_34LXGNXt2xPYyw@mail.gmail.com>
Subject: Re: Pointing to Hbase for Docuements or Directly Saving Documents at
 Hbase
From: Furkan KAMACI <furkankamaci@gmail.com>
To: solr-user@lucene.apache.org
Content-Type: multipart/alternative; boundary=047d7b2e4834c5e81104dae7176b

--047d7b2e4834c5e81104dae7176b
Content-Type: text/plain; charset=ISO-8859-1

All in all, is there anything we can say, before actually measuring it,
about the performance of storing the stored values of documents in HBase?
I mean something like:

* I will need to communicate with HBase, and this will add more latency
than Lucene
* I will lose some built-in functionality that integrates Lucene and Solr
* I will lose some useful things such as Lucene's in-memory caching
* and so on...

(These are not necessarily true; I just wrote them as examples.)

Any ideas?
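
To make the first bullet concrete, here is a rough sketch of the two-step
lookup I have in mind. It is plain Python with in-memory stand-ins:
IndexOnly, ExternalStore and search_with_external_content are hypothetical
names, not Solr or HBase APIs, and the round-trip counter is only my
assumption about where the extra latency would come from.

```python
class IndexOnly:
    """Hypothetical stand-in for an index that keeps no stored content."""

    def __init__(self):
        self.postings = {}  # term -> set of document ids

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.postings.setdefault(term, set()).add(doc_id)

    def search(self, term):
        # Returns matching ids only; the text itself lives elsewhere.
        return sorted(self.postings.get(term.lower(), set()))


class ExternalStore:
    """Hypothetical stand-in for a key-value store such as HBase."""

    def __init__(self):
        self.rows = {}
        self.round_trips = 0  # each call models one extra network hop

    def put(self, doc_id, text):
        self.rows[doc_id] = text

    def multi_get(self, doc_ids):
        self.round_trips += 1  # one batched request per result page
        return {d: self.rows[d] for d in doc_ids if d in self.rows}


def search_with_external_content(index, store, term):
    ids = index.search(term)        # step 1: index lookup
    content = store.multi_get(ids)  # step 2: extra hop for the content
    return [(d, content.get(d)) for d in ids]


index, store = IndexOnly(), ExternalStore()
for doc_id, text in [("a", "solr stores fields"), ("b", "hbase stores rows")]:
    index.add(doc_id, text)
    store.put(doc_id, text)

for doc_id, text in search_with_external_content(index, store, "stores"):
    print(doc_id, text)
print("round trips to the store:", store.round_trips)
```

Every result page costs at least one additional request to the store on top
of the index lookup, so that is the part I would expect to have to measure.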


2013/4/17 adfel70 <adfel70@gmail.com>

> Is there any rule of thumb for document size limits when storing documents
> in Solr?
>
>
>
> Otis Gospodnetic-5 wrote
> > Use Solr.  It's pretty clear you don't yet have any problems that
> > would make you think about alternatives.  Using Solr to store and not
> > just index will make your life simpler (and your app simpler and
> > likely faster).
> >
> > Otis
> > --
> > Solr & ElasticSearch Support
> > http://sematext.com/
> >
> > On Tue, Apr 16, 2013 at 6:31 PM, Furkan KAMACI <furkankamaci@> wrote:
> >> Thanks again for your answer. If there is any document about such
> >> comparisons, I would like to read it.
> >>
> >> By the way, is there any advantage to using Lucene rather than
> >> something else? For example:
> >>
> >> Lucene is natively supported by Solr, so if I use anything else I may
> >> face compatibility or communication problems?
> >>
> >>
> >> 2013/4/17 Otis Gospodnetic <otis.gospodnetic@>
> >>
> >>> People do use other data stores to retrieve data sometimes. e.g. Mongo
> >>> is popular for that.  Like I hinted in another email, I wouldn't
> >>> necessarily recommend this for common cases.  Don't do it unless you
> >>> really know you need it.  Otherwise, just store in Solr.
> >>>
> >>> Otis
> >>> --
> >>> Solr & ElasticSearch Support
> >>> http://sematext.com/
> >>>
> >>> On Tue, Apr 16, 2013 at 5:32 PM, Furkan KAMACI <furkankamaci@> wrote:
> >>> > Hi Otis and Jack;
> >>> >
> >>> > I have done some research on highlights and debugged the code. I see
> >>> > that highlights are query dependent and not stored. Why does Solr use
> >>> > Lucene for storing text, e.g. the content of a web page? Is there any
> >>> > comparison of storing text in HBase or another database versus Lucene?
> >>> >
> >>> > Also, I would like to know whether anybody on the solr-user list has
> >>> > used anything other than Lucene to store document text.
> >>> >
> >>> > 2013/4/11 Otis Gospodnetic <otis.gospodnetic@>
> >>> >
> >>> >> Source code is your best bet.  Wiki has info about how to use it,
> >>> >> but not how highlighting is implemented.  But you don't need to
> >>> >> understand the implementation details to understand that they are
> >>> >> dynamic, computed specifically for each query for each matching
> >>> >> document, so you cannot store them anywhere ahead of time.
> >>> >>
> >>> >> Otis
> >>> >> --
> >>> >> Solr & ElasticSearch Support
> >>> >> http://sematext.com/
> >>> >>
> >>> >> On Thu, Apr 11, 2013 at 11:22 AM, Furkan KAMACI <furkankamaci@> wrote:
> >>> >> > Hi Otis;
> >>> >> >
> >>> >> > It seems that I should read more about highlights. Is there
> >>> >> > anywhere that explains in detail how highlights are generated in
> >>> >> > Solr?
> >>> >> >
> >>> >> > 2013/4/11 Otis Gospodnetic <otis.gospodnetic@>
> >>> >> >
> >>> >> >> Hi,
> >>> >> >>
> >>> >> >> You can't store highlights ahead of time because they are query
> >>> >> >> dependent.  You could store documents in HBase and use Solr just
> >>> >> >> for indexing.  Is that what you want to do?  If so, a custom
> >>> >> >> SearchComponent executed after QueryComponent could fetch data
> >>> >> >> from external store like HBase.  I'm not sure if I'd recommend
> >>> >> >> that.
> >>> >> >>
> >>> >> >> Otis
> >>> >> >> --
> >>> >> >> Solr & ElasticSearch Support
> >>> >> >> http://sematext.com/
> >>> >> >>
> >>> >> >> On Thu, Apr 11, 2013 at 10:01 AM, Furkan KAMACI <furkankamaci@> wrote:
> >>> >> >> > Actually I am not planning to store documents in Solr. I want to
> >>> >> >> > store just the highlights (snippets) in HBase and retrieve them
> >>> >> >> > from HBase when needed.
> >>> >> >> > What do you think about separating just the highlights from Solr
> >>> >> >> > and storing them in HBase with SolrCloud? By the way, I would
> >>> >> >> > appreciate it if you could explain in which process and how
> >>> >> >> > highlights are generated in Solr.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > 2013/4/9 Otis Gospodnetic <otis.gospodnetic@>
> >>> >> >> >
> >>> >> >> >> You may also be interested in looking at things like solrbase
> >>> >> >> >> (on Github).
> >>> >> >> >>
> >>> >> >> >> Otis
> >>> >> >> >> --
> >>> >> >> >> Solr & ElasticSearch Support
> >>> >> >> >> http://sematext.com/
> >>> >> >> >>
> >>> >> >> >> On Sat, Apr 6, 2013 at 6:01 PM, Furkan KAMACI <furkankamaci@> wrote:
> >>> >> >> >> > Hi;
> >>> >> >> >> >
> >>> >> >> >> > First of all, I should mention that I am new to Solr and am
> >>> >> >> >> > doing research on it. What I am trying to do is crawl some
> >>> >> >> >> > websites with Nutch and then index them with Solr. (Nutch 2.1,
> >>> >> >> >> > Solr-SolrCloud 4.2)
> >>> >> >> >> >
> >>> >> >> >> > I wonder about something. I have a cloud of machines that
> >>> >> >> >> > crawls websites and stores those documents. Then I send those
> >>> >> >> >> > documents to SolrCloud. Solr indexes the documents, generates
> >>> >> >> >> > the indexes and saves them. I know from Information Retrieval
> >>> >> >> >> > theory that it *may* not be efficient to store indexes in a
> >>> >> >> >> > NoSQL database (they are something like linked lists, and if
> >>> >> >> >> > you store them in that kind of database you *may* end up with
> >>> >> >> >> > a sparse representation. By the way, there may be some
> >>> >> >> >> > solutions for this; if you can explain them, I would
> >>> >> >> >> > appreciate it.)
> >>> >> >> >> >
> >>> >> >> >> > However, Solr stores some documents too (i.e. highlights), so
> >>> >> >> >> > some of my documents will be duplicated. Considering that I
> >>> >> >> >> > will have many documents, those duplicated documents may cause
> >>> >> >> >> > a problem for me. So is there any way to avoid storing those
> >>> >> >> >> > documents in Solr and point to them in HBase (where I save my
> >>> >> >> >> > crawled documents) instead, or, rather than pointing, to store
> >>> >> >> >> > them directly in HBase (is that efficient or not)?
> >>> >> >> >>
> >>> >> >>
> >>> >>
> >>>
>

--047d7b2e4834c5e81104dae7176b--