MIME-Version: 1.0
In-Reply-To: <1366181109976-4056599.post@n3.nabble.com>
References: <1366181109976-4056599.post@n3.nabble.com>
Date: Mon, 22 Apr 2013 02:12:57 +0300
Subject: Re: Pointing to Hbase for Docuements or Directly Saving Documents at Hbase
From: Furkan KAMACI
To: solr-user@lucene.apache.org
Content-Type: multipart/alternative; boundary=047d7b2e4834c5e81104dae7176b

--047d7b2e4834c5e81104dae7176b
Content-Type: text/plain; charset=ISO-8859-1

All in all, is there anything we can say, before measuring and comparing
performance, about storing the stored values of documents in HBase? I mean
something like:

* I will need to communicate with HBase, and this will add more latency than
  Lucene
* I will lose some built-in functionality that integrates Lucene and Solr
* I will lose some good things, such as in-memory caching with Lucene
* bla bla bla.. (These are not true, I just wrote them as an example)

Any ideas?

2013/4/17 adfel70

> Any rule of thumb regarding the document size limit when storing it
> in Solr?
>
>
> Otis Gospodnetic-5 wrote
> > Use Solr. It's pretty clear you don't yet have any problems that
> > would make you think about alternatives. Using Solr to store and not
> > just index will make your life simpler (and your app simpler and
> > likely faster).
> >
> > Otis
> > --
> > Solr & ElasticSearch Support
> > http://sematext.com/
> >
> > On Tue, Apr 16, 2013 at 6:31 PM, Furkan KAMACI <furkankamaci@> wrote:
> >> Thanks again for your answer. If there is any document about such
> >> comparisons, I would like to read it.
> >>
> >> By the way, is there any advantage to using Lucene instead of
> >> something else, such as this:
> >>
> >> Lucene is natively supported by Solr, and if I use anything else I
> >> may face compatibility or communication problems?
> >>
> >>
> >> 2013/4/17 Otis Gospodnetic <otis.gospodnetic@>
> >>
> >>> People do use other data stores to retrieve data sometimes, e.g. Mongo
> >>> is popular for that. Like I hinted in another email, I wouldn't
> >>> necessarily recommend this for common cases. Don't do it unless you
> >>> really know you need it. Otherwise, just store in Solr.
> >>>
> >>> Otis
> >>> --
> >>> Solr & ElasticSearch Support
> >>> http://sematext.com/
> >>>
> >>> On Tue, Apr 16, 2013 at 5:32 PM, Furkan KAMACI <furkankamaci@>
> >>> wrote:
> >>> > Hi Otis and Jack;
> >>> >
> >>> > I have researched highlights and debugged the code. I see that
> >>> > highlights are query-dependent and not stored. Why does Solr use
> >>> > Lucene for storing text, i.e. the content of a web page? Is there
> >>> > any comparison of storing text in HBase or any other database
> >>> > versus Lucene?
> >>> >
> >>> > Also, is there anybody on our Solr user list who has used something
> >>> > other than Lucene to store document text?
> >>> >
> >>> > 2013/4/11 Otis Gospodnetic <otis.gospodnetic@>
> >>> >
> >>> >> Source code is your best bet. Wiki has info about how to use it, but
> >>> >> not how highlighting is implemented. But you don't need to
> >>> >> understand the implementation details to understand that they are
> >>> >> dynamic, computed specifically for each query for each matching
> >>> >> document, so you cannot store them anywhere ahead of time.
> >>> >>
> >>> >> Otis
> >>> >> --
> >>> >> Solr & ElasticSearch Support
> >>> >> http://sematext.com/
> >>> >>
> >>> >> On Thu, Apr 11, 2013 at 11:22 AM, Furkan KAMACI <furkankamaci@>
> >>> >> wrote:
> >>> >> > Hi Otis;
> >>> >> >
> >>> >> > It seems that I should read more about highlights. Is there
> >>> >> > anywhere that explains in detail how highlights are generated in
> >>> >> > Solr?
> >>> >> >
> >>> >> > 2013/4/11 Otis Gospodnetic <otis.gospodnetic@>
> >>> >> >
> >>> >> >> Hi,
> >>> >> >>
> >>> >> >> You can't store highlights ahead of time because they are query
> >>> >> >> dependent. You could store documents in HBase and use Solr just
> >>> >> >> for indexing. Is that what you want to do? If so, a custom
> >>> >> >> SearchComponent executed after QueryComponent could fetch data
> >>> >> >> from an external store like HBase. I'm not sure if I'd recommend
> >>> >> >> that.
> >>> >> >>
> >>> >> >> Otis
> >>> >> >> --
> >>> >> >> Solr & ElasticSearch Support
> >>> >> >> http://sematext.com/
> >>> >> >>
> >>> >> >> On Thu, Apr 11, 2013 at 10:01 AM, Furkan KAMACI <furkankamaci@>
> >>> >> >> wrote:
> >>> >> >> > Actually, I am not thinking of storing documents in Solr. I want
> >>> >> >> > to store just the highlights (snippets) in HBase and retrieve
> >>> >> >> > them from HBase when needed.
> >>> >> >> > What do you think about separating just the highlights from Solr
> >>> >> >> > and storing them in HBase with SolrCloud? By the way, if you
> >>> >> >> > could explain at which stage and how highlights are generated in
> >>> >> >> > Solr, that would be welcome.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > 2013/4/9 Otis Gospodnetic <otis.gospodnetic@>
> >>> >> >> >
> >>> >> >> >> You may also be interested in looking at things like solrbase
> >>> >> >> >> (on Github).
> >>> >> >> >>
> >>> >> >> >> Otis
> >>> >> >> >> --
> >>> >> >> >> Solr & ElasticSearch Support
> >>> >> >> >> http://sematext.com/
> >>> >> >> >>
> >>> >> >> >> On Sat, Apr 6, 2013 at 6:01 PM, Furkan KAMACI <furkankamaci@>
> >>> >> >> >> wrote:
> >>> >> >> >> > Hi;
> >>> >> >> >> >
> >>> >> >> >> > First of all, I should mention that I am new to Solr and am
> >>> >> >> >> > researching it. What I am trying to do is crawl some websites
> >>> >> >> >> > with Nutch and then index them with Solr. (Nutch 2.1,
> >>> >> >> >> > Solr-SolrCloud 4.2)
> >>> >> >> >> >
> >>> >> >> >> > I wonder about something. I have a cloud of machines that
> >>> >> >> >> > crawls websites and stores those documents. Then I send those
> >>> >> >> >> > documents to SolrCloud. Solr indexes them, generates the
> >>> >> >> >> > indexes, and saves them. I know from Information Retrieval
> >>> >> >> >> > theory that it *may* not be efficient to store indexes in a
> >>> >> >> >> > NoSQL database (they are something like linked lists, and if
> >>> >> >> >> > you store them in that kind of database you *may* have a
> >>> >> >> >> > sparse representation. By the way, there may be some
> >>> >> >> >> > solutions for it; if you can explain them, you are welcome.)
> >>> >> >> >> >
> >>> >> >> >> > However, Solr stores some documents too (i.e. highlights), so
> >>> >> >> >> > some of my documents will be duplicated somehow. Since I will
> >>> >> >> >> > have many documents, those duplicated documents may cause a
> >>> >> >> >> > problem for me.
> >>> >> >> >> > So is there any way to avoid storing those documents in Solr
> >>> >> >> >> > and point to them in HBase (where I save my crawled
> >>> >> >> >> > documents), or, instead of pointing, to store them directly
> >>> >> >> >> > in HBase (is that efficient or not)?
> >>> >> >> >>
> >>> >> >>
> >>> >>
> >>>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Pointing-to-Hbase-for-Docuements-or-Directly-Saving-Documents-at-Hbase-tp4054277p4056599.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

--047d7b2e4834c5e81104dae7176b--
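
For anyone trying to picture the setup discussed in this thread (index only in Solr, fetch the stored text from an external store after the query, compute highlights per query), here is a rough Python sketch. It is only an illustration under assumptions: the dict stands in for an external store such as HBase, the tiny inverted index stands in for Solr, and every function name is made up for the example. None of this is Solr or HBase API.

```python
import re

# Hypothetical stand-in for an external store such as HBase: doc id -> full text.
external_store = {
    "doc1": "Solr stores and indexes documents with Lucene.",
    "doc2": "HBase can hold crawled pages fetched by Nutch.",
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(store):
    """Map each term to the set of doc ids containing it; no text is kept here."""
    index = {}
    for doc_id, text in store.items():
        for term in tokenize(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every query term, sorted."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)


def highlight(text, query):
    """Wrap query terms in <em> tags. The output depends on the query,
    which is why it cannot be computed and stored ahead of time."""
    terms = set(tokenize(query))
    return re.sub(
        r"\w+",
        lambda m: f"<em>{m.group(0)}</em>" if m.group(0).lower() in terms else m.group(0),
        text,
    )


def search_with_external_fetch(index, store, query):
    """Query the index first, then fetch each hit's text from the external store."""
    return [(doc_id, highlight(store[doc_id], query)) for doc_id in search(index, query)]


index = build_index(external_store)
results = search_with_external_fetch(index, external_store, "hbase pages")
```

The extra lookup per hit in `search_with_external_fetch` is where the additional round trip to the external store would happen, which is the latency concern raised at the top of this message.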