lucene-solr-user mailing list archives

From Otis Gospodnetic <otis.gospodne...@gmail.com>
Subject Re: Pointing to Hbase for Docuements or Directly Saving Documents at Hbase
Date Tue, 16 Apr 2013 22:18:01 GMT
People do sometimes use other data stores to retrieve data; Mongo, for
example, is popular for that.  As I hinted in another email, I wouldn't
necessarily recommend this for common cases.  Don't do it unless you
really know you need it.  Otherwise, just store in Solr.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Tue, Apr 16, 2013 at 5:32 PM, Furkan KAMACI <furkankamaci@gmail.com> wrote:
> Hi Otis and Jack;
>
> I have done some research on highlights and debugged the code. I see that
> highlights are query dependent and not stored. Why does Solr use Lucene for
> storing text, i.e. the content of a web page? Is there any comparison of
> storing text in HBase or another database versus Lucene?
>
> Also, I would like to know whether anybody on the Solr user list has used
> anything other than Lucene to store document text.
>
> 2013/4/11 Otis Gospodnetic <otis.gospodnetic@gmail.com>
>
>> Source code is your best bet.  Wiki has info about how to use it, but
>> not how highlighting is implemented.  But you don't need to understand
>> the implementation details to understand that they are dynamic,
>> computed specifically for each query for each matching document, so
>> you cannot store them anywhere ahead of time.
>>
>> Otis
>> --
>> Solr & ElasticSearch Support
>> http://sematext.com/
>>
>>
>>
>>
>>
>> On Thu, Apr 11, 2013 at 11:22 AM, Furkan KAMACI <furkankamaci@gmail.com>
>> wrote:
>> > Hi Otis;
>> >
>> > It seems that I should read more about highlights. Is there anywhere that
>> > explains in detail how highlights are generated in Solr?
>> >
>> > 2013/4/11 Otis Gospodnetic <otis.gospodnetic@gmail.com>
>> >
>> >> Hi,
>> >>
>> >> You can't store highlights ahead of time because they are query
>> >> dependent.  You could store documents in HBase and use Solr just for
>> >> indexing.  Is that what you want to do?  If so, a custom
>> >> SearchComponent executed after QueryComponent could fetch data from
>> >> an external store like HBase.  I'm not sure if I'd recommend that.
>> >>
>> >> Otis
>> >> --
>> >> Solr & ElasticSearch Support
>> >> http://sematext.com/
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On Thu, Apr 11, 2013 at 10:01 AM, Furkan KAMACI <furkankamaci@gmail.com>
>> >> wrote:
>> >> > Actually, I don't plan to store documents in Solr. I want to store just
>> >> > highlights (snippets) in HBase and retrieve them from HBase when
>> >> > needed.
>> >> > What do you think about separating just the highlights from Solr and
>> >> > storing them in HBase with SolrCloud? By the way, I would welcome an
>> >> > explanation of at which stage and how highlights are generated in Solr.
>> >> >
>> >> >
>> >> > 2013/4/9 Otis Gospodnetic <otis.gospodnetic@gmail.com>
>> >> >
>> >> >> You may also be interested in looking at things like solrbase (on
>> >> >> GitHub).
>> >> >>
>> >> >> Otis
>> >> >> --
>> >> >> Solr & ElasticSearch Support
>> >> >> http://sematext.com/
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Sat, Apr 6, 2013 at 6:01 PM, Furkan KAMACI <furkankamaci@gmail.com>
>> >> >> wrote:
>> >> >> > Hi;
>> >> >> >
>> >> >> > First of all, I should mention that I am new to Solr and am doing
>> >> >> > research on it. I plan to crawl some websites with Nutch and then
>> >> >> > index them with Solr (Nutch 2.1, Solr/SolrCloud 4.2).
>> >> >> >
>> >> >> > I wonder about something. I have a cloud of machines that crawls
>> >> >> > websites and stores the documents. I then send those documents to
>> >> >> > SolrCloud, which indexes them and saves the resulting indexes. I know
>> >> >> > from Information Retrieval theory that it *may* not be efficient to
>> >> >> > store indexes in a NoSQL database (they are something like linked
>> >> >> > lists, and if you store them in that kind of database you *may* end
>> >> >> > up with a sparse representation. There may be some solutions for
>> >> >> > this; explanations are welcome).
>> >> >> >
>> >> >> > However, Solr stores some document content too (e.g. for highlights),
>> >> >> > so some of my documents will effectively be duplicated. Since I will
>> >> >> > have many documents, those duplicated documents may cause a problem
>> >> >> > for me. So is there any way to avoid storing the documents in Solr
>> >> >> > and point to them in HBase (where I save my crawled documents)
>> >> >> > instead, or to store them directly in HBase (is that efficient or
>> >> >> > not)?
>> >> >>
>> >>
>>
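
The two points made in this thread (highlights are computed per query, and the index can hold only document ids while the text lives in another store) can be illustrated with a small sketch. This is not Solr or HBase code: the in-memory dict `EXTERNAL_STORE` is a hypothetical stand-in for an external store such as HBase, the toy inverted index stands in for Solr, and every function name below is an assumption made for illustration only.

```python
import re

# Hypothetical stand-in for an external store such as HBase: doc id -> full text.
EXTERNAL_STORE = {
    "doc1": "Solr is a search server built on Lucene. "
            "It can store field values as well as index them.",
    "doc2": "HBase is a distributed column store. "
            "Nutch can write crawled pages to it.",
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(store):
    """Toy inverted index: term -> set of doc ids. No document text is kept."""
    index = {}
    for doc_id, text in store.items():
        for term in tokenize(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)


def snippet(text, query, width=30):
    """Build a highlight for this particular query.

    The result depends on the query terms, which is why it cannot be
    computed and stored ahead of time.
    """
    terms = tokenize(query)
    if not terms:
        return text[: 2 * width]
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    match = pattern.search(text)
    if match is None:
        return text[: 2 * width]
    start = max(0, match.start() - width)
    end = min(len(text), match.end() + width)
    window = text[start:end]
    return pattern.sub(lambda m: "<em>" + m.group(0) + "</em>", window)


def search_with_snippets(index, store, query):
    """Search the index, then fetch each hit's text from the external store."""
    return [(doc_id, snippet(store[doc_id], query))
            for doc_id in search(index, query)]


if __name__ == "__main__":
    idx = build_index(EXTERNAL_STORE)
    for doc_id, snip in search_with_snippets(idx, EXTERNAL_STORE, "lucene"):
        print(doc_id, snip)
```

Note that every hit costs one extra lookup against the external store, and the text must be fetched before any highlight can be built. That extra round trip is one reason to be cautious about this design for common cases, as the replies above suggest.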
