lucene-solr-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: Flow Chart of Solr
Date Mon, 08 Apr 2013 01:36:58 GMT
Seconded. Single-stepping really is the best way to follow the logic 
chains and see how the data mutates.
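
For anyone trying the remote-attach route quoted below, here is a minimal
sketch (not part of Solr) for confirming that a JVM was really started with
the debug agent. The class name JdwpCheck and its helper are made up for
illustration; the sketch only assumes that the agent shows up in the JVM
arguments as either the older -Xrunjdwp spelling used below or the newer
-agentlib:jdwp one.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper, not part of Solr: reports whether a JVM argument
// list asks for the JDWP debug agent.
class JdwpCheck {

    // True if any argument uses the legacy (-Xrunjdwp) or the newer
    // (-agentlib:jdwp) way of loading the debug agent.
    static boolean hasJdwpAgent(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Xrunjdwp") || arg.startsWith("-agentlib:jdwp")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Arguments the current JVM was launched with (excludes main() args).
        List<String> jvmArgs =
                ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JDWP agent requested: " + hasJdwpAgent(jvmArgs));
    }
}
```

Run it with the same JVM flags you pass to start.jar; if it reports false,
the IDE's remote session will have nothing to attach to.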

On 04/05/2013 06:36 AM, Erick Erickson wrote:
> Then there's my lazy method. Fire up the IDE and find a test case that
> looks close to something you want to understand further. Step through
> it all in the debugger. I admit there'll be some fumbling at the start
> to _find_ the test case, but they're pretty well named. In IntelliJ,
> all you have to do is right-click on the test case and the context
> menu says "debug blahbalbhabl".... You can chart the class
> relationships you actually wind up in as you go. This seems tedious,
> but it saves me getting lost in the class hierarchy.
>
> Also, there are some convenient tools in the IDE that will show you
> class hierarchies as you need.
>
> Or attach your debugger to a running Solr, which is actually very
> easy. In IntelliJ (and Eclipse has something very similar), create a
> "remote" project. That'll specify some parameters you start up with,
> e.g.:
> java -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=5900
> -jar start.jar
>
> Now start up the remote debugging session you just created in the IDE
> and you are attached to a live Solr instance and able to step through
> any code you want.
>
> Either way, you can make the IDE work for you!
>
> FWIW,
> Erick
>
> On Wed, Apr 3, 2013 at 12:03 PM, Jack Krupansky <jack@basetechnology.com> wrote:
>> We're using the 4.x branch code as the basis for our writing. So,
>> effectively it will be for at least 4.3 when the book comes out in the
>> summer.
>>
>> Early access will be in about a month or so. O'Reilly will be showing a
>> galley proof for 200 pages of the book next week at Big Data TechCon in
>> Boston.
>>
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Jack Park
>> Sent: Wednesday, April 03, 2013 12:56 PM
>>
>> To: solr-user@lucene.apache.org
>> Subject: Re: Flow Chart of Solr
>>
>> Jack,
>>
>> Is that new book up to the 4.+ series?
>>
>> Thanks
>> The other Jack
>>
>> On Wed, Apr 3, 2013 at 9:19 AM, Jack Krupansky <jack@basetechnology.com>
>> wrote:
>>> And another one on the way:
>>>
>>> http://www.amazon.com/Lucene-Solr-Definitive-comprehensive-realtime/dp/1449359957
>>>
>>> Hopefully that will help a lot as well. Plenty of diagrams. Lots of examples.
>>>
>>> -- Jack Krupansky
>>>
>>> -----Original Message----- From: Jack Park
>>> Sent: Wednesday, April 03, 2013 11:25 AM
>>>
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Flow Chart of Solr
>>>
>>> There are three books on Solr, two with that in the title, and one,
>>> Taming Text, each of which has been very valuable in understanding
>>> Solr.
>>>
>>> Jack
>>>
>>> On Wed, Apr 3, 2013 at 5:25 AM, Jack Krupansky <jack@basetechnology.com>
>>> wrote:
>>>>
>>>> Sure, yes. But... it comes down to what level of detail you want and need
>>>> for a specific task. In other words, there are probably a dozen or more
>>>> levels of detail. The reality is that if you are going to work at the Solr
>>>> code level, that is very, very different than being a "user" of Solr, and
>>>> at that point your first step is to become familiar with the code itself.
>>>>
>>>> When you talk about "parsing" and "stemming", you are really talking about
>>>> the user level, not the Solr code level. Maybe what you really need is a
>>>> cheat sheet that maps a user-visible feature to the main Solr code
>>>> component that implements that user feature.
>>>>
>>>> There are a number of different forms of "parsing" in Solr - parsing of
>>>> what? Queries? Requests? Solr documents? Function queries?
>>>>
>>>> Stemming? Well, in truth, Solr doesn't even do stemming - Lucene does
>>>> that. Lucene does all of the "token filtering". Are you asking for
>>>> details on how Lucene works? Maybe you meant to ask how "term analysis"
>>>> works, which is split between Solr and Lucene. Or maybe you simply wanted
>>>> to know when and where term analysis is done. Tell us your specific
>>>> problem or specific question and we can probably quickly give you an
>>>> answer.
>>>>
>>>> In truth, NOBODY uses "flow charts" anymore. Sure, there are some
>>>> user-level diagrams, but not down to the code level.
>>>>
>>>> If you could focus on specific questions, we could give you specific
>>>> answers.
>>>>
>>>> "Main steps"? That depends on what level you are working at. Tell us what
>>>> problem you are trying to solve and we can point you to the relevant
>>>> areas.
>>>>
>>>> In truth, if you become generally familiar with Solr at the user level
>>>> (study the wikis), you will already know what the "main steps" are.
>>>>
>>>> So, it is not "main steps of Solr", but main steps of some specific
>>>> "request" of Solr, and for a specified level of detail, and for a
>>>> specified area of Solr if greater detail is needed. Be more specific, and
>>>> then we can be more specific.
>>>>
>>>> For now, the general advice for people who need or want to go far beyond
>>>> the user level is to "get familiar with the code" - just LOOK at it - a
>>>> lot of the package and class names are OBVIOUS, really, and follow the
>>>> class hierarchy and code flow using the standard features of any modern
>>>> Java IDE. If you are wondering where to start for some specific
>>>> user-level feature, please ask specifically about that feature. But...
>>>> make a diligent effort to discover and learn on your own before asking
>>>> open-ended questions.
>>>>
>>>> Sure, there are lots of things in Lucene and Solr that are rather complex
>>>> and seemingly convoluted, and not obvious, but people are more than
>>>> willing to help you out if you simply ask a specific question. I mean,
>>>> not everybody needs to know the fine detail of query parsing, analysis,
>>>> building a Lucene-level stemmer, etc. If we tried to put all of that in a
>>>> diagram, most people would be more confused than enlightened.
>>>>
>>>> At which step are scores calculated? That's more of a Lucene question.
>>>> Or, are you really asking what code in Solr invokes Lucene search methods
>>>> that calculate basic scores?
>>>>
>>>> In short, you need to be more specific. Don't force us to guess what
>>>> problem you are trying to solve.
>>>>
>>>> -- Jack Krupansky
>>>>
>>>> -----Original Message----- From: Furkan KAMACI
>>>> Sent: Wednesday, April 03, 2013 6:52 AM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: Flow Chart of Solr
>>>>
>>>>
>>>> So, all in all, is there anybody who can write down just the main steps
>>>> of Solr (including parsing, stemming, etc.)?
>>>>
>>>>
>>>> 2013/4/2 Furkan KAMACI <furkankamaci@gmail.com>
>>>>
>>>>> Take me as an example. I started researching Solr just a few weeks
>>>>> ago. I have learned about Solr and its related projects. My next step
>>>>> is writing down the main steps of Solr. We have separated the learning
>>>>> curve of Solr into two main categories.
>>>>> The first is those who use it as out-of-the-box components. The second
>>>>> is the developer side.
>>>>>
>>>>> Actually, the developer side branches two ways.
>>>>>
>>>>> The first is the general steps, e.g. a document comes into Solr (e.g.
>>>>> crawled data from Nutch): which analysis processes are going to be done
>>>>> (stemming, hamming, etc.), and what happens after parsing, step by step.
>>>>> When a search query arrives, what happens step by step, at which step
>>>>> scores are calculated, and so on and so forth.
>>>>> The second is more code-specific, i.e. which handlers take in the data
>>>>> that is going to be indexed (no need to explain every handler at this
>>>>> step), which are the analyzer and tokenizer classes and what the flow
>>>>> between them is, and how response handlers work and what they are.
>>>>>
>>>>> Explaining the cloud side is separate work.
>>>>>
>>>>> Some of the explanations are currently present on the wiki (but some of
>>>>> them are buried very deep in the wiki and it is not easy to find their
>>>>> parent topic; maybe starting the wiki from a top page and branching as
>>>>> many of the other topics as possible from it would be better).
>>>>>
>>>>> If we could show the big picture, and beside it the smaller pictures
>>>>> within it, it would be great (if you know the main parts, it is easy to
>>>>> go deep into the code, i.e. you don't need to explain every handler; if
>>>>> you show the developer the way, he/she can debug and find what is
>>>>> needed).
>>>>>
>>>>> Taking myself as an example, I have to write down the steps of Solr in
>>>>> some detail. Even though I have read many wiki pages and a book about
>>>>> it, I see that it is not easy to write down even the big picture of the
>>>>> developer side.
>>>>>
>>>>>
>>>>> 2013/4/2 Alexandre Rafalovitch <arafalov@gmail.com>
>>>>>
>>>>>> Yago,
>>>>>>
>>>>>> My point - perhaps lost in too much text - was that Solr is presented
>>>>>> - and can function - as a black box. Which makes it different from
>>>>>> more traditional open-source projects. So, stage 2 happens exactly
>>>>>> when the non-programmers have to cross the boundary from the black box
>>>>>> into a code-first approach, and the hand-off is not particularly
>>>>>> smooth. Or even when - say - a PHP or .NET programmer tries to get
>>>>>> beyond the basic operations of their client library and has to
>>>>>> understand the server-side aspects of Solr.
>>>>>>
>>>>>> Regards,
>>>>>>     Alex.
>>>>>>
>>>>>> On Tue, Apr 2, 2013 at 1:19 PM, Yago Riveiro <yago.riveiro@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Alexandre,
>>>>>>>
>>>>>>> You describe the normal path when a beginner tries to use source
>>>>>>> code that they don't understand: black box, reading code, hacking,
>>>>>>> OK, now I know 10% of the project, with luck :p.
>>>>>>>
>>>>>>
>>>>>> Personal blog: http://blog.outerthoughts.com/
>>>>>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>>>>>> - Time is the quality of nature that keeps events from happening all
>>>>>> at once. Lately, it doesn't seem to be working. (Anonymous - via GTD
>>>>>> book)
>>>>>>
>>>>>

