lucene-java-user mailing list archives

From nikhil desai <niksde...@gmail.com>
Subject Re: Lucene Indexes explanation
Date Tue, 11 Jun 2013 00:36:50 GMT
I don't think I understood much of what you said. Could you please
elaborate? I'd appreciate it.

On Mon, Jun 10, 2013 at 5:20 PM, Jack Krupansky <jack@basetechnology.com> wrote:

> Your stored value could be very different from your indexed (searchable)
> value. You can also associate payloads with an indexed term. And there are
> DocValues as well.
>
>
> -- Jack Krupansky
>
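Jack's point that the stored value can differ from the indexed (searchable) value can be illustrated with a toy in-memory model. This is plain Java (16+, for records), not Lucene's API, and the class and method names are hypothetical. Under one assumed design, each "document" represents a single original token: its attribute terms are indexed so the document can be found, while only the surface word is stored and handed back on a hit. In Lucene 4.x terms this is roughly a field that is indexed but not stored (for example a TextField with Field.Store.NO) alongside a StoredField holding the surface word; check the Field/FieldType documentation for your version.

```java
import java.util.*;

/** Toy model, not Lucene: what a document is found by vs. what it returns. */
public class StoredVsIndexed {
    /** One toy "document": terms it can be found by, plus the value handed back on a hit. */
    record Doc(Set<String> indexedTerms, String storedValue) {}

    private final List<Doc> docs = new ArrayList<>();
    // Inverted index: term -> ids of the documents that contain it.
    private final Map<String, List<Integer>> postings = new HashMap<>();

    void addDocument(String storedValue, String... indexedTerms) {
        int id = docs.size();
        Doc doc = new Doc(new LinkedHashSet<>(Arrays.asList(indexedTerms)), storedValue);
        docs.add(doc);
        for (String term : doc.indexedTerms()) {
            postings.computeIfAbsent(term, t -> new ArrayList<>()).add(id);
        }
    }

    /** Matches on indexed terms but returns only the stored values. */
    List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        for (int docId : postings.getOrDefault(term, List.of())) {
            hits.add(docs.get(docId).storedValue());
        }
        return hits;
    }

    public static void main(String[] args) {
        StoredVsIndexed index = new StoredVsIndexed();
        // Assumed design: one document per original token.
        index.addDocument("LinkedIn", "linkedin", "Noun", "SocialSite");
        index.addDocument("famous", "famous", "JJ", "Positive");
        System.out.println(index.search("SocialSite")); // prints [LinkedIn]
        System.out.println(index.search("Positive"));   // prints [famous]
    }
}
```

The trade-off of this design is more, smaller documents; the full sentence could still be kept in a separate stored field if it is needed for display.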
> -----Original Message----- From: nikhil desai
> Sent: Monday, June 10, 2013 8:06 PM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene Indexes explanation
>
>
> Sure. Thanks Jack.
> I don't have much experience working with Lucene, but here is what I am
> trying to solve.
>
> I learned that custom attributes cannot be used for indexing or searching.
> However, I wanted the attributes to be used for indexing and searching. So
> I created custom attributes and inserted them as tokens into the token
> stream, setting their positionIncrement attribute to 0. Since my new token
> stream now contains the attributes (as tokens) and they are used during
> indexing, I can search for a document by its attributes (the tokens I
> newly inserted). However, I still have an issue. By the way, I have a lot
> of attributes that I need to assign to each individual token.
>
> Example sentence: "LinkedIn is famous"
> After passing it through a custom analyzer and a few filters that I have
> written, which append attributes to the tokens, the new token stream is
> "LinkedIn Noun SocialSite famous JJ Positive". This means that LinkedIn is
> a noun and also a social site, and famous is an adjective (JJ) and also a
> positive word; 'is' is removed because it is not worth indexing.
>
> This stream is now searchable by attribute (here: Noun, SocialSite, JJ,
> Positive).
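The stacking trick described above relies on position increments: a token emitted with an increment of 0 occupies the same position as the token before it. The sketch below is a plain-Java simulation (not Lucene's TokenStream API) that turns the example stream into positions. It assumes the stop filter leaves a one-position gap where "is" was removed, which is why "famous" carries an increment of 2; with an increment of 1 it would land at position 1 instead.

```java
import java.util.*;

/** Simulation (not Lucene's TokenStream API) of how position increments become positions. */
public class StackedPositions {
    record Token(String term, int positionIncrement) {}

    /** Running sum of increments; the first token (increment 1) lands at position 0. */
    static int[] positions(List<Token> stream) {
        int[] result = new int[stream.size()];
        int position = -1;
        for (int i = 0; i < stream.size(); i++) {
            position += stream.get(i).positionIncrement();
            result[i] = position;
        }
        return result;
    }

    public static void main(String[] args) {
        List<Token> stream = List.of(
            new Token("LinkedIn", 1), new Token("Noun", 0), new Token("SocialSite", 0),
            // Increment 2 assumes the stop filter left a gap where "is" was removed.
            new Token("famous", 2), new Token("JJ", 0), new Token("Positive", 0));
        int[] pos = positions(stream);
        for (int i = 0; i < stream.size(); i++) {
            System.out.println(stream.get(i).term() + " -> position " + pos[i]);
        }
    }
}
```

Because "Noun" and "SocialSite" share position 0 with "LinkedIn", phrase and proximity queries treat them as alternatives for the same word rather than as extra words in the sentence.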
>
> However, since I added the entire text "LinkedIn is famous" as a Field of
> the Document, when I search for, say, "SocialSite", I get back a Document
> whose field contains the whole text "LinkedIn is famous".
>
> Is it possible to get only "LinkedIn" as output rather than the entire
> text, i.e. only the actual token (the token present in the original input)?
> Another example: if I search for "Positive", I should get "famous" as
> output and not the entire "LinkedIn is famous".
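Since the attribute tokens were inserted with a position increment of 0, each one shares a position with the original word it describes, so a positional index can map a hit on "SocialSite" back to "LinkedIn". The toy below (plain Java, not Lucene; all names hypothetical) records the positions of every term and which surface word sits at each position. One Lucene-side route worth investigating is to give each inserted token the same start and end offsets as the word it describes, so that offset-aware components such as highlighters or term vectors with offsets point back at that word in the stored text; this is a suggestion to verify against your Lucene version, not something confirmed in this thread.

```java
import java.util.*;

/** Toy positional index (not Lucene): maps a hit on an attribute term back to the original word. */
public class SurfaceLookup {
    // surface == true for words from the original input, false for inserted attribute tokens.
    record Token(String term, int positionIncrement, boolean surface) {}

    private final Map<String, List<Integer>> termPositions = new HashMap<>();
    private final Map<Integer, String> surfaceAtPosition = new HashMap<>();

    SurfaceLookup(List<Token> stream) {
        int position = -1;
        for (Token t : stream) {
            position += t.positionIncrement();
            termPositions.computeIfAbsent(t.term(), k -> new ArrayList<>()).add(position);
            if (t.surface()) {
                surfaceAtPosition.put(position, t.term());
            }
        }
    }

    /** Returns the original words that share a position with {@code term}. */
    List<String> surfaceWordsFor(String term) {
        List<String> words = new ArrayList<>();
        for (int pos : termPositions.getOrDefault(term, List.of())) {
            String word = surfaceAtPosition.get(pos);
            if (word != null) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        SurfaceLookup idx = new SurfaceLookup(List.of(
            new Token("LinkedIn", 1, true), new Token("Noun", 0, false),
            new Token("SocialSite", 0, false), new Token("famous", 2, true),
            new Token("JJ", 0, false), new Token("Positive", 0, false)));
        System.out.println(idx.surfaceWordsFor("SocialSite")); // prints [LinkedIn]
        System.out.println(idx.surfaceWordsFor("Positive"));   // prints [famous]
    }
}
```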
>
> I know that if I added it as a separate Field in the document, I would be
> able to retrieve it, but how do I add such a Field? Only when the tokens
> pass through the filters do we learn which attributes are attached to
> them, so at indexwriter.addDocument() time we have no idea what the
> attributes are.
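On not knowing the attributes at addDocument() time: one option, sketched here under assumptions, is to run the analysis yourself before indexing (in Lucene you can consume an Analyzer's TokenStream directly with reset(), incrementToken(), end() and close()), collect each surface word together with its attributes, and only then build the documents. The stop-word set, lexicon and method names below are hypothetical stand-ins for the custom analyzer and filters; the code is plain Java, not Lucene.

```java
import java.util.*;

/** Two-pass sketch: analyze first, then build one document per surface token (toy, not Lucene). */
public class AnalyzeThenIndex {
    record Analyzed(String surface, List<String> attributes) {}

    // Hypothetical stand-ins for the custom analyzer and filters described in the thread.
    static final Set<String> STOP_WORDS = Set.of("is");
    static final Map<String, List<String>> LEXICON = Map.of(
        "LinkedIn", List.of("Noun", "SocialSite"),
        "famous", List.of("JJ", "Positive"));

    /** Pass 1: run the "analysis" and keep each surface word together with its attributes. */
    static List<Analyzed> analyze(String text) {
        List<Analyzed> out = new ArrayList<>();
        for (String word : text.split("\\s+")) {
            if (word.isEmpty() || STOP_WORDS.contains(word)) {
                continue;
            }
            out.add(new Analyzed(word, LEXICON.getOrDefault(word, List.of())));
        }
        return out;
    }

    /** Pass 2: the attributes are now known, so each document can carry them as a field. */
    static List<Map<String, String>> toDocuments(List<Analyzed> tokens) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (Analyzed a : tokens) {
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("token", a.surface());                      // would be a stored field
            doc.put("attrs", String.join(" ", a.attributes())); // would be an indexed field
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        for (Map<String, String> doc : toDocuments(analyze("LinkedIn is famous"))) {
            System.out.println(doc);
        }
    }
}
```

The cost of this approach is that analysis runs twice if the same text is also indexed as a whole; the benefit is that the attribute values are plain data before any Document is created.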
>
> The problem I see is that indexing is done on the new token stream, which
> is good, but the retrieved Document holds the original input (the old
> token stream), and that is what is returned as output.
>
> Does that make any sense? Or do I have a use case that does not fit
> Lucene well?
>
> Any help or comments are appreciated.
>
> On Mon, Jun 10, 2013 at 1:32 PM, Jack Krupansky <jack@basetechnology.com> wrote:
>
>> Even though you've posted to the Lucene list, you might want to take a
>> look at Solr, because Solr's Admin UI has an Analysis page that gives you
>> a nice display of how index and query text is analyzed into tokens,
>> terms, and attributes, all of which Solr inherits from Lucene.
>>
>> And check out the unit tests for Lucene (and Solr) for indexing. Then you
>> can actually step through code and see it happen.
>>
>> Otherwise, google for blogs on various sub-topics of interest with
>> specific terms.
>>
>> OTOH... don't try diving too deeply until you've written and understood a
>> fair amount of Java code using Lucene. Otherwise, you won't have enough
>> context to understand or even ask intelligent questions.
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: nikhil desai
>> Sent: Monday, June 10, 2013 1:24 PM
>> To: java-user@lucene.apache.org
>> Subject: Lucene Indexes explanation
>>
>>
>> Hello,
>>
>> This is my first post in this group.
>>
>> I have been using Lucene recently. I have a question.
>>
>> Where can I find a good explanation of indexes, or rather of how indexing
>> happens in Lucene (not so much the mathematical side)? Which attributes
>> (CharTerm, Offset, etc.) come into play, and how is it implemented? I
>> checked "Lucene in Action" but could not find much on the actual indexing
>> process or which classes are used.
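On which attributes come into play: an analyzer's TokenStream exposes per-token attributes such as CharTermAttribute (the term text), OffsetAttribute (start and end character offsets into the input) and PositionIncrementAttribute. The toy tokenizer below is plain Java, not Lucene code, and simply produces the same kind of per-token data for a whitespace split so the values can be inspected. In Lucene itself you would call addAttribute(...) on the stream for each attribute class and then loop over incrementToken().

```java
import java.util.*;

/** Toy whitespace tokenizer exposing per-token data similar to Lucene's term, offset and
 *  position-increment attributes (a simulation, not Lucene's API). */
public class ToyTokenizer {
    record TokenInfo(String term, int startOffset, int endOffset, int positionIncrement) {}

    static List<TokenInfo> tokenize(String text) {
        List<TokenInfo> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            // Skip whitespace between tokens.
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                // End offset is exclusive (one past the last character), as with OffsetAttribute.
                tokens.add(new TokenInfo(text.substring(start, i), start, i, 1));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (TokenInfo t : tokenize("LinkedIn is famous")) {
            System.out.println(t);
        }
    }
}
```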
>>
>> Appreciate your help.
>>
>> Thanks
>> NIKHIL
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>
>
>
>
