lucene-java-user mailing list archives

From Nader Henein <...@bayt.net>
Subject Re: index question
Date Mon, 27 Dec 2004 11:36:29 GMT
OK, so you can index the whole document in one shot, but you should also 
store the fields you display in the search results in the index, to 
avoid a round trip to the DB for every hit.

So, for example, you would store "title", "synopsis", "link", "doc_id" 
and "date", and then index only what you want to be searchable. The 
reason to keep the title stored in one field and index its text again in 
another is that once you stem a field, its terms become useless for 
display purposes. So the logical representation of your index would look 
something like this:

<document>
    <id> stored / indexed
    <title> stored / un-indexed
    <synopsis> stored / un-indexed
    <date> stored / indexed
    <full document stemmed> indexed / un-stored
</document>
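
To make the stored / indexed distinction concrete, here is a minimal sketch in plain Java. It is a toy in-memory model, not the Lucene API: the class IndexSketch, its methods, the crude stem() helper, the doc id "42" and the sample content text are all made up for illustration. In Lucene itself the four combinations correspond to the keyword, text, unindexed and unstored field options asked about below.

```java
import java.util.*;

// Toy model, NOT the Lucene API: every name here is hypothetical.
final class IndexSketch {
    // Stored values per document: returned verbatim with each hit, for display.
    private final List<Map<String, String>> stored = new ArrayList<>();
    // Inverted index: "field:term" -> ids of the documents containing that term.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Deliberately crude stand-in for a real stemmer: lower-case, drop a trailing "s".
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        return (w.length() > 3 && w.endsWith("s")) ? w.substring(0, w.length() - 1) : w;
    }

    int newDocument() {
        stored.add(new HashMap<>());
        return stored.size() - 1;
    }

    // stored / un-indexed: kept for display, never searchable.
    void store(int doc, String field, String value) {
        stored.get(doc).put(field, value);
    }

    // indexed as one exact term (ids, dates); call store() too for stored / indexed.
    void indexExact(int doc, String field, String value) {
        postings.computeIfAbsent(field + ":" + value, k -> new TreeSet<>()).add(doc);
    }

    // indexed / un-stored: only the stemmed terms are kept, the text itself is not.
    void indexStemmed(int doc, String field, String text) {
        for (String token : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) indexExact(doc, field, stem(token));
        }
    }

    // Exact term lookup; returns the stored fields of every matching document.
    List<Map<String, String>> lookup(String field, String term) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (int doc : postings.getOrDefault(field + ":" + term, Collections.emptySet())) {
            hits.add(stored.get(doc));
        }
        return hits;
    }

    // Free-text search: the query word is stemmed the same way as the indexed text.
    List<Map<String, String>> search(String field, String word) {
        return lookup(field, stem(word));
    }

    // Builds the layout from the message for one sample document.
    static IndexSketch demo() {
        IndexSketch idx = new IndexSketch();
        int doc = idx.newDocument();
        idx.store(doc, "id", "42");
        idx.indexExact(doc, "id", "42");                  // stored / indexed
        idx.store(doc, "title", "La casa roja");          // stored / un-indexed
        idx.store(doc, "synopsis", "..una casa roja.."); // stored / un-indexed
        idx.store(doc, "date", "25-12-04");
        idx.indexExact(doc, "date", "25-12-04");          // stored / indexed
        idx.indexStemmed(doc, "content", "Las casas rojas del pueblo"); // indexed / un-stored
        return idx;
    }

    public static void main(String[] args) {
        for (Map<String, String> hit : demo().search("content", "casa")) {
            System.out.println(hit.get("title")); // prints "La casa roja"
        }
    }
}
```

Searching the "content" field for "casa" or "casas" finds the document and returns the untouched title for display, while searching the "title" field finds nothing because it was only stored. The content text itself cannot be read back from a hit, which is the price of leaving it un-stored.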

Enjoy

Nader Henein


Daniel Cortes wrote:

> Thanks, Nader.
> I need a general search over documents, which is why I am asking for 
> your recommendations; the fields are only for the information shown in 
> the search results. A typical Google-style search, for example:
>
> search:casa
>
> La casa roja
> ..había una vez una casa roja que tenia ....
> htttp:\\go.to\casa    Modification date:25-12-04
>
> To do this, which fields and options (keyword, text, unindexed, 
> unstored) should I use?
>
> Thanks
>
> Nader Henein wrote:
>
>> It comes down to your searching needs: do you need your documents 
>> to be searchable by these fields, or do you need a general search 
>> of the whole document? Your decisions will affect the size of the 
>> index and the speed of indexing and searching, so give it due thought. 
>> Start from your GUI requirements and design the index that best 
>> responds to your users' needs.
>>
>> Nader
>>
>> Daniel Cortes wrote:
>>
>>> I want to know, in the case that you use Lucene to index files as a 
>>> general search engine, what fields (or keys) you use in the index.
>>> For example, in my case the files are html, pdf, doc, ppt and txt, 
>>> and I am thinking of using fields for author, title, url, content 
>>> and modification date.
>>> Anything more? Any recommendations?
>>> Thanks,
>>> and Merry Xmas to all.
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
