lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Derk Crezee" <derk.cre...@gmail.com>
Subject Re: Scoped Search and Facets generation using Lucene
Date Tue, 25 Nov 2008 13:56:55 GMT
I remember seeing a paper about indexing xml using Lucene here:
https://ssl.bnt.com/idealliance/papers/xmle02/dx_xmle02/html/abstract/03-02-08.html

I think it will be applicable to your problem.

- Derk

On Mon, Nov 17, 2008 at 4:06 PM, Aleksander M. Stensby <
aleksander.stensby@integrasco.no> wrote:

> Yes, you have a lot of fields but if you want to do faceting and all
> possible fields you will probably have to index each node in your document
> as a separate field. You should look at solr, which supports facets out of
> the box. Keep in mind what type of tokenizer(s) you use as this effects your
> results when querying.
>
> - Aleksander
>
>
> On Mon, 17 Nov 2008 13:49:51 +0100, Bapat, Mayur <mbapat@ptc.com> wrote:
>
>  My xml format similar to as follows -
>>
>> <?xml version="1.0" encoding="UTF-8" ?><Attributes><IBA8909
>> type="float">0.0</IBA8909><IBA8654 type="float">0.0</IBA8654><folder
>> type="double">8254</folder><inheritedDomain
>> type="string">true</inheritedDomain><iterationInfo.branchId
>> type="double">8438</iterationInfo.branchId><Part
>> type="double">12771</Part><member
>> type="double">8443</member><IBA8608
>> type="string">Ver-dir(TOP)</IBA8608><IBA9119
>> type="float">0.0</IBA9119><genericType
>> type="string">standard</genericType><name
>> type="string">A001</name><versionInfo.identifier.versionId
>> type="string">A</versionInfo.identifier.versionId><thePersistInfo.update
>> Count
>> type="double">3</thePersistInfo.updateCount><IBA8929
>> type="string">Carbon
>> film</IBA8929><iterationInfo.creator
>> type="double">10</iterationInfo.creator><iterationInfo.note
>> type="string">just
>> classified</iterationInfo.note><IBA8950 type="float">0</IBA8950><IBA8794
>> type="string">default</IBA8794><defaultTraceCode
>> type="string">0</defaultTraceCode><IBA8866
>> type="double">0</IBA8866><IBA8897
>> type="float">0.0</IBA8897><name
>> type="string">TRIMMER</name><versionInfo.identifier.versionSortId
>> type="string">000001</versionInfo.identifier.versionSortId><number
>> type="string">0000000001</number><iterationInfo.state
>> type="string">ctrld</iterationInfo.state><IBA8925
>> type="float">0</IBA8925><IBA8754
>> type="float">0.0</IBA8754><IBA9097 type="float">0.0</IBA9097><IBA9123
>> type="float">0.0</IBA9123><thePersistInfo.createStamp
>> type="datetime">2008-07-16T08:34:20</thePersistInfo.createStamp><state.s
>> tate
>> type="string">INWORK</state.state><masterReference
>> type="double">8436</masterReference><IBA8839
>> type="float">0.0</IBA8839><thePersistInfo.updateStamp
>> type="datetime">2008-08-01T10:44:35</thePersistInfo.updateStamp><IBA8776
>> type="string">True</IBA8776><view type="double">2931</view><IBA8653
>> type="float">0.0</IBA8653><IBA8802
>> type="float">0.0</IBA8802><organizationReference
>> type="double">856</organizationReference><partType
>> type="string">separable</partType><IBA8786
>> type="float">0.0</IBA8786><checkoutInfo.state
>> type="string">c/i</checkoutInfo.state><thePersistInfo.markForDelete
>> type="double">0</thePersistInfo.markForDelete><IBA8733
>> type="string">0</IBA8733><state.atGate
>> type="string">false</state.atGate><iterationInfo.latest
>> type="string">true</iterationInfo.latest><IBA8800
>> type="double">0</IBA8800><Part
>> type="string">IBRTYPE|Part~~WCP|22037|1</Part><IBA9062
>> type="float">0.0</IBA9062><thePersistInfo.modifyStamp
>> type="datetime">2008-08-01T10:44:35</thePersistInfo.modifyStamp><iterati
>> onInfo.predecessor
>> type="double">8440</iterationInfo.predecessor><IBA9035
>> type="string">Industry</IBA9035><IBA8806 type="float">0</IBA8806><source
>> type="string">make</source><state.lifeCycleId
>> type="double">6193</state.lifeCycleId><IBA8827
>> type="string">Through.Hole</IBA8827><iterationInfo.modifier
>> type="double">10</iterationInfo.modifier><IBA8798
>> type="double">0</IBA8798><IBA8825
>> type="string">1</IBA8825><IBA8992 type="string">Open
>> type</IBA8992><IBA8949
>> type="string">Yes</IBA8949><versionInfo.identifier.versionLevel
>> type="double">1</versionInfo.identifier.versionLevel><defaultUnit
>> type="string">ea</defaultUnit><containerReference
>> type="double">8212</containerReference><endItem
>> type="string">false</endItem><IBA8796
>> type="string">1</IBA8796><folderingInfo.cabinet
>> type="double">8254</folderingInfo.cabinet><domainRef
>> type="double">8216</domainRef><IBA9060
>> type="float">0.0</IBA9060><iterationInfo.identifier.iterationId
>> type="string">2</iterationInfo.identifier.iterationId><IBA8080
>> type="string">sizeCode</IBA8080><IBA8891
>> type="float">0.0</IBA8891><IBA8804
>> type="float">0</IBA8804><IBA9005
>> type="float">0.0</IBA9005><checkoutInfo.derivedFrom
>> type="double">8440</checkoutInfo.derivedFrom></Attributes>
>>
>> and I am interested in finding a document by giving a scope like -
>> find a document with IBA8949=Yes AND IBA8802=0.0 and
>> iterationInfo.predecessor=OR.
>>
>> It looks like for each element from this XML, I need to create a field
>> for the index document and create a search query accordingly.
>>
>>
>>
>>
>> -----Original Message-----
>> From: Aleksander M. Stensby [mailto:aleksander.stensby@integrasco.no]
>> Sent: Monday, November 17, 2008 1:55 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: Scoped Search and Facets generation using Lucene
>>
>> I think that the closest you get to "scoped" search in your case would
>> be to use filters. (If you index your paths, or if the documents have
>> some standarized format, I assume you could just use one field per
>> element in your document.)
>>
>> Maybe you could say a bit about you document structure? (If your
>> question is a XML specific parsing question, the Lucene mailing list
>> isn't really right place...)
>>
>> - Aleksander
>>
>>
>> On Fri, 14 Nov 2008 17:52:59 +0100, Otis Gospodnetic
>> <otis_gospodnetic@yahoo.com> wrote:
>>
>>  Hi Mayur,
>>>
>>> Solr has built-in support for facets.  I don't understand what you
>>> mean by scoped searches.  Could you please give a concrete example?
>>>
>>>
>>> Otis
>>> --
>>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>>
>>>
>>>
>>>
>>> ________________________________
>>> From: "Bapat, Mayur" <mbapat@ptc.com>
>>> To: java-user@lucene.apache.org
>>> Sent: Friday, November 14, 2008 3:04:12 AM
>>> Subject: Scoped Search and Facets generation using Lucene
>>>
>>> Hi,
>>>
>>> Does Lucene support Scoped Searches? My intention is to index an XML
>>> String and search for a matching element/attribute value from that XML
>>>
>>
>>  by specifying scope(path).
>>> Also is there any direct support for Facets building in Lucene?
>>>
>>> Regards,
>>> Mayur
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>
>>
>>
>> --
>> Aleksander M. Stensby
>> Senior software developer
>> Integrasco A/S
>> www.integrasco.no
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>
>
>
> --
> Aleksander M. Stensby
> Senior software developer
> Integrasco A/S
> www.integrasco.no
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message