lucene-dev mailing list archives

From Li Li <fancye...@gmail.com>
Subject Re: looking for a BooleanMatcher instead of BooleanScorer
Date Fri, 01 Jun 2012 09:15:50 GMT
Sorry, the first problem is not mine.

On Fri, Jun 1, 2012 at 4:58 PM, Tanguy Moal <tanguy.moal@gmail.com> wrote:
> Hello,
>
> I'm just sharing my thoughts, they might be off-topic...
>
> Take the first example, quoted from GitHub: the user wants to find all nodes
> whose facebookId appears in a given, fairly long list (a friends list; be
> aware that some Facebook users have 1500+ friends!).
>
> The application first had the facebookId of a user (say id=someId),
> queried the Facebook graph with that id, and got a fairly long list of
> facebookIds back, right?
> At that point, I think the application should not try to enumerate its Neo4j
> graph using an OR-ed list of facebookIds.
> Instead, it should make sure that each Neo4j node in the friends list has a
> multivalued "friendOf" attribute that contains the facebookId someId, and
> then trigger an update request for the nodes that changed.
> You could make your application wait for that update to complete if it
> really needs to be synchronous with Facebook.
> That moves the problem to handling update requests smartly, which might be
> easier sometimes.
> Here you will eventually want to store a hash of the user's friends list
> somewhere in the user's node, so you know in advance whether that user's
> friends list has changed and whether you need to trigger the update process
> again (just thinking).
> When your user uses the application for the first time, or every time after
> she updates her friends list, an update job will be fired for that user. You
> may want to wait for the update request to complete only the first time (if
> you don't need your app to be 100% synchronized with Facebook), and have the
> subsequent jobs queued to something that handles these updates
> efficiently. That could stress the storage system with intensive writes
> from time to time, especially at the beginning, but it will converge to a
> mainly read-based application once most active users have used the
> application once. New friendships aren't that frequent (IMHO).
> Maybe NRT developments could be used in this scenario... I don't know much
> more. I don't know anything about how Neo4j works; I used it once, that's
> all.
> Anyway, if you hit write issues, congratulations, your application is being
> used widely; go buy SSDs :)
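
A minimal sketch of the friends-list hash mentioned above, in plain Java. The class name FriendsListFingerprint and both method names are hypothetical, and SHA-256 over the sorted ids is an assumption; any stable digest would do. The idea is only to compare the fingerprint stored on the user's node with one computed from the list Facebook just returned, and to fire the update job when they differ.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collection;
import java.util.TreeSet;

/** Hypothetical helper: detects whether a user's friends list has changed. */
final class FriendsListFingerprint {
    private FriendsListFingerprint() {}

    /** Order-independent SHA-256 fingerprint of a friends list, as lowercase hex. */
    static String of(Collection<String> friendIds) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            // Sort and de-duplicate so the same set of friends always hashes the same way.
            for (String id : new TreeSet<>(friendIds)) {
                sha256.update(id.getBytes(StandardCharsets.UTF_8));
                sha256.update((byte) '\n'); // separator, so ids "1","23" differ from "12","3"
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : sha256.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }

    /** True when the stored fingerprint is missing or differs, i.e. an update job should run. */
    static boolean needsUpdate(String storedFingerprint, Collection<String> currentFriendIds) {
        return !of(currentFriendIds).equals(storedFingerprint);
    }
}
```

On first use the stored fingerprint would be null, so needsUpdate returns true and the initial update job runs.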
>
> Finally, you can then enumerate your nodes with a single quick and efficient
> query: friendOf:"someId".
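
The denormalized lookup can be sketched with an in-memory map standing in for the index. FriendOfIndex and its methods are hypothetical names, not Neo4j or Lucene API; in a real index the read step would be the single-term query friendOf:"someId" instead of a map lookup.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for an index on the multivalued friendOf attribute. */
final class FriendOfIndex {
    // friendOf value -> ids of the nodes whose friendOf attribute contains it
    private final Map<String, Set<String>> nodesByFriendOf = new HashMap<>();

    /** Write step: replace the set of nodes tagged with friendOf = userId. */
    void recordFriends(String userId, Collection<String> friendNodeIds) {
        nodesByFriendOf.put(userId, new TreeSet<>(friendNodeIds));
    }

    /** Read step: one key lookup instead of an OR over every friend's id. */
    Set<String> friendsOf(String userId) {
        return Collections.unmodifiableSet(
                nodesByFriendOf.getOrDefault(userId, Collections.<String>emptySet()));
    }
}
```

The cost moves to the write side: every change to a friends list rewrites one entry, while every read stays a single lookup however long the list is.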
>
>
> What I meant is that if your application really needs to perform
> queries made of many, many, many, ... really many OR-ed terms, then
> there may be (though not always) a different design of your data
> model that still lets you fit the use case of a search engine.

I agree. Lucene/Solr may need to support many of the other query types found
in traditional databases.
For now, we usually store structured data in an RDBMS and full text in
Lucene/Solr, but keeping the two synchronized is a nightmare. We would
rather use one full-featured solution than integrate several.


>
> This applies to 1 and maybe to 2 too. ( :p 2-2-2 -- never mind )
>
> I don't really understand 3, which seems to be a MinShouldMatch issue.
>
> As I said in the beginning, I'm simply sharing my thoughts! I hope this
> helps...
>
> --
> Tanguy
>
> 2012/6/1 Li Li <fancyerii@gmail.com>
>>
>> Hi all,
>>    I am looking for a 'BooleanMatcher' in Lucene. For many
>> applications, we don't need to order matched documents by relevance score;
>> we just want boolean matching. But BooleanScorer/BooleanScorer2
>> is a little heavy for that, because it is built for relevance scoring.
>>    One use case: we have some fields that hold a very small number
>> of tokens (usually only one word), such as id, tag or something else,
>>    but we need queries like this: id in (1,3,5.....). If we use a
>> BooleanQuery (id:1 id:3 id:5 ...), BooleanScorer can only apply to 31
>> terms, and BooleanScorer2 uses a priority queue to track how many terms
>> are matched (coord).
>>    Filters may help, but the query can be very complicated (or else the
>> filter itself still uses BooleanQuery, so the problem recurs).
>>
>>    We could split the current BooleanScorer into a BooleanMatcher and a
>> Ranker. If we need to score the hit docs, we ask the BooleanScorer for
>> not only the hit ids but also tf/idf, coord or anything else we need for
>> ranking. But sometimes we only need docIds; then the BooleanMatcher
>> can optimize its implementation. For the case of many disjunction
>> terms, it could work like a Filter or BooleanScorer instead of
>> BooleanScorer2.
>>
>>    Is it possible?
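
To make the proposal concrete, here is a rough sketch of what a match-only disjunction could look like, in plain Java rather than against Lucene's classes. DisjunctionMatcher is a hypothetical name, and postings lists are assumed to be int arrays sorted ascending by doc id. It merges the lists with a heap, doc-at-a-time, but keeps no per-document match count and computes no score.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical match-only disjunction: yields matching doc ids, computes no scores. */
final class DisjunctionMatcher {
    private DisjunctionMatcher() {}

    /**
     * Merges postings lists (each sorted ascending by doc id) into one sorted,
     * de-duplicated list of matching doc ids.
     */
    static List<Integer> match(List<int[]> postings) {
        // Heap entries are {listIndex, positionInList}, ordered by the doc id they point at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                (a, b) -> Integer.compare(postings.get(a[0])[a[1]], postings.get(b[0])[b[1]]));
        for (int i = 0; i < postings.size(); i++) {
            if (postings.get(i).length > 0) {
                heap.add(new int[] {i, 0});
            }
        }
        List<Integer> docIds = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            int[] list = postings.get(top[0]);
            int doc = list[top[1]];
            // Record each doc once; unlike a scorer, never count how many terms matched.
            if (docIds.isEmpty() || docIds.get(docIds.size() - 1) != doc) {
                docIds.add(doc);
            }
            // Advance this list's cursor and re-queue it if it has more postings.
            top[1]++;
            if (top[1] < list.length) {
                heap.add(top);
            }
        }
        return docIds;
    }
}
```

A separate ranker, if one is needed, could then be asked for tf/idf or coord only for the doc ids this step returns.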
>>
>>    Following are some user demands I found on the mailing lists. The
>> first one is my own requirement.
>>
>>    1. https://github.com/neo4j/community/issues/494
>>
>>    2. mail to lucene
>>
>> qibaoyuan@126.com via lucene.apache.org
>>
>> May 6
>>
>> to lucene
>> Hi,
>>      I have a problem searching for many keywords in about
>> 5,000,000 documents. For example, the query may be like "(a1 or a2 or a3
>> ....a200) and (b1 or b2 or b3 or b4 ..... b400)". I found it takes a
>> very long time (40 seconds) to get the answer on only one field (the Title
>> field), and the JVM throws an OutOfMemoryError with more fields (title
>> field plus content field). Any suggestions or good ideas to solve this
>> problem? Thanks in advance.
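
One way to evaluate such a query without scoring is term-at-a-time with bit sets, which is roughly how a filter behaves. The sketch below is plain Java with hypothetical names (BitSetBooleanMatcher, anyOf, allGroups) and assumes each term's postings are available as an int array of doc ids; it is not Lucene code. Each bit set costs about maxDoc/8 bytes regardless of how many terms are OR-ed, so roughly 625 KB for 5,000,000 documents.

```java
import java.util.BitSet;
import java.util.List;

/** Hypothetical term-at-a-time matcher for (a1 OR a2 ...) AND (b1 OR b2 ...). */
final class BitSetBooleanMatcher {
    private BitSetBooleanMatcher() {}

    /** OR: sets one bit per doc id that appears in any of the postings lists. */
    static BitSet anyOf(List<int[]> postings, int maxDoc) {
        BitSet matches = new BitSet(maxDoc);
        for (int[] list : postings) {
            for (int doc : list) {
                matches.set(doc);
            }
        }
        return matches;
    }

    /** AND of two OR groups: keeps only docs that match at least one term in each group. */
    static BitSet allGroups(List<int[]> groupA, List<int[]> groupB, int maxDoc) {
        BitSet result = anyOf(groupA, maxDoc);
        result.and(anyOf(groupB, maxDoc));
        return result;
    }
}
```

The work is linear in the total number of postings visited, and nothing is kept per term once its postings have been folded into the bit set.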
>>
>>
>>    3. mail to lucene
>> Chris Book chrisbook@gmail.com via lucene.apache.org
>>
>> Apr 11
>>
>> to solr-user
>> Hello, I have a solr index running that is working very well as a search.
>> But I want to add the ability (if possible) to use it to do matching. The
>> problem is that by default it is only looking for all the input terms to be
>> present, and it doesn't give me any indication as to how many terms in the
>> target field were not specified by the input.
>>
>> For example, if I'm trying to match to the song title "dust in the wind",
>> I'm correctly getting a match if the input query is "dust in wind".  But I
>> don't want to get a match if the input is just "dust".  Although as a
>> search "dust" should return this result, I'm looking for some way to filter
>> this out based on some indication that the input isn't close enough to the
>> output.  Perhaps if I could get information that the number of input
>> terms is much less than the number of terms in the field.  Or something
>> else along those lines?
>>
>> I realize that this isn't the typical use case for a search, but I'm just
>> looking for some suggestions as to how I could improve the above example a
>> bit.
>>
>> Thanks,
>> Chris
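
The "number of input terms versus number of terms in the field" idea can be sketched as a coverage ratio computed on the candidates a search returns. TermCoverage is a hypothetical helper, and the whitespace tokenizer is only a stand-in for whatever analyzer the field really uses. For the example above, "dust in wind" covers 3 of the 4 title terms (0.75) while "dust" covers 1 of 4 (0.25), so a threshold between those values would keep the first and drop the second.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical post-filter: how much of a field's text did the query supply? */
final class TermCoverage {
    private TermCoverage() {}

    /** Lowercases and splits on whitespace; a stand-in for a real analyzer. */
    static Set<String> terms(String text) {
        Set<String> result = new HashSet<>();
        for (String term : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!term.isEmpty()) {
                result.add(term);
            }
        }
        return result;
    }

    /** Fraction of the field's distinct terms that also occur in the query (0.0 to 1.0). */
    static double coverage(String query, String fieldValue) {
        Set<String> fieldTerms = terms(fieldValue);
        if (fieldTerms.isEmpty()) {
            return 0.0;
        }
        Set<String> matched = terms(query);
        matched.retainAll(fieldTerms); // keep only query terms present in the field
        return (double) matched.size() / fieldTerms.size();
    }
}
```

Where the threshold should sit is a judgment call that depends on the data; the sketch only shows how the ratio itself could be computed.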
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>
