lucene-solr-user mailing list archives

From Alessandro Benedetti <benedetti.ale...@gmail.com>
Subject Re: Possible or not ?
Date Fri, 05 Jun 2015 15:21:28 GMT
I can't see any problem with that, but since we're talking about commits I'd
like to distinguish between "hard" and "soft" commits.

Hard commit -> durability
Soft commit -> visibility

I suggest this interesting read:
https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
It's an old but still relevant post by Erick.

It explains the differences between the commit types in more detail.
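As a concrete illustration of the two commit types, you can trigger them
explicitly through the update handler (a sketch assuming a core named mydb on
the default localhost:8983; adjust to your setup):

```shell
# Hard commit: flushes index segments to disk (durability) and, by
# default, also opens a new searcher so the documents become visible.
curl 'http://localhost:8983/solr/mydb/update?commit=true'

# Hard commit without opening a searcher: durable, but not yet visible.
curl 'http://localhost:8983/solr/mydb/update?commit=true&openSearcher=false'

# Soft commit: makes recent documents visible to searches without
# flushing segments to disk (cheaper, but not durable on its own).
curl 'http://localhost:8983/solr/mydb/update?softCommit=true'
```

The commit, openSearcher, and softCommit parameters here are standard Solr
update-request parameters; the host, port, and core name are assumptions.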

Your situation matches this scenario from that post:

Heavy (bulk) indexing
>
> The assumption here is that you’re interested in getting lots of data to
> the index as quickly as possible for search sometime in the future. I’m
> thinking original loads of a data source etc.
>
>    - Set your soft commit interval quite long. As in 10 minutes or even
>    longer (-1 for no soft commits at all). *Soft commit is about
>    visibility, *and my assumption here is that bulk indexing isn’t about
>    near real time searching so don’t do the extra work of opening any kind of
>    searcher.
>    - Set your hard commit intervals to 15 seconds, openSearcher=false.
>    Again the assumption is that you’re going to be just blasting data at Solr.
>    The worst case here is that you restart your system and have to replay 15
>    seconds or so of data from your tlog. If your system is bouncing up and
>    down more often than that, fix the reason for that first.
>    - Only after you’ve tried the simple things should you consider
>    refinements, they’re usually only required in unusual circumstances. But
>    they include:
>       - Turning off the tlog completely for the bulk-load operation
>       - Indexing offline with some kind of map-reduce process
>       - Only having a leader per shard, no replicas for the load, then
>       turning on replicas later and letting them do old-style replication to
>       catch up. Note that this is automatic, if the node discovers it is “too
>       far” out of sync with the leader, it initiates an old-style replication.
>       After it has caught up, it’ll get documents as they’re indexed to the
>       leader and keep its own tlog.
>       - etc.
>
>
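The bulk-indexing settings Erick describes translate roughly into this
solrconfig.xml fragment (a sketch; the interval values are the ones suggested
above, so tune them to your own needs):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit every 15 seconds for durability; openSearcher=false
       means it does NOT make the new documents searchable yet. -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit disabled (-1): no visibility work during the bulk load. -->
  <autoSoftCommit>
    <maxTime>-1</maxTime>
  </autoSoftCommit>
  <!-- Transaction log; replayed on restart to recover uncommitted docs. -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
</updateHandler>
```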
You could indeed commit only at the end, but I can't see any advantage in
that.
I suggest you play with the auto hard/soft commit configuration to get a
better feel for the situation!
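For the bin/post runs specifically, a sketch of posting everything and
committing only once at the very end might look like this (this assumes the
Solr 5 bin/post tool's -commit option, which disables the automatic commit it
otherwise issues after each run; check bin/post -h on your install):

```shell
# Post each directory of XML files without committing after each run.
for i in $(seq 1 50); do
  bin/post -c mydb -commit no "/DATA/$i"
done

# One explicit hard commit once everything has been sent.
curl 'http://localhost:8983/solr/mydb/update?commit=true'
```

With autoCommit configured as above, the tlog stays small even though the
client never commits mid-load; this final commit just makes everything
visible and durable at once.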

Cheers

2015-06-05 16:08 GMT+01:00 Bruno Mannina <bmannina@free.fr>:

> Hi Alessandro,
>
> I'm currently on my dev computer, and I would like to post 1 000 000 XML
> files (with a structure defined in my schema.xml).
>
> I have already imported 1 000 000 XML files using
> bin/post -c mydb /DATA0/1 /DATA0/2 /DATA0/3 /DATA0/4 /DATA0/5
> where each /DATA0/X contains 20 000 XML files (I repeated this, changing
> X from 1 to 50).
>
> Now I would like to do
> bin/post -c mydb /DATA1
>
> I would like to know if my Solr 5 will run fine and not produce a memory
> error because there are too many files in one post without a commit.
>
> The commit would only be done at the end of the 1 000 000.
>
> Is that OK?
>
>
>
> On 05/06/2015 16:59, Alessandro Benedetti wrote:
>
>> Hi Bruno,
>> I can't see what your challenge is.
>> Of course you can index your data however you like and do a commit
>> whenever you want…
>> Are those XML files in Solr's XML format?
>> If not, you would need to use the DIH, the extract update handler, or a
>> custom indexer application.
>> Maybe I missed your point…
>> Please give me more details!
>>
>> Cheers
>>
>> 2015-06-05 15:41 GMT+01:00 Bruno Mannina <bmannina@free.fr>:
>>
>>> Dear Solr Users,
>>>
>>> I would like to post 1 000 000 records (1 record = 1 file) in one shot
>>> and do the commit at the end.
>>>
>>> Is it possible to do that?
>>>
>>> I have several directories, each containing 20 000 files.
>>> I would like to do:
>>> bin/post -c mydb /DATA
>>>
>>> under DATA I have
>>> /DATA/1/*.xml (20 000 files)
>>> /DATA/2/*.xml (20 000 files)
>>> /DATA/3/*.xml (20 000 files)
>>> ....
>>> /DATA/50/*.xml (20 000 files)
>>>
>>> Currently, I post 5 directories at a time (it takes around 1h30 for
>>> 100 000 records/files).
>>>
>>> But it's Friday and I would like to let it run over the weekend
>>> unattended.
>>>
>>> Thanks for your comment,
>>>
>>> Bruno
>>>
>>>
>>>
>>>
>>
>
>
>


-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
