[rancid] Improving Rancid's processing speed when having 1k+ devices

Thu Jul 25 17:16:43 UTC 2019

I would also recommend running multiple rancid servers, maybe scattered geographically, so that a single machine is not pulling all the weight.  Break the workload up among them.

> On Jul 25, 2019, at 12:55 PM, john heasley <heas at shrubbery.net> wrote:
> 
> Thu, Jul 25, 2019 at 02:29:37PM +0200, Florin Vlad Olariu:
>> Well, as per the title, is there any way to improve rancid's speed with so
>> many devices? At the moment I set PAR_COUNT to 300, so it will connect in
>> parallel to 300 devices at a time, but the reality is that most of the time
>> does not seem to be taken by connecting and retrieving the config, but by
>> what happens next in the file processing and git committing.
>> 
>> To give you some stats, with the current settings it takes around 9 minutes
>> to do 1200 devices. I have only one group, and all devices are in it.
>> 
>> Any trick you might have, please let me know!
> 
> Typically, the network and, more so, the devices are the slow part.  Some
> devices are much slower than others.  More parallelism helps a lot, which
> your high PAR_COUNT already provides.  Other thoughts:
> 
> - cvs is slow.  Use svn or git.  svn is probably faster, but I have not
>  benchmarked the two for the functions that rancid uses.
> - Make sure that the rancid user is not process rlimited to less than ~605
>  processes, i.e. PAR_COUNT * 2 + 5 or so.
> - perl is a memory pig.  If the host/vm has memory pressure, this would be
>  something to address.
> - Retrieving device output does not require much cpu, but processing does use
>  some - don't starve it.
> - Use rancid.conf:NOPIPE=YES; I think this is faster because perl is a pig.
> - If you only need configs, then reduce what is collected to just show version
>  and show running.  Or have one hourly group that collects that, and a daily
>  group that collects everything.  Less processing, and esp. many fewer regexes.
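
The process-limit rule of thumb quoted above (PAR_COUNT * 2 + 5, so about 605 for PAR_COUNT=300) is easy to check. Below is a minimal Python sketch, not part of rancid; the function names are made up for illustration. It assumes a Unix host where RLIMIT_NPROC is available and only gives a meaningful answer when run as the rancid user.

```python
import resource


def required_processes(par_count: int, overhead: int = 5) -> int:
    """Rough process budget from the rule of thumb: PAR_COUNT * 2 + 5."""
    return par_count * 2 + overhead


def nproc_limit_ok(par_count: int) -> bool:
    """Return True if the current soft process limit covers the budget."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    if soft == resource.RLIM_INFINITY:
        return True  # no per-user process limit is set
    return soft >= required_processes(par_count)


if __name__ == "__main__":
    par_count = 300  # the PAR_COUNT value mentioned in this thread
    print("processes needed:", required_processes(par_count))
    print("soft limit is sufficient:", nproc_limit_ok(par_count))
```

If the check reports False, the limit for the rancid user would have to be raised by whatever mechanism the host uses for per-user limits.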
> 
> Multiple groups might help, at least for the SCM part.  Split your one large
> group into a few.  Make sure to use a separate cron job for each so that they
> run in parallel.
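
Splitting one large group into a few of roughly equal size can be done mechanically. The following is a hedged Python sketch of a round-robin split; the device names and the group count of 4 are made up for illustration, and it does not write any rancid files. It only shows how 1200 devices would divide.

```python
def split_into_groups(devices, n_groups):
    """Distribute devices round-robin so group sizes differ by at most one."""
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    groups = [[] for _ in range(n_groups)]
    for index, device in enumerate(devices):
        groups[index % n_groups].append(device)
    return groups


if __name__ == "__main__":
    # Hypothetical device names; in practice these come from the existing group.
    devices = [f"device{i:04d}" for i in range(1200)]
    for number, group in enumerate(split_into_groups(devices, 4), start=1):
        print(f"group {number}: {len(group)} devices")
```

Round-robin keeps the groups balanced by count only. Since some devices are much slower than others, a split that spreads the known slow devices across groups may balance run time better.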
> 
> I haven't attempted to benchmark or optimize any parts for a while.  There was
> a complaint about the start-up time for control_rancid, which seems to me to
> be inconsequential, but I do not know what the users were attempting to do
> with rancid that made this matter.  There are other benefits to this, so I've
> started to rewrite it; this is not ready yet.
> 
> 9 minutes for 1200 devices seems reasonable to me. :)
> 