[rancid] Improving Rancid's processing speed when having 1k+ devices
heas at shrubbery.net
Thu Jul 25 16:55:31 UTC 2019
Thu, Jul 25, 2019 at 02:29:37PM +0200, Florin Vlad Olariu:
> Well, as per title, is there any way to improve rancid's speed with so many
> devices? At the moment I set PAR_COUNT to 300, so it will connect in
> parallel to 300 devices at a time, but the reality is that most time does
> not seem to be taken by connecting and retrieving config but by what
> happens next in the file processing and git-committing.
> To give you some stats, with current settings it takes around 9 minutes to
> do 1200 devices. I have only 1 group with all devices under the same group.
> Any trick you might have, please let me know!
Typically, the network and, more so, the devices are the slow part. Some
devices are much slower than others. More parallelism helps a lot, which your
high PAR_COUNT already provides. Other thoughts:
- cvs is slow. use svn or git. svn is probably faster; but I have not
benchmarked the two for the functions that rancid uses.
- make sure that the rancid user's process rlimit is not below ~605
processes, i.e. PAR_COUNT * 2 + 5 or so.
- perl is a memory pig. if the host/vm has memory pressure, this would be
something to address.
- retrieving device output does not require much cpu, but processing does use
some - don't starve it
- use rancid.conf:NOPIPE=YES; I think this is faster because perl is a pig.
- if you only need configs, then reduce what is collected to just show version
and show running. or have one hourly group that collects that, and a daily
group that collects everything. less processing, and esp many fewer regexes.
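The process-limit rule of thumb in the list above (PAR_COUNT * 2 + 5) can be checked with a few lines of shell. This is only a sketch: the PAR_COUNT value is taken from the original question, the variable names `required`, `limit` and `status` are illustrative, and `ulimit -u` is assumed to be supported by the shell in use (it is in bash; other shells may spell the option differently, in which case the script reports "unknown"). Run it as the rancid user.

```shell
#!/bin/sh
# PAR_COUNT as given in the original question; adjust to match rancid.conf.
PAR_COUNT=300

# Rule of thumb from the reply: PAR_COUNT * 2 + 5 processes.
required=$((PAR_COUNT * 2 + 5))

# Soft per-user process limit of the current user.
# Assumption: the shell understands "ulimit -u"; if not, fall back to "unknown".
limit=$(ulimit -u 2>/dev/null || echo unknown)

case "$limit" in
    unlimited)
        status="ok" ;;
    ''|*[!0-9]*)
        # Empty or non-numeric answer: the limit could not be determined.
        status="unknown" ;;
    *)
        if [ "$limit" -ge "$required" ]; then
            status="ok"
        else
            status="too low"
        fi ;;
esac

echo "required=$required limit=$limit status=$status"
```

If the status is "too low", the limit would have to be raised for the rancid user through whatever mechanism the host uses for per-user resource limits.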
Multiple groups might help, at least for the SCM part. Split your one large
group into a few. Make sure to use a separate cron for each so that they run
concurrently.
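Several of the suggestions above end up as settings in rancid.conf, which is read as shell. The fragment below is a rough sketch, not a recommended configuration: the group names (core, edge, access) and the rancid-run path are hypothetical, and the exact form of each line in the sample rancid.conf shipped with a given rancid version may differ.

```shell
# Sketch of rancid.conf settings; rancid.conf is sourced as shell.
RCSSYS=git; export RCSSYS          # git (or svn) instead of cvs
PAR_COUNT=300; export PAR_COUNT    # parallel collections, value from the question
NOPIPE=YES; export NOPIPE          # setting suggested in the reply

# Hypothetical split of the single large group into three smaller ones.
LIST_OF_GROUPS="core edge access"; export LIST_OF_GROUPS

# Hypothetical crontab entries for the rancid user, one per group, so that
# the groups do not wait on each other (the install path is an assumption):
#   0 * * * * /usr/local/rancid/bin/rancid-run core
#   0 * * * * /usr/local/rancid/bin/rancid-run edge
#   0 * * * * /usr/local/rancid/bin/rancid-run access
```

With several groups running at once, the process-limit estimate for the rancid user has to cover all of them together, not just one run.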
I haven't attempted to benchmark or optimize any parts for a while. There was
a complaint about the start-up time for control_rancid, which seems to me to
be inconsequential, but I do not know what the users were attempting to do
with rancid that made this matter. There are other benefits to this, so I've
started to re-write it; this is not ready yet.
9 minutes for 1200 devices seems reasonable to me. :)