Dial Access Fully Restored

     We found servers in our logs that were not in our server configuration.

     Further investigation determined that GlobalPOPs had added RADIUS servers but had not informed us.  Requests coming from the new servers, which were not in our configuration, were rejected, causing authentication failures.

     I obtained a complete list of their RADIUS servers and added them to our clients.conf file, and authentication is now functioning again.
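
     For anyone chasing a similar mismatch, the log side of the comparison can be sketched roughly as follows.  This assumes FreeRADIUS-style log lines containing "from unknown client" followed by an IPv4 address; the function name unknown_clients and the log path in the comment are illustrative assumptions, not taken from our setup.

```shell
# Print each distinct address that the RADIUS server rejected as an
# unknown client. Reads log text on stdin.
# Assumption: lines contain "from unknown client <IPv4 address>".
unknown_clients() {
    sed -n 's/.*from unknown client \([0-9.][0-9.]*\).*/\1/p' | sort -u
}

# Example (the log path is an assumption and varies by distribution):
#   unknown_clients < /var/log/radius/radius.log
```

     Each address printed is a server that needs its own client entry in clients.conf before its requests will be accepted.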

 

Dial-up Access Down

     Presently, our dial-up customers are unable to authenticate.

     We have tested our RADIUS authentication servers and those of our wholesale provider and found the issue to be with the provider's authentication servers.
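
     A test of this kind can be sketched with radtest, the test client that ships with FreeRADIUS.  The post does not say which tool was actually used, so the tool choice, the wrapper name radius_accepts, and every argument value are assumptions.

```shell
# Return 0 if the RADIUS server answers the request with Access-Accept.
# Arguments: user password server[:port] shared-secret
# All four values are placeholders supplied by the caller.
radius_accepts() {
    command -v radtest >/dev/null 2>&1 || {
        echo "radtest not installed" >&2
        return 2
    }
    # The fourth radtest argument is the NAS port number; 0 is a common test value.
    radtest "$1" "$2" "$3" 0 "$4" 2>&1 | grep -q 'Access-Accept'
}

# Example with placeholder values:
#   radius_accepts testuser testpass radius.example.net sharedsecret && echo "accepted"
```

     Running the same check against each server in turn shows which side is failing to answer.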

     I attempted to open a ticket through the normal means, GlobalPOPs' loginto.us website, and found that the ticket link on their website is broken as well.

     Since they normally do not allow phone tech support and have no means of contact other than the now-broken ticket system, I called their retail dial-up center and got them to generate a ticket for me.  I was assured I would receive a copy by e-mail and that someone would contact me, but so far neither has occurred.

 

Maintenance Friday – Saturday

     Late Friday evening into early Saturday there will be some downtime for system maintenance.  First, probably around 10:30, I will take the server that hosts home directories out of service for up to an hour.  I will be adding a pair of 10TB drives to allow us to increase the size of the /home directory partition.

     Next I will be moving home directories to this new partition.  While this process will take some time, additional downtime will be required only at the final step, when I am ready to switch partitions.  At that point I will need to reboot the machine.
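
     The move described above follows a common two-pass pattern: a long copy while the system stays live, then a short catch-up copy just before the switch.  A minimal sketch, assuming rsync is the copy tool (the post does not name one) and using hypothetical mount points:

```shell
# Hypothetical mount points; neither path comes from the original post.
OLD_HOME=${OLD_HOME:-/home}
NEW_HOME=${NEW_HOME:-/mnt/newhome}

# Copy src to dst, preserving ownership, permissions, times, and hard links.
# --delete removes files from dst that no longer exist in src, so a repeat
# run brings dst back in step with src.
sync_home() {
    rsync -aH --delete "$1"/ "$2"/
}

# Pass 1 (system live, may take hours):   sync_home "$OLD_HOME" "$NEW_HOME"
# Pass 2 (users locked out, short):       sync_home "$OLD_HOME" "$NEW_HOME"
```

     Because the second pass only transfers what changed since the first, the window in which users are locked out stays short.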

     Then I will need to reboot a number of other machines in order to make a kernel upgrade active.

Linux Security Exploit (Iglulik was attacked)

     Warning! This security exploit has not been widely published, but it IS actively being exploited.  Someone trying to exploit it caused the server that houses our customers' /home directories to reboot spontaneously.  Fortunately the kernel logged their attempts.  See: https://www.kernel.org/doc/html/latest/admin-guide/l1tf.html

     In our case, I measured system load, web page loading times, and latency with this CPU feature (SMT) turned on and off.  It made no measurable difference, so I turned it off with: echo 'off' > /sys/devices/system/cpu/smt/control.  I put this in /etc/rc.local, which is enabled on our machines for this and some other adjustments.
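
     A slightly more defensive version of that rc.local line might look like the sketch below.  The function names smt_state and smt_disable are made up for illustration, and the optional sysfs-root argument exists only so the functions can be tried against a scratch directory.

```shell
# Report the SMT state from sysfs ("on", "off", "forceoff", ...).
# Optional argument: sysfs root, defaulting to /sys.
smt_state() {
    ctl="${1:-/sys}/devices/system/cpu/smt/control"
    if [ -r "$ctl" ]; then
        cat "$ctl"
    else
        echo "unavailable"
    fi
}

# Turn SMT off only when the kernel reports it as currently on.
# Needs root on a real system.
smt_disable() {
    ctl="${1:-/sys}/devices/system/cpu/smt/control"
    if [ "$(smt_state "$1")" = "on" ]; then
        echo off > "$ctl"
    fi
}

# The kernel's own assessment of the L1TF issue can be read with:
#   cat /sys/devices/system/cpu/vulnerabilities/l1tf
```

     Checking the state first avoids a write error on kernels or CPUs where the control file is missing or not writable.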

Iglulik Crash

     Iglulik, our system which houses /home directories and some virtual machines, spontaneously rebooted.

     Examination of the logs suggests someone tried to take advantage of a CPU exploit similar to Meltdown.  I am looking into several possible mitigation strategies; all of them have performance impacts.

Debian

     Debian’s key database has gotten badly corrupted.  The machine is down at present and is being restored from backups.  It should be available again in about an hour.

     In the meantime, please use one of the other Debian-derived machines such as julinux.yellow-snow.net, mint.eskimo.com, mxlinux.eskimo.com, ubuntu.eskimo.com, or zorin.eskimo.com.

CentOS 7

     CentOS 7 stopped accepting connections today.

     I found that ypbind was not bound, so the machine could not access the NIS database to get user authentication information.

     However, ypbind would not rebind on restart.  Usually the cause of this is portmapper (rpcbind) not running.  I checked and it was running, but I restarted it anyway; ypbind still would not bind.
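
     The checks described above can be sketched roughly like this, assuming the rpcinfo and ypwhich tools are installed.  The function name check_nis_binding is made up for illustration.

```shell
# Walk the dependency chain: rpcbind must answer before ypbind can bind.
check_nis_binding() {
    for tool in rpcinfo ypwhich; do
        command -v "$tool" >/dev/null 2>&1 || {
            echo "missing: $tool"
            return 2
        }
    done
    # rpcinfo -p lists the services registered with rpcbind on the host.
    rpcinfo -p localhost >/dev/null 2>&1 || {
        echo "rpcbind not answering"
        return 1
    }
    # ypwhich prints the NIS server ypbind is bound to, and fails if unbound.
    server=$(ypwhich 2>/dev/null) || {
        echo "ypbind not bound"
        return 1
    }
    echo "bound to $server"
}
```

     In the case described here, a check like this would report rpcbind answering but ypbind still unbound.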

     At this point I attempted to reboot the machine.  systemd got stuck during shutdown and would not complete the shutdown process, so I was forced to hard boot it.

     Two kernels newer than the one it had been running were installed.  It is now running the newest and seems to be operating normally again.
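
     A quick way to spot a machine that is running an older kernel than the newest one installed is sketched below.  It assumes kernel images are named vmlinuz-<version> under /boot, as on CentOS, and that GNU sort is available; the function name newest_kernel is made up for illustration.

```shell
# Print the highest-versioned kernel image in a boot directory
# (default /boot), skipping rescue images.
# sort -V is a GNU extension that compares version numbers numerically.
newest_kernel() {
    for f in "${1:-/boot}"/vmlinuz-*; do
        [ -e "$f" ] && echo "${f##*/vmlinuz-}"
    done | grep -v rescue | sort -V | tail -n 1
}

# Compare with the running kernel:
#   [ "$(uname -r)" = "$(newest_kernel)" ] || echo "reboot needed for $(newest_kernel)"
```

     A plain alphabetical sort would rank 3.10.0-957 above 3.10.0-1160, which is why the version-aware sort matters here.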