Thoughts about yum repo server and defined repositories

The internal server used as a yum repository runs EL6, and it servers EL6 and EL7 (read: CentOS) yum repos. An admin tried installing mkisofs, which wanted to come by default from the c7-base repository. It also wanted to upgrade bash and glibc. Well, installing a post-usrmerge bash (CentOS 7+) on CentOS6 caused all sorts of havoc. I had to load a rescue iso, boot, and copy /usr/bin/bash and /usr/bin/sh to the correct locations. Then my system would actually boot again.

I was getting an interesting error:

init: Failed to spawn rcS pre-start process: unable to execute: No such file or directory
init: Failed to spawn rcS post-stop process: unable to execute: No such file or directory

Also, kernel options rghb and quiet are really annoying and I always disable them.

So, the moral of the story is: always be very careful running yum on a yum repo server. Double-check what repos your package will pull in.

A case study in wholistic computer system support

A case study in wholistic computer system support

A user emailed me directly asking for me to look at why his webapp isn’t starting. He was having trouble even looking at the logs with journalctl.

I had built the servers for the user, so he always tries to shortcut the support process of opening a ticket to the team queue, and just assigned it directly to me. I emailed the user back saying, “I’ve assigned it to the team queue for you.”

Well, within 5 minutes the level 1 support guy reached out to me saying he couldn’t find anything wrong. And yes, while nothing was wrong on the OS and filesystem level, the app still wasn’t starting.

The user, in the mean time, had remembered his service account that runs the app so he switched user to that and read the logs. It didn’t have permission to the parent directories, on the nfs mount point.

I set aside the tasks I had scheduled for this afternoon, and took a look. Aside from some setgid on the parent directory without g+rx, nothing looked amiss. The directory was owned by root.root which didn’t seem different from the other contents on the filesystem.

My manager approved just fixing the problem and we’ll sort out what caused it later. So the user and I ran chown -R appuser.appuser /mnt/appdir/subdir1 and then the application could get started again.

The investigation

I checked the sudoers history on the two application servers to see if any users ran sudo chown or sudo chmod even though it was locked down enough to prevent that. No activity with sudo looked related, so I then turned to the filesystem itself.

It is nfs, so it could be affected by other systems. After using mount | grep /mnt/appdir and showmount -e $SERVERNAME | grep $EXPORTNAME I had a list of the IP addresses that have access to this export from the nfs server.

I did a reverse dns lookup to get the hostnames because obviously those are easier to recognize than just a whole series of IP addresses. Immediately I recognized the systems as the other container proof of concept.

Before I even started checking bash history on those hosts, I reached out to the container research guy and he told that I think a reckless chown affected an nfs mount point that affected the app running on the original servers. He said yes, it was him, and he “was trying to set the permissions on the logs directory, but the container mounted the entire nfs.”

I thanked him for being honest, and reminded him that chown -R is serious business. He claimed that this is a further reason for having his own nfs mount point. Additionally, kubernetes has a known security flaw about the availability of an entire mount when mounting just a path of it to a container.

This second user made an additional rookie mistake last week when he changed network settings on a system over a network connection before confirming he had console access. Of course, it went offline and he had to get assistance fixing it.


Be very, very careful when you execute a chown -R! And you probably shouldn’t be doing that inside a container with any bind mounts…