Brought to you by the “I hate Mondays” department of Tales Of A Tech.
We would like to remind you that if there is a problem you can’t seem to figure out, there’s always one answer to check first: it’s ALWAYS DNS.
Last night I was trying to log in to this site to make a post about my drone shenanigans (which, by the way, is here). I was already super frustrated because Mass Effect: Andromeda was coded by a team of (apparently) monkeys who don’t understand how to make a working autosave feature (my autosave files are all corrupt, and I lost 5+ hours of gameplay), and the site wasn’t cooperating either. I was reaching “blow-your-top” levels of frustration. I was finally able to connect and log in, but I immediately noticed things were running VERY slowly. It was weird, and I couldn’t figure out why. EVERYTHING seemed slow. Even the SSH connection was jittery and laggy.
At first I thought it was a problem with the network in the apartment. It happens from time to time. We run a lot of equipment in our apartment. I ran some speed tests and some other diagnostics and found that everything seemed to be working. Very weird.
Since I had managed to SSH in, I decided to check some logs. There didn’t seem to be anything in the Apache log. fail2ban wasn’t acting up. MySQL was running and seemingly accepting localhost connections. Very weird. I resorted to the only thing I could think of next: I rebooted the box.
Everything came back up (yay) but the site was still clunky and slow (not so yay).
I decided to enable verbose logging in Apache and attempt to restart the service. It timed out. The hell?
As the service failure message suggested, I ran “journalctl -xe” and noticed a peculiar entry:
Mar 27 21:17:51 TalesOfAnAdmin systemd: apache2.service: Start operation timed out. Terminating.
Mar 27 21:17:51 TalesOfAnAdmin systemd: Failed to start The Apache HTTP Server.
Failed to start due to timeout? Timeout of what? SO CONFUSED.
Digging back a little further I saw another confusing line:
Mar 27 21:15:12 TalesOfAnAdmin apachectl: [Mon Mar 27 21:15:12.835322 2017] [core:error] [pid 1864] (EAI 2)Name or service not known: AH00547: Could not resolve host name maplerangers.com
I had a quick confab with Nick, and he suggested I check whether my websites were in my hosts file (/etc/hosts). They weren’t, so I added them. Apache startup became instantaneous. Things seemed to be improving. Cool.
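Nick’s suggestion boils down to a simple check: is each site name listed in /etc/hosts? Here is a minimal Python sketch of that check, assuming the standard hosts(5) layout (an address followed by one or more names, with # starting a comment). The function names hosts_entries and missing_from_hosts are hypothetical helpers, and 203.0.113.10 in the example is a documentation placeholder address, not this server’s real IP.

```python
from __future__ import annotations


def hosts_entries(hosts_text: str) -> dict[str, str]:
    """Map every hostname/alias in hosts(5)-style text to its address."""
    entries: dict[str, str] = {}
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding space
        if not line:
            continue  # blank or comment-only line
        address, *names = line.split()
        for name in names:
            # Names are matched case-insensitively; keep the first entry seen,
            # since the file is read top to bottom.
            entries.setdefault(name.lower(), address)
    return entries


def missing_from_hosts(hosts_text: str, site_names: list[str]) -> list[str]:
    """Return the site names that have no entry in the hosts text."""
    entries = hosts_entries(hosts_text)
    return [name for name in site_names if name.lower() not in entries]


# Hypothetical example: 203.0.113.10 is a placeholder (RFC 5737) address.
sample = """127.0.0.1 localhost
203.0.113.10 maplerangers.com www.maplerangers.com
"""
print(missing_from_hosts(sample, ["maplerangers.com", "example.org"]))  # prints ['example.org']
```

Feeding it the real contents of /etc/hosts and, for example, the ServerName values from your Apache virtual hosts would show which names still need an entry.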
I logged in to the website, which was up and more responsive but still absolutely slow. Still very weird. I noticed an error on my WordPress Administration page: “curl error 28: operation timed out”
What the hell?
So back to SSH I go. I try pinging wordpress.com: unknown host. I try pinging google.com: unknown host. I try pinging an IP address for those sites: working fine.
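That pattern (names fail, raw IPs work) is the classic signature of a resolver problem. Below is a rough Python sketch of the same comparison, not what I actually ran. Instead of ICMP ping, which needs raw sockets and usually root, it assumes a TCP connection to the IP is a good enough reachability test (port 443 by default, also an assumption), and the helper names are hypothetical.

```python
import socket


def can_resolve(hostname: str) -> bool:
    """True if the system resolver (hosts file, then DNS) returns any address."""
    try:
        socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return False
    return True


def can_connect(ip: str, port: int = 443, timeout: float = 3.0) -> bool:
    """True if a TCP connection to a literal IP succeeds; no DNS is involved.

    Assumption: something is listening on `port` at that address.
    """
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def classify(resolves: bool, reachable: bool) -> str:
    """Turn the two checks into a rough verdict."""
    if not reachable:
        return "network"  # can't reach the host even by IP
    if not resolves:
        return "dns"  # IPs work but names don't: the symptom above
    return "ok"


def diagnose(hostname: str, ip: str) -> str:
    return classify(can_resolve(hostname), can_connect(ip))
```

Calling diagnose("google.com", ...) with an address you already know for the site would have returned "dns" in the situation above.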
I look at /etc/resolv.conf — I’m using Google DNS. Ok.
Let’s see if DigitalOcean has anything about this: Bingo!
Our engineering team is investigating reports of DNS connectivity issues while using the Google resolvers. During… https://t.co/cEpXMPJ3Nq
— DigitalOcean Status (@DOStatus) March 27, 2017
I edit /etc/resolv.conf (and /etc/network/interfaces) to switch to OpenDNS as the primary DNS, with Google as the fallback (as recommended in the article linked in the tweet).
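For reference, here is a small Python sketch that renders both pieces of configuration for that change. The addresses are the public OpenDNS (208.67.222.222, 208.67.220.220) and Google (8.8.8.8, 8.8.4.4) resolvers; the exact order the linked article recommends is assumed, and the helper names are hypothetical. The dns-nameservers line applies to Debian/Ubuntu systems where ifupdown passes /etc/network/interfaces settings to resolvconf, which regenerates resolv.conf, so editing resolv.conf alone may not survive a reboot there.

```python
# Public resolvers: OpenDNS first, Google as the fallback (order assumed).
OPENDNS = ["208.67.222.222", "208.67.220.220"]
GOOGLE = ["8.8.8.8", "8.8.4.4"]

# glibc's stub resolver only uses the first three "nameserver" lines (MAXNS).
MAXNS = 3


def ordered_servers(primary, fallback):
    """Primary servers first, then fallbacks, de-duplicated, capped at MAXNS."""
    merged = list(dict.fromkeys(list(primary) + list(fallback)))
    return merged[:MAXNS]


def resolv_conf(primary, fallback):
    """Text for /etc/resolv.conf; by default servers are tried in listed order."""
    return "".join(f"nameserver {ip}\n" for ip in ordered_servers(primary, fallback))


def interfaces_dns_line(primary, fallback):
    """dns-nameservers line for an iface stanza in /etc/network/interfaces."""
    return "dns-nameservers " + " ".join(ordered_servers(primary, fallback))


if __name__ == "__main__":
    print(resolv_conf(OPENDNS, GOOGLE), end="")
    print(interfaces_dns_line(OPENDNS, GOOGLE))
```

Because of the three-server cap, the second Google address is dropped; one Google server is enough as a fallback.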
Boom, everything is working.
It was DNS.
It’s always DNS.