As you may know, you can connect Moss to your fresh Ubuntu 18.04 or 16.04 server – regardless of the provider where the server is hosted. Moss also features native integrations with some cloud providers (Amazon, DigitalOcean, Google and Vultr as of this writing), but you can use Moss with any VPS, cloud instance, or even physical server – not a common use case, but feasible anyway.
A few days ago a customer was having an issue when trying to connect an Ubuntu 18.04 instance (hosted on his provider of choice) to Moss. So I decided to create an account with that provider and investigate the problem. It turned out that the provider’s image had some “suboptimal” configurations and that the default solution for name resolution in Ubuntu 18.04 (bionic) has some related bugs. I think the problem is interesting enough to be shared, and it’ll also allow us to talk about systemd and, more specifically, systemd-resolved for name resolution.
systemd is a free software project that aims to provide user-space building blocks for a Linux system. Its best-known component is an init system able to start multiple services in parallel, therefore reducing the boot time of your Linux box. If you have servers running Ubuntu (xenial or bionic), you’re already using systemd as your init – PID 1 – process.
Although this may sound great, systemd has been heavily criticized by part of the open source community for years. I won’t delve into those complaints, but I’ll note one of them: that systemd reinvents lots of sub-systems that didn’t need a fix, leading to subtle incompatibilities with existing infrastructure.
One of those sub-systems is name resolution, i.e. how your server translates domain names into network addresses. systemd comes with its own implementation: systemd-resolved. Ubuntu included systemd-resolved in version 16.10 and it’s now present in the current LTS version – 18.04.
systemd-resolved provides local applications with an interface to the DNS. In addition to implementing a resolver, it adds several capabilities like DNS caching and DNSSEC validation. systemd-resolved can be consumed by applications in three ways:
- By means of its D-Bus API. D-Bus is a message bus for inter-process communication. It’s part of the freedesktop.org project, and systemd makes heavy use of it. So in general, desktop applications and systemd services are the most likely clients of this interface.
- By means of its implementation of the glibc API – getaddrinfo(3) and related functions. Not all systemd-resolved capabilities are currently supported through this interface, and further configuration is required to make systemd-resolved handle name resolution in this case. If so configured, I’d say that most software running on your server would use this interface with systemd-resolved.
- By means of the local DNS stub listener that systemd-resolved runs on IP address 127.0.0.53 on the loopback interface. If an application deals with DNS requests directly, this is the only way that systemd-resolved has to resolve the request.
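To make the difference between the last two paths more tangible, here's a minimal Python sketch. It assumes CPython on Linux, where socket.getaddrinfo() wraps the C library call; the function names and the fixed query ID are mine, chosen for illustration only. The first function goes through glibc, the other two build a bare DNS query by hand and send it straight to the stub listener.

```python
import socket
import struct

STUB_ADDRESS = ("127.0.0.53", 53)  # systemd-resolved's stub listener


def resolve_via_glibc(hostname):
    """glibc path: ask the C library (getaddrinfo) to resolve the name."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def build_dns_query(hostname, qtype=1, query_id=0x1234):
    """Encode a minimal DNS query (qtype 1 = A record, class IN)."""
    # Header: id, flags (only "recursion desired" set), 1 question, 0 others.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name is sent as length-prefixed labels ending with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def query_stub_directly(hostname, timeout=2.0):
    """Direct path: speak DNS over UDP to the stub, bypassing glibc."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_dns_query(hostname), STUB_ADDRESS)
        response, _ = sock.recvfrom(512)
    return response
```

On a box running systemd-resolved, calling resolve_via_glibc("keyserver.ubuntu.com") and query_stub_directly("keyserver.ubuntu.com") exercises each path separately, which is handy when only one of them misbehaves.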
Hmm, this is starting to get complex… isn’t it? Within the same server, how name resolution actually behaves may differ on a per-application and per-configuration basis. And things can become trickier.
Let’s quickly see how name resolution has been traditionally done in Linux systems.
/etc/resolv.conf and friends
If you’ve ever done some system configuration in Linux, you’ll know that the name resolver is configured in /etc/resolv.conf. It usually consists of a list of nameservers which are queried in order. That’s it. In the old days the sysadmin would set up this file and move on.
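As a small illustration of how little there is to this file, here's a Python sketch that extracts the nameserver list the way a resolver would read it. The function name is hypothetical, and the parsing is deliberately simplified: it only looks at nameserver lines and skips comment lines that start with # or ;.

```python
def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses in the order they are listed."""
    nameservers = []
    for raw_line in resolv_conf_text.splitlines():
        line = raw_line.strip()
        # Comment lines start with '#' or ';'.
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            nameservers.append(fields[1])
    return nameservers
```

Something like parse_nameservers(open("/etc/resolv.conf").read()) returns the servers in query order. Keep in mind that glibc only honors the first three entries.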
But then, more dynamic environments became “the new normal”. In particular, Linux desktops and cloud computing required dynamic networking environments, so a static configuration file for name resolution wasn’t a good fit in such cases. Therefore, applications like resolvconf were (and still are) used to dynamically update /etc/resolv.conf based on external information. resolvconf is not intended to be used by hand, but from other configuration software like ifup, ifdown, dhclient, or dnsmasq.
How does this relate to systemd-resolved? Well, we have more complexity here. systemd-resolved can either provide /etc/resolv.conf or consume it, depending on the compatibility mode chosen by the system administrator. Basically, you either rely on 127.0.0.53 for name resolution, have systemd-resolved publish the upstream name servers it knows about so that applications bypass it and query them directly, or let other packages manage /etc/resolv.conf.
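In practice, the mode is usually decided by where /etc/resolv.conf points. The following Python sketch classifies it. The two /run/systemd/resolve paths are the ones I recall from the systemd-resolved(8) man page, so treat them as an assumption if your systemd version differs; the mode labels and function names are made up for this example.

```python
import os

# Files maintained by systemd-resolved (paths per systemd-resolved(8);
# assumed here, they do not appear elsewhere in this post).
STUB_FILE = "/run/systemd/resolve/stub-resolv.conf"   # lists 127.0.0.53 only
UPSTREAM_FILE = "/run/systemd/resolve/resolv.conf"    # lists upstream servers


def classify_resolv_conf(real_path):
    """Map the resolved location of /etc/resolv.conf to a mode label."""
    if real_path == STUB_FILE:
        return "stub"      # applications query 127.0.0.53
    if real_path == UPSTREAM_FILE:
        return "bypass"    # applications query the upstream servers directly
    return "foreign"       # another package (or the admin) owns the file


def current_mode(path="/etc/resolv.conf"):
    """Follow symlinks and classify the file currently in use."""
    return classify_resolv_conf(os.path.realpath(path))
```

A file managed by resolvconf, for instance, lives outside /run/systemd/resolve and therefore lands in the "foreign" bucket.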
Ok, you must be confused at this moment… Let me explain the problem that originated this blog post and I’ll use it as an example to walk through these configs.
The name resolution issue
The problem that our customer was having on his cloud provider’s Ubuntu 18.04 server was this one:
[email protected]:~# gpg --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 14AA40EC0831756756D7F66C4F4EA0AAE5267A6C
gpg: keyserver receive failed: Invalid argument
gpg relies on dirmngr (both part of the GNU Privacy Guard project) to handle certificates and revocation lists. When I looked for the appropriate error logs, I found a name resolution issue for hostname keyserver.ubuntu.com:
[email protected]:~# cat /var/log/syslog | grep dirmngr
Jun 5 09:13:44 ubuntu-1804-image dirmngr: resolving 'keyserver.ubuntu.com' failed: Invalid argument
Jun 5 09:13:44 ubuntu-1804-image dirmngr: can't connect to 'keyserver.ubuntu.com': host not found
[redacted]
Ok, so the host wasn’t being found. Let’s try to resolve it:
[email protected]:~# host keyserver.ubuntu.com
keyserver.ubuntu.com has address 18.104.22.168
keyserver.ubuntu.com has address 22.214.171.124
Hmm… it works – something strange is happening. What nameservers are being used?
[email protected]:~# cat /etc/resolv.conf
[redacted]
nameserver 127.0.0.53
nameserver 126.96.36.199
nameserver 188.8.131.52
- 127.0.0.53: systemd-resolved’s stub listener.
- 184.108.40.206 and 220.127.116.11: Cloudflare’s resolvers.
Since 127.0.0.53 is the first name server in /etc/resolv.conf, it should be the one handling DNS requests first. How is systemd-resolved configured?
[email protected]:~# cat /etc/systemd/resolved.conf
# This file is part of systemd.
#
# systemd is free software; you can redistribute it and/or modify it
# under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2.1 of the License, or
# (at your option) any later version.
#
# Entries in this file show the compile time defaults.
# You can change settings by editing this file.
# Defaults can be restored by simply deleting this file.
#
# See resolved.conf(5) for details
[Resolve]
#DNS=
#FallbackDNS=
#Domains=
#LLMNR=no
#MulticastDNS=no
#DNSSEC=no
#Cache=yes
#DNSStubListener=yes
Apparently this is ok – as per the last line, the stub resolver (127.0.0.53) is enabled and should be answering name resolution queries. Let’s check if it’s actually running:
[email protected]:~# systemctl status systemd-resolved.service
● systemd-resolved.service - Network Name Resolution
   Loaded: loaded (/lib/systemd/system/systemd-resolved.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2018-06-05 08:04:10 UTC; 12min ago
[redacted]
[email protected]:~# netstat -nlutp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.53:53           0.0.0.0:*               LISTEN      1753/systemd-resolv
udp    18432      0 127.0.0.53:53           0.0.0.0:*                           1753/systemd-resolv
[redacted]
Yes, it’s running and the stub is listening on the appropriate ports – udp/53 and tcp/53. But wait, we saw before that applications can also interface with systemd-resolved by means of D-Bus or glibc APIs. If dirmngr and host used different approaches, we might infer that one of them is exposing an issue but the other one isn’t.
dirmngr uses glibc’s getaddrinfo() but host is part of BIND and it deals with DNS requests directly. I checked this by disassembling the code with objdump (binutils package), but I could have reviewed the source code instead. Are dirmngr’s calls being handled by systemd-resolved directly? To answer this, we have to check whether the hosts: directive in /etc/nsswitch.conf contains the keyword “resolve”. However, we can see that’s not the case in the server under study.
[email protected]:~# cat /etc/nsswitch.conf
# /etc/nsswitch.conf
#
# Example configuration of GNU Name Service Switch functionality.
# If you have the `glibc-doc-reference' and `info' packages installed, try:
# `info libc "Name Service Switch"' for information about this file.
passwd: compat systemd
group: compat systemd
shadow: compat
gshadow: files
hosts: files dns
networks: files
protocols: db files
services: db files
ethers: db files
rpc: db files
netgroup: nis
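If you have to run this check on many servers, it's easy to script. Here's a minimal Python sketch of the test I did by eye: it reads the hosts: line and reports whether the resolve source is present. The function names are hypothetical, and action items in square brackets are simply ignored.

```python
def hosts_sources(nsswitch_text):
    """Return the lookup sources listed on the hosts: line, in order."""
    for raw_line in nsswitch_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if line.startswith("hosts:"):
            tokens = line[len("hosts:"):].split()
            # Skip action items such as [!UNAVAIL=return].
            return [tok for tok in tokens if not tok.startswith("[")]
    return []


def glibc_uses_resolved(nsswitch_text):
    """True if getaddrinfo() is routed to systemd-resolved via NSS."""
    return "resolve" in hosts_sources(nsswitch_text)
```

For the file shown above, hosts_sources() yields files and dns only, so glibc never talks to systemd-resolved through NSS on this server.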
So apparently glibc’s own resolver handles dirmngr’s requests and sends them to the name servers listed in /etc/resolv.conf, which means systemd-resolved still handles them as they reach 127.0.0.53. In case the latter times out, Cloudflare servers will be queried instead.
Then why does the gpg command fail to resolve the hostname? What’s really happening under the hood? Network traffic will tell us. Let’s capture DNS requests and responses with tcpdump while we run the gpg command.
We can observe different phases in the resulting capture:
- The application looks for the IPv4 addresses (Type A records) of keyserver.ubuntu.com. The query reaches systemd-resolved’s stub listener and it issues three queries in parallel – one for Cloudflare and two for Google DNS servers. The first response is returned to the application.
- The application looks for the IPv6 addresses (Type AAAA records) of keyserver.ubuntu.com. Same behavior as before.
- The application looks for a Type 0 (Class 7168) record and systemd-resolved’s stub listener replies with a Format Error.
- Queries time out 5 seconds later and the process starts over.
The record in phase 3 turns out to be RRSIG – it holds digital signatures of resource records which are used during the DNSSEC authentication process. At this moment, systemd-resolved doesn’t support queries for these (and related) records under certain conditions. We can easily check this by forcing 127.0.0.53 to resolve an RRSIG query (it fails). If the same query is served by Cloudflare’s DNS nameserver, it succeeds:
[email protected]:~# host -t RRSIG keyserver.ubuntu.com 127.0.0.53
Using domain server:
Name: 127.0.0.53
Address: 127.0.0.53#53
Aliases:
Host keyserver.ubuntu.com not found: 1(FORMERR)
[email protected]:~# host -t RRSIG keyserver.ubuntu.com 18.104.22.168
Using domain server:
Name: 22.214.171.124
Address: 126.96.36.199#53
Aliases:
keyserver.ubuntu.com has no RRSIG record
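The "1(FORMERR)" that host prints is the response code carried in the last four bits of the flags field in the DNS header. As a rough sketch (the function name is mine, and only the most common codes are named), this is how you could read it from the raw bytes of a reply:

```python
import struct

# Common DNS response codes (RFC 1035).
RCODE_NAMES = {
    0: "NOERROR",
    1: "FORMERR",
    2: "SERVFAIL",
    3: "NXDOMAIN",
    4: "NOTIMP",
    5: "REFUSED",
}


def response_code(response):
    """Return the RCODE name found in a raw DNS response."""
    if len(response) < 12:
        raise ValueError("a DNS message has a 12-byte header at least")
    _query_id, flags = struct.unpack("!HH", response[:4])
    if not flags & 0x8000:  # QR bit: 0 = query, 1 = response
        raise ValueError("this message is a query, not a response")
    rcode = flags & 0x000F  # lowest four bits of the flags field
    return RCODE_NAMES.get(rcode, "RCODE%d" % rcode)
```

The bytes would come from a UDP exchange with the name server under test. A query for RRSIG carries record type 46 in its question section.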
Finally we have something that explains why gpg failed. However, it’s still not clear why all queries time out, since A and AAAA records were successfully answered. Let’s keep looking into that.
Why is systemd-resolved issuing 3 queries in parallel? Let’s check its status:
[email protected]:~# systemd-resolve --status
Global
         DNS Servers: 188.8.131.52
                      184.108.40.206
          DNSSEC NTA: 10.in-addr.arpa
                      16.172.in-addr.arpa
                      168.192.in-addr.arpa
                      17.172.in-addr.arpa
                      18.172.in-addr.arpa
                      19.172.in-addr.arpa
                      20.172.in-addr.arpa
                      21.172.in-addr.arpa
                      22.172.in-addr.arpa
                      23.172.in-addr.arpa
                      24.172.in-addr.arpa
                      25.172.in-addr.arpa
                      26.172.in-addr.arpa
                      27.172.in-addr.arpa
                      28.172.in-addr.arpa
                      29.172.in-addr.arpa
                      30.172.in-addr.arpa
                      31.172.in-addr.arpa
                      corp
                      d.f.ip6.arpa
                      home
                      internal
                      intranet
                      lan
                      local
                      private
                      test
Link 3 (eth1)
      Current Scopes: DNS
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no
         DNS Servers: 220.127.116.11
                      18.104.22.168
Link 2 (eth0)
      Current Scopes: DNS
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no
         DNS Servers: 22.214.171.124
                      126.96.36.199
- 188.8.131.52 and 184.108.40.206: Cloudflare’s name servers as the global DNS servers. These come from /etc/resolv.conf as we saw before.
- 220.127.116.11 and 18.104.22.168: Google’s name servers for queries flowing through interface eth0. These come from external sources, in particular a DHCP server.
- 22.214.171.124 and 126.96.36.199: Same as above but for interface eth1.
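The status output is meant for humans, but the per-scope server lists can be pulled out with a few lines of Python. This is only a sketch based on the layout shown above (a scope header, then "key: value" lines with continuation lines underneath); the function name is hypothetical and other systemd versions may format the output differently.

```python
import re

# "Global" or "Link 2 (eth0)" start a new scope.
SCOPE_LINE = re.compile(r"^(Global|Link \d+ \(.+\))\s*$")
# "   DNS Servers: 192.0.2.1" style lines; the value may be missing.
KEY_LINE = re.compile(r"^\s*([A-Za-z][A-Za-z ]*):(?:\s+(.*))?$")


def dns_servers_by_scope(status_text):
    """Map each scope to the DNS servers listed under it."""
    servers = {}
    scope = None
    key = None
    for line in status_text.splitlines():
        if not line.strip():
            continue
        scope_match = SCOPE_LINE.match(line)
        if scope_match:
            scope, key = scope_match.group(1), None
            continue
        key_match = KEY_LINE.match(line)
        if key_match:
            key, value = key_match.group(1).strip(), key_match.group(2)
        else:
            value = line.strip()  # continuation of the previous key
        if scope and key == "DNS Servers" and value:
            servers.setdefault(scope, []).append(value)
    return servers
```

Feeding it the output of systemd-resolve --status gives one entry per scope, which makes it easy to spot a global scope and per-link scopes that point at different providers.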
The three parallel queries match the expected behavior of systemd-resolved: one for the global name server and another one per network interface. Who’s setting up Cloudflare as the global name server? Let’s revisit /etc/resolv.conf:
[email protected]:~# ls -l /etc/resolv.conf
lrwxrwxrwx 1 root root 29 May 7 12:36 /etc/resolv.conf -> ../run/resolvconf/resolv.conf
[email protected]:~# cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
# 127.0.0.53 is the systemd-resolved stub resolver.
# run "systemd-resolve --status" to see details about the actual nameservers.
nameserver 127.0.0.53
nameserver 188.8.131.52
nameserver 184.108.40.206
What a strange config! Don’t you think? Note the conflicting settings:
- You’re using systemd-resolved and it takes per-interface DNS configurations from a DHCP server. That DHCP server lists external DNS servers – from Google.
- But /etc/resolv.conf is being managed by resolvconf. This should mean that the sysadmin wants to use the last compatibility mode of systemd-resolved regarding /etc/resolv.conf. However, since 127.0.0.53 is listed as a name server, systemd-resolved takes control over name resolution.
- resolvconf includes additional (but different) external DNS servers – from Cloudflare – for your global name resolution settings.
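A quick way to catch this kind of mix on other images is to flag any name server list that contains the stub listener together with something else. Here's a tiny Python sketch of that check; the function name is hypothetical, and it expects the addresses as plain strings taken from the nameserver lines.

```python
import ipaddress

STUB = ipaddress.ip_address("127.0.0.53")  # systemd-resolved's stub listener


def mixes_stub_with_upstreams(nameservers):
    """True if the stub is listed together with other name servers."""
    addresses = [ipaddress.ip_address(ns) for ns in nameservers]
    return STUB in addresses and any(addr != STUB for addr in addresses)
```

The list found on this server (the stub plus two external resolvers) makes the function return True, while a list holding only 127.0.0.53, or only external servers, doesn't.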
This configuration nightmare, along with the RRSIG bug we discussed earlier, makes your system break in a very subtle, hard-to-debug, hard-to-understand way. In particular, when some requests are awaiting an answer from the upstream DNS servers and systemd-resolved receives a query it doesn’t support (like the one for an RRSIG record), the name resolution process seems to fail entirely.
In my opinion, the settings that the cloud provider chose are flawed (from a maintainability viewpoint) and must be fixed. Even if they worked, they make little sense and add complexity to an already complex setup. The provider should choose a clear policy – either use 127.0.0.53 as the only name server or get rid of systemd-resolved – and stick with it.
DNS is a critical component of the Internet, and it’s more complex than it might seem at first sight. For the sake of additional functionality, systemd-resolved adds more complexity to it. According to the author of systemd-resolved:
resolved is not supposed to be a DNS server, it’s supposed to be exactly good enough so that libc-like DNS clients can resolve their stuff
Therefore, it seems a bit unfortunate that distros like Ubuntu Server (among others) and upstream providers ship it as the default solution for name resolution without careful settings.
In this article I’ve tried to walk you through the pain of tracking down a real issue on a real provider. If you think I got something wrong, just drop me a message and I’ll be happy to update this post. Hey, and don’t forget to sign up below if you want us to send you an email as we publish more stuff 😀