tuned for Debian

For quite some time I wanted to have tuned in Debian, but somehow never motivated myself to do the packaging. Two weeks ago I then finally decided to pick it up (esp. as mika and a few others were asking about it).

There was an old RFP/ITP 789592, without much progress, so I did the packaging from scratch (heavily based on the Fedora package). gustavo (the owner of the ITP) also joined the effort, and shortly after the upstream release of 2.8.0 we had tuned in Debian (with a very short time in NEW, thanks ftp-masters!).

I am quite sure that the package is far from perfect yet, especially as the software is primarily built for and tested on Fedora/CentOS/RHEL. So keep the bugs, suggestions and patches coming (thanks mika!).

how to accidentally break DNS for 15 domains or why you maybe could not send mail to me

TL;DR: DNS for golov.de and other (14) domains hosted on my infra was flaky from 15th to 17th of May, which may have resulted in undelivered mail.

Yeah, I know, I haven't blogged for quite some time. Not even after I switched the engine of my blog from WordPress to Nikola. Sorry!

But this post is not about apologizing or at least not for not blogging.

Last Tuesday, mika sent me a direct message on Twitter (around 13:00) that read „problem auf deiner Seite?“ or “problem on your side/page?”. Given side and page are the same word in German, I thought he meant my (this) website, so I quickly fired up a browser, checked that the site loads (I even checked both, HTTP and HTTPS! :-)) and as everything seemed to be fine and I was at a customer I only briefly replied “?”. A couple of messages later we found out that mika tried to send a screenshot (from his phone) but that got lost somewhere. A quick protocol change later (yay, Signal!) and I got the screenshot. It said "<evgeni+grml@golov.de>: Host or domain name not found. Name service error for name=golov.de type=AAAA: Host found, but no data record of requested type". Well, yeah, that looks like a useful error message. And here the journey begins.

For historical (nonsense) reasons golov.de currently does not have any AAAA records, so it looked odd that Postfix tried that. Even odder was that dig MX golov.de and dig mail.golov.de worked just fine from my laptop.

Still, the message looked worrying and I decided to dig deeper. golov.de is served by three nameservers: ns.die-welt.net, ns2.die-welt.net and ns.inwx.de, and dig was showing proper replies from ns2.die-welt.net and ns.inwx.de but not from ns.die-welt.net, which is the master. That was weird, but gave a direction to look at, and explained why my initial tests were OK. Another interesting data point was that die-welt.net was served just fine from all three nameservers.

Let's quickly SSH into that machine and look at what's happening… Yeah, but I only have my work laptop with me, which does not have my root key (and I still did not manage to set up a Yubikey/Nitrokey/whatever). Thankfully my key was allowed to access the hypervisor, yay console!

Now let's really look. golov.de is served from the bind backend of my PowerDNS, while die-welt.net is served from the MySQL backend. That explains why one domain didn't work while the other did. The relevant zone file looked fine, but the zones.conf was empty. WTF?! That zones.conf is autogenerated by Froxlor and I had upgraded it during the weekend to get Let's Encrypt support. Oh well, seems I hit a bug, damn. A few PHP hacks later and I got my zones.conf generated properly again and all was good.

But what had really happened?

On Saturday (around 17:00) I upgraded to Froxlor 0.9.35.1 to get Let's Encrypt support and hit Froxlor bug 1615 without noticing as PowerDNS re-reads zones.conf only when told.
On Sunday PowerDNS was restarted because of upgraded packages, thus re-reading zones.conf and properly logging:
```
May 15 08:10:59 shokki pdns[2210]: [bindbackend] Parsing 0 domain(s), will report when done
```
On Tuesday the issue hit a friend who cared and notified me
On Tuesday the issue was fixed (first by a quick restore from etckeeper, later by fixing the generating code):
```
May 17 14:56:08 shokki pdns[24422]: [bindbackend] Parsing 15 domain(s), will report when done
```

And the lessons learned?

Monitor all your domains, on all your nameservers. (I didn't)
Have emergency access to all your servers. (I did, but it was complicated)
Use etckeeper, it's easier to use than backups in such cases.
When hitting bugs, look in the bugtracker before solving the issue yourself. (I didn't)
Have friends who care :-)
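
The first lesson is easy to script. Here is a minimal sketch (not what I actually run) that asks every nameserver directly for the SOA of every domain and lists the combinations that return nothing. It assumes dig is installed; the function names query_soa and find_broken are made up for this example, and the query function can be swapped out, so the logic can be tried without touching the network.

```python
import subprocess


def query_soa(domain, nameserver):
    """Ask one nameserver directly (no recursion) for the SOA of a domain."""
    result = subprocess.run(
        ["dig", "+short", "+norecurse", "SOA", domain, "@" + nameserver],
        capture_output=True, text=True, timeout=10, check=False,
    )
    return result.stdout.strip()


def find_broken(domains, nameservers, query=query_soa):
    """Return every (domain, nameserver) pair that gave no SOA answer."""
    broken = []
    for domain in domains:
        for nameserver in nameservers:
            try:
                answer = query(domain, nameserver)
            except Exception:
                # A timeout or a missing dig binary counts as "no answer".
                answer = ""
            if not answer:
                broken.append((domain, nameserver))
    return broken
```

Called as find_broken(["golov.de", "die-welt.net"], ["ns.die-welt.net", "ns2.die-welt.net", "ns.inwx.de"]) from cron or a monitoring check, a non-empty result would have flagged the broken master on Sunday instead of Tuesday.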

Debian Bug Squashing Party Salzburg 2014


This weekend, Bernd Zeimetz organized a BSP at the offices of conova in Salzburg, Austria. Three days of discussions, bugfixes, sparc removals and a lot of fun and laughter.

We squashed a total of 87 bugs: 66 bugs affecting Jessie/Sid were closed, 9 downgraded and 8 closed via removals. As people tend to care about (old)stable, 3 bugs were fixed in Wheezy and one in Squeeze. These numbers might not be totally correct, as we were kinda creative at counting... Marga promised a talk about "an introduction to properly counting bugs using the 'Haus vom Nikolaus' algorithm to the base of 7".

Speaking of numbers, I touched the following bugs (not all RC):

#741806: pygresql: FTBFS: pgmodule.c:32:22: fatal error: postgres.h: No such file or directory
Uploaded an NMU with a patch. The bug was introduced by the recent PostgreSQL development package reorganisation.
#744229: qpdfview: FTBFS synctex/synctex_parser.c:275:20: fatal error: zlib.h: No such file or directory
Talked to the maintainer, explaining the importance of the upload and verifying his fix.
#744300: pexpect: missing dependency on dh-python
Downgraded to wishlist after verifying the build dependency is only needed when building for Wheezy backports.
#744917: luajit: FTBFS when /sbin is not in $PATH
Uploaded an NMU with a patch, which later was canceled due to a maintainer upload with a slightly different fix.
#742943: nagios-plugins-contrib: check_raid: wants mpt-statusd / mptctl
Analyzed the situation, verified the status with the latest upstream version of the check and commented on the bug.
#732110: nagios-plugins-contrib: check_rbl error when nameserver available only in IPv6
Verify that the bug is fixed in the latest release and mark it as done.
#684726: nagios-plugins-contrib: RFP: check-v46 -- Icinga / Nagios plugin for dual stacked (IPv4 / IPv6) hosts
Mark bug as done, the changelog was missing a proper "Closes" tag.
#661167: nagios-plugins-contrib: please include nagios-check-printer-status
Mark bug as done, the changelog was missing a proper "Closes" tag.
#745895: nagios-plugins-contrib: does not compile against Varnish 4.0.0
Write a patch for supporting the Varnish 3 and 4 APIs at the same time. Also proposed the patch upstream.
#744922: nagios-plugins-contrib: check_packages: check for security updates broken
Forward our security_updates_critical patch and Felix' fixes to it upstream, then updating check_packages to the latest upstream version.
#744248: nagios-plugins-contrib: check_cert_expire: support configurable warn/crit times
Forward Helmut's patch upstream, then updating check_cert_expire to the latest upstream version.
#745691: django-classy-tags: FTBFS: Sphinx documentation not found
Analyze the issue and the fix proposed in SVN, comment on the bug.
#713876: thinkfan: [PATCH] bugfix: use $MAINPID for ExecReload
Prepare and upload of the latest upstream release, which includes Michael's patch.
#713878: thinkfan: [PATCH] use dh-systemd for proper systemd-related maintscripts
Apply Michael's patch to the Debian packaging.
#728087: thinkfan: Document how to start thinkfan with systemd
Apply Michael's patch to the Debian packaging.
#742515: blktap-dkms: blktap kernel module failed to build
Upload an NMU with a patch based on the upstream fix.
#745598: libkolab: FTBFS in dh_python2 (missing Build-Conflicts?)
Upload an NMU with a patch against libkolab's cmake rules, tightening the search for Python to 2.7.
#745599: libkolabxml: FTBFS with undefined reference to symbol '_ZTVN5boost6detail16thread_data_baseE'
Upload an NMU with a patch against libkolabxml's cmake rules, properly linking the tests to the Boost libraries.
#746160: libkolabxml: FTBFS when both python2 and python3 development headers are installed
Filing the bug while working on #745599, then uploading an NMU with a patch against libkolabxml's cmake rules, tightening the search for Python to 2.7.
#714045: blcr-dkms: blcr module is not built on kernel 3.9.1
Checking the status of the bug upstream, and marking it as forwarded.
#653404: hdapsd: init.d status support
Added Peter's patch to the Debian packaging, upload yet pending.
#702199: hdapsd: Typo in package description
Fixed typo in the description, upload yet pending.
#745219: crmsh: should depends on python-yaml
Verifying the bug with Stefan Bauer and sponsoring his NMU.
#741600: 389-ds-base: CVE-2014-0132
Sponsoring the NMU for Tobias Frost.

A couple of (non-free) pictures are available at Uwe's salzburg-cityguide.at.

Thanks again to Bernd for organizing and conova and credativ for sponsoring!

diffing configuration files made easy

Let's assume you are a sysadmin and have to debug a daemon giving bad performance on one machine, but not on the other. Of course, you did not set up either machine, have only basic knowledge of the said daemon and would really love to watch that awesome piece of cinematographic art with a bunch of friends and a couple of beers. So it's like every day, right?

The problem with understanding running setups is that you often have to read configuration files. And when reading one is not enough, you have to compare two or more of them. Suddenly, a wild problem occurs: order and indentation do not matter (unless they do), comments are often just beautiful noise and what the hell did "that guy" smoke/drink/eat while explicitly setting ALL THE OPTIONS to their defaults before actually setting them as he wanted.

If you are using diff(1), you probably love to read a lot of differences that are not differences at all. Want an example?

[foo]
bar = bar
foo = foo

and

# settings for foo
[foo]
# foo is best
foo = foo
# bar is ok here, FIXME?
bar = bar

and

[foo]
foo = x
bar = x

[foo]
foo = foo
bar = bar

are actually the same, at least for some parsers (the last one is a single file that defines [foo] twice). XTaran suggested using something like wdiff or dwdiff, which often helps, but not in the above case. Others suggested vimdiff, which is nice, but not really helpful here either.

As there is a problem, and I love to solve these, I started a small new project: cfgdiff. It tries to parse two given files and give a diff of the content after normalizing it (merging duplicate keys, sorting keys, ignoring comments and blank lines, you name it). Currently it can parse various INI files, JSON, YAML and XML. That's probably not enough to be the single diff tool for configuration files, but it is quite a nice start. And you can extend it, of course ;)
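
To show the idea (this is not cfgdiff's actual code), here is a small sketch for the INI case using only Python's standard library. It assumes that "the same" means what configparser makes of a file in non-strict mode: comments are dropped, duplicate sections are merged and the last value of a key wins. The names normalize_ini and cfg_diff are made up for this example.

```python
import configparser
import difflib


def normalize_ini(text):
    """Parse INI text and return it as sorted, comment-free lines."""
    # strict=False merges duplicate sections and lets later keys override
    # earlier ones; interpolation=None keeps '%' in values untouched.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(text)
    lines = []
    for section in sorted(parser.sections()):
        lines.append("[%s]" % section)
        for key in sorted(parser[section]):
            lines.append("%s = %s" % (key, parser[section][key]))
    return lines


def cfg_diff(left, right):
    """Unified diff of two INI texts after normalization."""
    return list(difflib.unified_diff(
        normalize_ini(left), normalize_ini(right),
        "left", "right", lineterm="",
    ))
```

Fed with any two of the three example files from above, cfg_diff returns an empty list, while a real change (say bar = baz) still shows up as a one-line difference. Note that configparser lowercases key names by default, which may or may not match what your daemon does.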

Monitoring your Puppet nodes using PuppetDB

When you run Puppet, it is very important to monitor whether all nodes have an up-to-date catalog and did not miss the last year of changes because of a typo in a manifest or a broken cron script. The most common solution to this is a script that checks /var/lib/puppet/state/last_run_summary.yaml on each node. While this is nice and easy in a small setup, it can get a bit messy in a bigger environment as you have to do an NRPE call for every node (or integrate the check as a local check into check_mk).

Given a slightly bigger Puppet environment, I guess you already have PuppetDB running. Bonus points if you already let it save the reports of the nodes via reports = store,puppetdb. Given a central knowledge base about your Puppet environment, one could ask PuppetDB about the last node runs, right? I did not find any such script on the web, so I wrote my own: check_puppetdb_nodes.

The script requires a "recent" (1.5) PuppetDB and a couple of Perl modules (JSON, LWP, Date::Parse, Nagios::Plugin) installed. When run, the script will contact the PuppetDB via HTTP on localhost:8080 (obviously configurable via -H and -p, HTTPS is available via -s) and ask for a list of nodes from the /nodes endpoint of the API. PuppetDB will answer with a list of all nodes, their catalog timestamps and whether the node is deactivated. Based on this result, check_puppetdb_nodes will check the last catalog run of all not deactivated nodes and issue a WARNING notification if there was none in the last 2 hours (-w) or a CRITICAL notification if there was none for 24 hours (-c).

As a fresh catalog does not mean that the node was able to apply it, check_puppetdb_nodes will also query the /event-counts endpoint for each node and verify that the node did not report any failures in the last run (for this feature to work, you need reports stored in PuppetDB). You can modify the thresholds for the number of failures that trigger a WARNING/CRITICAL with -W and -C, but I think 1 is quite a reasonable default for a CRITICAL in this case.
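
The decision logic described above fits in a few lines. The following is a Python sketch of it, not the Perl code of check_puppetdb_nodes. It assumes the /nodes answer has already been fetched and decoded into a list of dicts with the keys name, deactivated and catalog_timestamp (that is how I remember the API output, so verify against your PuppetDB version), that timestamps look like 2013-05-01T12:00:00.000Z, and that the failure counts per node were collected separately. Only the CRITICAL failure threshold is modelled here.

```python
from datetime import datetime, timedelta, timezone

OK, WARNING, CRITICAL = 0, 1, 2


def parse_timestamp(value):
    """Parse an assumed 'YYYY-MM-DDTHH:MM:SS.mmmZ' timestamp as UTC."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def check_nodes(nodes, failures, now,
                warn=timedelta(hours=2), crit=timedelta(hours=24),
                fail_crit=1):
    """Return (worst_state, [(node_name, state), ...]) for unhealthy nodes."""
    problems = []
    for node in nodes:
        if node.get("deactivated"):
            continue  # deactivated nodes are not expected to run Puppet
        name = node["name"]
        stamp = node.get("catalog_timestamp")
        state = OK
        if stamp is None or now - parse_timestamp(stamp) >= crit:
            state = CRITICAL
        elif now - parse_timestamp(stamp) >= warn:
            state = WARNING
        # A fresh catalog that failed to apply is still a problem.
        if failures.get(name, 0) >= fail_crit:
            state = CRITICAL
        if state != OK:
            problems.append((name, state))
    worst = max((state for _, state in problems), default=OK)
    return worst, problems
```

The worst state maps directly to the exit code a Nagios/Icinga plugin is expected to return, and the list of problems is what you would print in the plugin output.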

Using check_puppetdb_nodes you can monitor the health of ALL your Puppet nodes with a single NRPE call. Or even with zero, if your monitoring host can access PuppetDB directly.

Say hello to Mister Hubert!

Some days ago I got myself a new shiny Samsung 840 Pro 256GB SSD for my laptop. The old 80GB Intel was just too damn small. Instead of just doing a pvmove from the old to the new, I decided to set up the system from scratch. That is an awesome way to get rid of old and unused stuff or at least move it to some lower class storage (read: backup).

One of the things I did not bother to copy from the old disk were my ~/Debian, ~/Grml and ~/Devel folders. I mean, hey, it's all in some kind of VCS, right? I can just clone it anew, if I really want. Nor did I copy much of my dotfiles; these are neatly gitted with the help of RichiH's awesome vcsh and a bit of human brains (no private keys on GitHub, yada yada).

After cloning a couple of my personal repos from GitHub to ~/Devel, I realized I was doing a pretty dumb job that a machine could do for me. As I already was using Joey's mr for my vcsh repositories, generating a mr config and letting mr do the actual job was the most natural thing to do. So was using Python Requests and GitHub's JSON API.

And here is Mister Hubert, aka mrhub: github.com/evgeni/MisterHubert. Just call it with your GitHub username and you get a nice mr config dumped to stdout. The same applies to organizations.

Authentication for private repos? ✓ (-p)
Other clone mechanisms? ✓ (-c)
A help function? ✓ (-h)
Other features? ✓
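
For the curious, the core idea is roughly the following. This is a sketch, not mrhub's real code: it uses urllib from the standard library instead of Requests, only reads the first page of GitHub's /users/<name>/repos answer, and the function names as well as the Devel base directory are assumptions made for this example.

```python
import json
import urllib.request


def fetch_repos(user):
    """Fetch the first page of public repositories of a GitHub user."""
    url = "https://api.github.com/users/%s/repos" % user
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def render_mrconfig(repos, base="Devel", url_key="clone_url"):
    """Turn a list of GitHub repository dicts into mr config text."""
    chunks = []
    for repo in sorted(repos, key=lambda r: r["name"]):
        # One section per checkout directory, with the command to create it.
        chunks.append("[%s/%s]\ncheckout = git clone '%s' '%s'\n" % (
            base, repo["name"], repo[url_key], repo["name"]))
    return "\n".join(chunks)
```

Something like print(render_mrconfig(fetch_repos("evgeni"))) redirected into a file that your ~/.mrconfig includes should then let mr checkout do the cloning. Passing url_key="ssh_url" switches to SSH clone URLs.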

As usual, I hope this is useful :)

Running Debian without Unity on a machine that is 64 bit capable!

Sorry Bryan, I can show you plenty of hardware that is perfectly 64 bit capable but probably never will run Ubuntu and/or Unity.

First, what is 64 bit for you? Looking at ubuntu.com/download and getting images from there, one gets the impression that 64 bit is amd64 (also called x86_64). If one digs deeper to cdimage.ubuntu.com, one will find non-Intel images too: PowerPC and armhf. As the PowerPC images are said to boot on G3 and G4 PowerPCs, these are 32 bit. Armhf is 32 bit too (arm64/aarch64 support in Linux is just evolving). So yes, if 64 bit means amd64, I do have hardware that can run Unity.

But you asked if I have hardware that is 64 bit capable and can run Ubuntu/Unity, so may I apply my definition of 64 bit here? I have an old Sun Netra T1-200 (500MHz UltraSPARC IIe) running Debian's sparc port, which has a 64 bit kernel and 32 bit userland. Unity? No wai. I do not own any ia64 or s390/s390x machines, but I am sure people do. And guess what, no Unity there either :)

Sorry for ranting like this, but 64 bit really just means that the CPU can handle 64 bit wide addresses etc. And even then, it will not always do so ;)

powerdyn - a dynamic DNS service for PowerDNS users

You may not know this, but I am a huge PowerDNS fan. This may be because it is so simple to use, supports different databases as backends or maybe just because I do not like BIND, pick one.

I also happen to live in Germany, where ISPs usually do not give static IP-addresses to private customers, unless you pay extra or limit yourself to a bunch of providers that do good service but rely on old (DSL) technology, limiting you to some 16MBit/s down and 1MBit/s up. Luckily my ISP does not force the IP-address change, but it does happen from time to time (usually once in a couple of months).

To access the machine(s) at home while on a non-IPv6-capable connection, I have been using my old (old, old, old) DynDNS.com account and pointing a CNAME from under die-welt.net to it. Some time ago, DynDNS.com started supporting AAAA records in their zones and I was happy: no need to type hostname.ipv6.kerker.die-welt.net to connect via v6, just let the application decide. Well, yes, almost. It's just that DynDNS.com resets the AAAA record when you update the A record with ddclient and there is currently no IPv6 support in any of the DynDNS.com clients for Linux. So I end up with no AAAA record and am not as happy as I should be.

Last Friday I got a mail from DynDNS:

Starting now, if you would like to maintain your free Dyn account, you must now log into your account once a month. Failure to do so will result in expiration and loss of your hostname. Note that using an update client will no longer suffice for this monthly login. You will still continue to get email alerts every 30 days if your email address is current.

Yes, thank you very much...

Given that I have enough nameservers under my control and love hacking, I started writing my own dynamic DNS service. Actually you cannot call it a service. Or dynamic. But it's my own, and it does DNS: powerdyn. It is actually just a script that can update DNS records in SQL (from which PowerDNS serves the zones).

When you design such a "service", you first think about user authentication and proper information transport. The machine that runs my PowerDNS database is reachable via SSH, so let's use SSH for that. You not only get user authentication, server authentication and properly encrypted data transport, you also do not have to try hard to find out the IP-address you want to update the hostname to: just use $SSH_CLIENT from your environment.

If you expected further explanation of what has to be done next: sorry, we're done. We have the user (or hostname) by looking at the SSH credentials, and we have the IP-address to update it to if the data in the database is outdated. The only thing missing is some execution daemon or ... cron(8). :) The machine at home has the following cron entry now:

*/5 * * * * ssh -4 -T -i /home/evgeni/.ssh/powerdyn_rsa powerdyn@ssh.die-welt.net

This connects to the machine with the database via v4 (my IPv6 address does not change) and that's all. As an alternative, one can add the ssh call in /etc/network/if-up.d/, /etc/ppp/ip-up.d/ or /etc/ppp/ipv6-up.d (depending on your setup) to be executed every time the connection goes up. The machine with the database has the following authorized_keys entry for the powerdyn user:

no-agent-forwarding,no-port-forwarding,no-pty,no-X11-forwarding,no-user-rc,\ 
command="/home/powerdyn/powerdyn/powerdyn dorei.kerker.die-welt.net" ssh-rsa AAAA... evgeni@dorei

By forcing the command, the user has no way to get the database credentials the script uses to write to the database, nor can they update a different host. That seems secure enough for me. It won't scale to a setup like DynDNS.com and the user management sucks (you even have to create the entries in the database first, the script can only update them), but it works fine for me and I bet it would for others too :)

Update: included suggestions by XX and Helmut from the comments.
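
To make the "we're done" part a bit more concrete, here is a Python sketch of what such a forced command boils down to. It is not the real powerdyn script: it uses sqlite3 as a stand-in for the real database, assumes a records table with name, type and content columns (as in PowerDNS's generic SQL schema), and the function names are made up. It also ignores everything else a real setup might need, like bumping the SOA serial.

```python
import ipaddress
import sqlite3


def client_address(ssh_client):
    """SSH_CLIENT has the form 'address client_port server_port'."""
    return ipaddress.ip_address(ssh_client.split()[0])


def update_record(db, hostname, ssh_client):
    """Point an existing A/AAAA record at the SSH client's address.

    Returns True if the record was changed, False if it was already
    current or does not exist (entries are never created here).
    """
    address = client_address(ssh_client)
    rtype = "A" if address.version == 4 else "AAAA"
    row = db.execute(
        "SELECT content FROM records WHERE name = ? AND type = ?",
        (hostname, rtype),
    ).fetchone()
    if row is None or row[0] == str(address):
        return False
    db.execute(
        "UPDATE records SET content = ? WHERE name = ? AND type = ?",
        (str(address), hostname, rtype),
    )
    db.commit()
    return True
```

In a real script the hostname would come from the argument given in the forced command and the second value from os.environ["SSH_CLIENT"], so the client never gets to choose either of them.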

Wheezy, ejabberd, Pidgin and SRV records

TL;DR: {fqdn, "jabber.die-welt.net"}.

So, how many servers do you have that are still running Squeeze? I count one, mostly because I did not figure out a proper upgrade path from OpenVZ to something else yet, but this is a different story. This post is about the upgrade of my "communication" machine, dengon.die-welt.net. It runs my private XMPP and IRC servers. I upgraded it to Wheezy, checked that my irssi and my BitlBee still could connect and left for work.

There I noticed that Pidgin could only connect to one of the two XMPP accounts I have on that server. sargentd@jabber.die-welt.net worked just fine, while evgeni@golov.de failed to connect. ejabberd was logging a failed authentication:

I(<0.1604.0>:ejabberd_c2s:802) : ({socket_state,tls,{tlssock,#Port<0.5130>,#Port<0.5132>},<0.1603.0>}) Failed authentication for evgeni@golov.de

Pidgin, meanwhile, was just throwing "Not authorized" errors. I checked the password in Pidgin (even if it did not change). I tried different (new) accounts: anything@jabber.die-welt.net worked, nothing@golov.de did not and somethingdifferent@jabber.<censored>.de worked too.

So where was the difference between the three vhosts? jabber.die-welt.net and jabber.<censored>.de point directly (A/CNAME) to dengon.die-welt.net. golov.de has SRV records for XMPP pointing to jabber.die-welt.net. Let's ask Google about "ejabberd pidgin srv". There indeed are some bugs. But they are marked as fixed in Wheezy. Mhh... Let's read again... Okay, I have to set {fqdn, "<my_srv_record_name>"}. when this does not match my hostname. Edit /etc/ejabberd/ejabberd.cfg, add {fqdn, "jabber.die-welt.net"}. (do not forget the dot at the end) and restart ejabberd. Pidgin can connect again. Yeah.

Opera, standards and why I should have stayed in my cave

So you probably heard that I have that little new project of mine: QiFi, the pure JavaScript WiFi QR Code Generator. It's been running pretty well and people even seem to like it. One of its (unannounced) features is a pretty clean stylesheet that is used for printing. When you print, the result will be just the SSID and the QR code, so you can put that piece of paper everywhere you like. That works (I tested!) fine on Iceweasel/Firefox 10.0.12 and Chromium 25.0. Today I tried to do the same in Opera 12.14 and it failed terribly: the SSID was there, the QR code not. And here my journey begins...

First I suspected the CSS I used was fishy, so I kicked all the CSS involved and retried: still no QR code in the print-out. So maybe it's the QR code library I use that produces a weird canvas? Nope, the examples on http://diveintohtml5.info/canvas.html and http://devfiles.myopera.com/articles/649/example5.html don't print either. Uhm, let's Google for “opera canvas print”... And oh boy I should not have done that. It seems it's a bug in Opera. And the proposed solution is to use canvas.toDataURL() to render the canvas as an image and load the image instead of the canvas. I almost went that way. But I felt the urge to read the docs first. So I opened http://www.w3.org/html/wg/drafts/html/master/embedded-content-0.html#dom-canvas-todataurl and https://developer.mozilla.org/en-US/docs/DOM/HTMLCanvasElement and started puking:

When trying to use types other than "image/png", authors can check if the image was really returned in the requested format by checking to see if the returned string starts with one of the exact strings "data:image/png," or "data:image/png;". If it does, the image is PNG, and thus the requested type was not supported. (The one exception to this is if the canvas has either no height or no width, in which case the result might simply be "data:,".)

If the type requested is not image/png, and the returned value starts with data:image/png, then the requested type is not supported.

Really? I have to check the returned STRING to know if there was an error? Go home HTML5, you're drunk! Okay, okay. No canvas rendered to images then. Let's just render the QR code as a <table> instead of a <canvas> when the browser looks like Opera. There is nothing one could do wrong with tables, right? But let's test with the basic example first:

Yes, this is 2013. Yes, this is Opera 12.14. Yes, the rendering of a fucking HTML table is wrong. Needless to say, Iceweasel and Chromium render the example just fine. I bet even a recent Internet Explorer would... That said, there is no ~~bugfix~~workaround for Opera I want to implement. If you use Opera, I feel sorry for you. But that's all.

Update: before someone cries "ZOMG! BUG PLZ!!!", I filed this as DSK-383716 at Opera.