I'm going to stop putting content on this site, and I'm going to convert what exists to static HTML so the archives remain available. I intend to do this in the next month or so. If I have something in place to replace this site, I'll make a final post here linking to it.

The executive summary is that 1) the reasons for this are mostly technical and 2) I plan to bring something like this back somewhere else at some point. For boring details, read below:

---

I wrote this web site using Ruby on Rails some years ago. One of my goals when I did this was to have a place at which I could publish stuff that I wrote. Another goal, though, was to learn how to make a dynamic web site.

Well, I suppose you could say I was successful in both goals. I learned how to use Ruby on Rails. I wrote words.

Rails gives you a lot for "free," so making a thing like this isn't some impressive accomplishment, but it was an important thing for me to learn and understand at the time. I already knew ruby, and Rails seemed like a useful technology to be exposed to. Indeed the little hobby project was very helpful to me in understanding how dynamic web sites work.

Well, time moves on. These days I favor python and django and have little interest in Rails. So this site is still running the same obsolete version of Rails from ages ago, which has all sorts of scary-sounding security vulnerabilities, and I can't just update the framework because doing so would require rewriting my own code.

In addition to that, the virtual machine on which this site resides is running CentOS 5, and I'd really like to update that. The biggest reason I haven't yet done so is that I know replicating old Rails environments can be a struggle (as I learned when upgrading from CentOS 4 to 5). So this is basically the "end of the line"; something has to give.

So there you have it: this site uses an old version of a web framework I no longer have interest in, which is no longer supported and is purportedly horribly insecure. My options are to dive back into Rails and update my code to work with modern versions, migrate to something completely different like Django or an off-the-shelf blogging platform, or just forget about it completely.

So, near term, I'm going to switch everything to static HTML and add rewrites so everything remains available while I figure out where I'll go next. Longer term, I'm not really sure. Perhaps I'll switch to hosted WordPress, which is robust and brain-dead simple - great if you just want to write content, not code. Or perhaps I'll make something new. But I will certainly keep writing things somewhere.

Time for a little navel-gazing.

I write stuff on this blog because... well, I guess I think it's fun to write things. I don't really promote it anywhere; I just put information up here and hope it finds its way to people who can use it. Until recently, I never really thought about how that process might actually work in the absence of any active promotion, or whether it even happened at all.

I try to post things that are either obscure or completely unique to my own personal experience. In the past I skewed more towards personal things; lately I've skewed heavily towards esoteric technical things which seemed to be difficult to track down through google (for whatever reason).

Anyway, I decided to find out whether people other than myself were getting any value from this project. It turns out a few people are, according to Google's search query stats (impressions, clicks, click-through rate, and average position):

Query                        Impressions  Clicks  CTR   Avg. position
openwrt netflow              <10          <10     -     1.0
puppet append array          <10          <10     -     1.0
puppet append to array       <10          <10     -     1.0
puppet array append          <10          <10     -     1.0
linux owa                    <10          <10     -     2.0
check_mk puppet              <10          <10     -     2.0
puppet git clone             12           12      100%  2.2
puppet array                 <10          <10     -     2.6
openwrt ntop                 <10          <10     -     3.0
zfs volumes                  <10          <10     -     3.3
git puppet                   16           <10     -     3.8
puppet staging environment   <10          <10     -     4.0
owa chrome                   <10          <10     -     4.5
kickstart puppet             <10          <10     -     5.0
puppet git                   150          60      40%   5.6
centos 5 mysql 5.1           <10          <10     -     7.0
zfs raidz                    22           12      55%   7.5
zfs volume                   <10          <10     -     9.0

Not many actual clickthroughs from Google, but I was surprised by the last column, which is the average position in search results. What that means to me is that I have generally succeeded in filling knowledge gaps in esoteric subjects; maybe there aren't many people who want to know about OpenWRT and Netflow, but for the few people who do want to know that, my post on the subject is there waiting for them.

I find that list interesting in that I am certain I myself searched for all of those things before ultimately researching and posting on them myself. So while I was confronted with a void or a suboptimal result set when I executed those searches, I like to think that a person walking in my footsteps today would have a slightly more complete picture thanks to me.

Some specific interesting things to me:

- Variations on "git puppet" are definitely the most common reason people visit this blog. I'm not really in love with my little git workflow tutorial, but it's got a fairly high page ranking and a lot of clickthroughs. I think that's one of the most glaring voids I tried to fill, since I didn't really see any walkthroughs like it when I created it and it's something a lot of people are probably thinking about, but I want to revisit that with a higher quality version (or an addendum to it) at some point.

- The puppet array stuff is still highly ranked with the same page topping multiple search phrases and coming in high with several more, but with few people actually searching for it. I remember when I started wrestling with puppet, this seemed like a really big limitation. But over time I have come to use other patterns and only seldom find myself frustrated with this limitation.

- The OWA-on-Chrome/Linux post is still useful to people. Not many, but I'm glad to save them the pain of "Lite" OWA.

- My various ZFS ramblings are highly rated for generic ZFS searches, but with apparently very few people actually doing generic ZFS searches. I'm a little sad that ZFS is mostly a niche player, because it's an impressive piece of technology.

On my MythTV box running Arch Linux, I boot from an SSD. This means that stuff happens *really* fast at startup - sometimes *too* fast.

The situation is that my eth0 device is not always initialized before my custom network.service unit (which sets a static IP) starts. This results in the service failing and no network:

Jan 02 20:44:29 mingus.lan systemd[1]: Starting Static IP...
Jan 02 20:44:29 mingus.lan ip[385]: Cannot find device "eth0"

A quick google turned up the solution on the Arch Linux forums. The long and short of it: add a dependency on your NIC's device unit to network.service. Get the device unit name like so:

[jnthornh@mingus system]$ systemctl --full | grep net-eth0
sys-devices-pci0000:00-0000:00:0a.0-net-eth0.device

Toss it into the [Unit] section of /etc/systemd/system/network.service:

Requires=sys-devices-pci0000:00-0000:00:0a.0-net-eth0.device
After=sys-devices-pci0000:00-0000:00:0a.0-net-eth0.device

And you're good to go.
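For context, the complete unit ends up looking something like this (the [Service] half and the addresses are just an example of a typical static-IP unit, not necessarily exactly what mine contains):

# /etc/systemd/system/network.service
[Unit]
Description=Static IP
Requires=sys-devices-pci0000:00-0000:00:0a.0-net-eth0.device
After=sys-devices-pci0000:00-0000:00:0a.0-net-eth0.device

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/ip link set dev eth0 up
ExecStart=/usr/bin/ip addr add 192.168.1.50/24 dev eth0
ExecStart=/usr/bin/ip route add default via 192.168.1.1

[Install]
WantedBy=multi-user.target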

More and more, the big phone companies want to control how you use the data you're paying for. One of their relatively new tricks is to sniff out "tethering" by inspecting the traffic coming from your device and then disabling your connection if you're using it "incorrectly."

Now, modern consumer broadband has gotten fast enough that you can reasonably establish a VPN to your home and proxy all traffic through that to bypass this. Most carriers do still allow SSL traffic (since so much of the web requires it these days), and this is (relatively) safe from such eavesdropping. Therefore it becomes feasible to set up OpenVPN (which uses SSL) and avoid the carrier's shenanigans.

In my case I have an OpenVPN server running on my openwrt router at home, but I also have a Linode virtual server. I stick to Linode when I just want to avoid this snooping but I will demonstrate both configurations.

You need to follow the OpenVPN docs to set up your CA and certs. Once you do so the configuration is fairly straightforward.

OpenVPN server (UCI for OpenWRT):

# /etc/config/openvpn
config 'openvpn' 'routing_server'
option 'cert' '/lib/uci/upload/cbid.openvpn.custom_config.cert'
option 'key' '/lib/uci/upload/cbid.openvpn.custom_config.key'
option 'ca' '/lib/uci/upload/cbid.openvpn.custom_config.ca'
option 'dh' '/lib/uci/upload/cbid.openvpn.custom_config.dh'
option 'port' '1194'
option 'proto' 'udp'
option 'dev' 'tun0'
option 'server' '192.168.168.0 255.255.255.0'
list 'push' 'route 192.168.3.0 255.255.255.0'
option 'keepalive' '10 120'
option 'comp_lzo' '1'
option 'enable' '1'
option 'client_to_client' '1'
option 'persist_key' '1'
option 'persist_tun' '1'
option 'status' '/tmp/openvpn-routing-status.log'
option 'verb' '3'

That's where the certs end up if you use LuCI; you can of course put them somewhere else. I'm pushing the route for 192.168.3.0/24 since that's my home subnet and I want VPN clients to be able to reach it.
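One related piece that isn't shown above (so treat this as a sketch): the OpenWrt firewall needs to accept the OpenVPN port on the WAN zone, and the tun interface has to be covered by a zone that's allowed to reach the LAN. The inbound rule in /etc/config/firewall looks something like:

# /etc/config/firewall
config rule
option name 'Allow-OpenVPN'
option src 'wan'
option proto 'udp'
option dest_port '1194'
option target 'ACCEPT'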

On my linode:
# cat /etc/openvpn/server-routed.conf
cert /etc/openvpn/server.crt
key /etc/openvpn/server.key
ca /etc/openvpn/easy-rsa/keys/ca.crt
dh /etc/openvpn/easy-rsa/keys/dh1024.pem
dev tun1
comp-lzo
persist-tun
persist-key
server 10.8.0.0 255.255.255.0
keepalive 10 120
proto udp
verb 3
script-security 3

I'm not pushing any routes here since this is purely to act as a proxy.

iptables, on the Linode:

iptables -I INPUT -m udp -p udp --dport 1194 -j ACCEPT
iptables -I INPUT -m tcp -p tcp --dport 1194 -j ACCEPT
iptables -A FORWARD -i tun1 -o eth0 -j ACCEPT
iptables -A FORWARD -i eth0 -o tun1 -m state --state RELATED,ESTABLISHED -j ACCEPT
iptables -t nat -A POSTROUTING -s 10.8.0.0/24 -o eth0 -j MASQUERADE
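One prerequisite worth calling out, since it isn't in the rules above: IP forwarding has to be enabled on the Linode, or the FORWARD/MASQUERADE rules won't actually pass any traffic:

# enable IPv4 forwarding now
sysctl -w net.ipv4.ip_forward=1
# and persist it across reboots
echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.conf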

Now, on my client:

# cat openvpn.cfg
ca ca.crt
cert client.crt
key client.key
client
nobind
remote openvpn.myserver.com 1194
route-delay 10
dev tun
ifconfig-nowarn
ping 10
comp-lzo
verb 4
proto udp
resolv-retry infinite
persist-key
persist-tun
redirect-gateway

On an Android device, you can just copy that openvpn.cfg and the certs onto the device, then import the file with the OpenVPN app (from the Play Store), which will create the profile for you.

One thing I haven't worked out yet is how to start the VPN connection on my phone and then "tether" other devices behind it and have them all route through the VPN. For now I just connect to the VPN on the devices which are behind the phone (not the phone which is doing the routing).

"Advanced format" drives are a scourge. The fact that they use 4k sectors is good - but the fact that they lie and emulate 512b sectors is bad.

The issue? When an OS doesn't know the actual sector size, your disk tools lay everything out in terms of the 512b "virtual" sectors the drive reports, so partitions and filesystem blocks can end up misaligned with the 4k physical sectors underneath. This can cause substantial performance degradation: any write that doesn't cover a whole, aligned 4k physical sector forces the drive into a read-modify-write cycle. Yuck-o.

The emulation isn't quite as big a deal if your 512b sectors are aligned to evenly match the 4k sector boundaries. It's still suboptimal but it's not a disaster. This is what, e.g., Windows does - and this is why this strategy was apparently chosen by the storage vendors (old versions of Windows/NTFS can't understand 4k sector sizes, so the 512b sector emulation is there to allow these things to work at all).

But Windows' problems suddenly become everybody else's problems, since the storage vendors are now lying about disk geometry. If your filesystem tools are designed with the assumption that disks are telling the truth, they'll use the wrong sector size. And that's where things get sticky.

There are two ways to deal with this in ZFS. The first, and probably easiest, is to download a patched zpool binary that defaults to ashift=12 and use it whenever you need to create a pool on such drives. Once the pool is created, you can go back to using the normal zpool command.

The "right" way to do this is to use the new facility to override sector size via /kernel/drv/sd.conf. The official Illumos wiki has information about this method now as well, but it's a little confusing to piece together how you actually make it all work. Specifically, finding the strings you need to put in sd.conf is really unclear; luckily a user added a comment on the issue tracker which provides a command that gives you just that.

So here's what I did with my Western Digital 3TB "Red" drives, AKA "WD30EFRX":

1) Use the command from that comment to get the "magic strings":

$ echo "::walk sd_state | ::grep '.!=0' | ::print struct sd_lun un_sd | ::print struct scsi_device sd_inq | ::print struct scsi_inquiry inq_vid inq_pid" | mdb -k
inq_vid = [ "ATA " ]
inq_pid = [ "WDC WD30EFRX-68A" ]

(If you have multiple disks, the output of "format" and "inq" should point you in the right direction)

2) configure sd.conf with the vendor string and a substring from the product string:

sd-config-list="WDC WD30EFRX","physical-block-size:4096",
"ATA WDC WD30EFRX","physical-block-size:4096",
"ATA-WDC WD30EFRX","physical-block-size:4096";

(Note - I think the "ATA WDC WD30EFRX" is the "right" entry there, but I added the others just to be safe)

3) do a full reboot (reboot -p). You can avoid this by doing the stuff mentioned on the wiki, supposedly, but that didn't seem to work for me. I think I needed to detach and re-attach the disk, but I didn't feel like messing with it and rebooting worked.

4) create the pool like normal

5) ensure that your ashift is 12

zdb -C vault | grep ashift

(use your pool name instead of vault)
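To make steps 4 and 5 concrete (the pool layout and device names here are hypothetical):

# create the pool like normal
zpool create vault raidz c3t0d0 c3t1d0 c3t2d0 c3t3d0

# confirm the override took effect
zdb -C vault | grep ashift
            ashift: 12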

I'll say this: the right way seemed a lot trickier in this case. I guess the notion is that in the future sd.conf will be maintained with a list of most such drives so users won't have to worry about it, but for now this isn't nearly as friendly as a simple command line option.

Oh well - gets the job done!

Well, Arch Linux finally pulled the plug on its old init system. About time, I guess - but I'm a procrastinator, and even though I was generally aware of systemd, I had no compelling reason to switch until I was forced to.

So, now I'm forced to!

The Big Deal about this for Arch is that /etc/rc.conf is going to basically stop working for most things - you need to take any deprecated stuff out of there and put it where it now belongs.

Here's what you'll probably need to do to make the change in Arch when you lose your old init scripts:

1) Write your FQDN to /etc/hostname (nothing else - just your FQDN and a newline)

2) If you don't use UTC for your system clock (e.g. dual boot Windows), be sure /etc/adjtime reflects that fact (timedatectl does this)

3) Make sure any modules you need are loaded from /etc/modules-load.d/{whatever}.conf - one module per line, nothing else

4) Get rid of any legacy services. The easiest way for me to do this was to actually leave the DAEMONS line in rc.conf and reboot with systemd enabled. From there, you can figure out what's coming from rc.conf with: 'systemctl | grep -i legacy'. If you've got anything left, use 'systemctl enable {thing}' to enable the systemd version, and remove it from rc.conf (some specific things I remember - syslog, cron, ssh, iptables)
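Roughly, those steps translate into something like this (the hostname is my box's; the module and service names are just examples of things you might have enabled):

echo "mingus.lan" > /etc/hostname
timedatectl set-local-rtc 1                    # only if your hardware clock is on local time
echo "coretemp" > /etc/modules-load.d/coretemp.conf
systemctl | grep -i legacy                     # see what's still coming from rc.conf
systemctl enable sshd.service cronie.service   # then remove them from the DAEMONS line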

systemd is pretty sweet - like SMF from Solaris. My SSD-based system hits gdm like... nearly instantly with it enabled. But it'll take some getting used to for sure.

This is one of those pain points I'm glad Arch forces upon me, and systemd represents a real improvement in how init works. But I'd never have bothered with it voluntarily :)

Got an Android (or other) device that never seems to pull more than 500mA, and always thinks it's charging from a USB port, even when you've got a 3rd party charger that claims to push 1 amp or more?

Ever wonder why?

The answer is stupid, and it's basically: "Apple."

The USB spec (through 2.0) specifies a maximum per-device draw of 500mA. This is why, if you plug a device into a USB port on your computer, it'll charge slowly (or, in some cases, not at all).

Of course, when USB first became popular as a charging standard for consumer electronics (around the time of the old school iPod), this created an issue: 500mA sucks. Lots of devices need (or, at least, could utilize) much more current than that, and so these devices often ship with their own AC adapter which can provide much higher current.

But how does the device know how much current it should be pulling? Simply attempting to draw more than 500mA from a PC USB port, which is spec'd at 500mA, would be a dick move: PC manufacturers designed their USB hardware on the assumption that nobody would draw more than the spec allows, and drawing too much could do bad things (probably "just" blowing a fuse... but that's still an annoying no-no).

So, manufacturers needed a way to determine whether their devices were plugged in to a "data" port (at which point they'd limit current draw to the USB spec), and if not they'd open her up and pull a higher current. And they needed this to be cheap.

Problem: there was no standard for negotiating draw higher than 500 mA.

Let's back up: USB cables have four pins. Two of these are dedicated to power (+5 Volts/red and ground/black). Two of them are data pins that provide for data exchange (green/white).

Apple had an ingenious idea: what if we base our charging mode on *the voltage on the data pins*? If you detect the nominal PC voltage, you operate in "USB" mode. If you detect other voltages, you operate in... "other" modes.

Problem is, nobody else did this.

Now, as USB charging became really popular, manufacturers decided they needed a standard way to address the issue, and the USB Battery Charging spec provided a simple one: if the charger shorts the two data pins together, the device may draw the full current the charger supports; if not, it does the normal USB negotiation thing and limits its draw to 500mA.

But then came the iPhone. And you know what? Apple ignored the new standard, and still did their own thing (which continued with the iPad, natch).

And, today, the most popular phone in the world and the most popular tablet in the world have their own proprietary way to determine whether to limit their current draw to the USB spec.

What's a charger manufacturer to do?

Well, in most cases, the answer is: do what Apple wants. In other cases, it's to ignore both approaches and produce something with *neither* Apple's voltage-signaling scheme *nor* the shorted data pins. And, yes, in some cases there are chargers with the data pins shorted as well.

But, you know what? It's incredibly difficult to figure out which path a particular 3rd party charger manufacturer has taken, short of scouring product reviews.

Now, I'm here to tell you, dear readers with Android devices so afflicted, who have sat through this long, sad tale: there's a solution, should you be stuck with a charger whose data pins aren't shorted! The solution is:

Short out the green/white data wires *in your USB cable itself* - and you can use it with any charger. Well, you shouldn't use it with a PC, for reasons mentioned above. Splice open the wire, connect white to green, tape it closed, and voila. Happy sailing!

Or, you could buy the "charge only" USB cables which are already cabled thusly. But good luck with that, since even Amazon offers inconsistent results (often, these are really "normal" cables simply *labeled* as charge-only cables).

If you're using puppet as part of your system deployment process, you may notice some issues related to the fact that you're in a half-baked chroot rather than a full-fledged system when puppet runs.

Concrete example: anaconda/kickstart and RHEL. You won't be able to start the SSH daemon when puppet runs in the KS chroot if you're already running SSH in anaconda outside of the chroot. This means that using ensure => running on your sshd resource will cause a failure.

My approach to kickstart is "do as little with kickstart as possible and let puppet do as much as possible." The only stuff I want in kickstart is stuff that *has* to be in kickstart: partitioning, setting up the network, installing and running puppet, etc. There's no reason to duplicate stuff between KS and puppet, and if you offload too much logic to KS your puppet config's notion of how to configure resources will be incomplete and dependent on your KS process.

Problem is, those failed resources can really add up, and it means you might not get a full puppet run during deployment. Luckily, there's a pretty easy workaround: a custom fact that exists only in kickstart. For any resources that will fail in kickstart, you can wrap them in a check for that fact.

Here's how it works in practice. Add this to your kickstart script, after puppet is installed but before it runs (obviously, if your facts live in a different path, put it wherever they go):

cat >> /usr/lib/ruby/site_ruby/1.8/facter/in_kickstart.rb << EOF
Facter.add("in_kickstart") do
  setcode do
    1
  end
end
EOF

This is about the simplest fact that can exist; all it'll do is set the fact $::in_kickstart to "1".

After puppet runs in your kickstart, simply get rid of that file so that the subsequent run won't have the fact any more:

rm /usr/lib/ruby/site_ruby/1.8/facter/in_kickstart.rb
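Putting it together, the relevant chunk of the kickstart %post section ends up looking roughly like this (the puppet invocation and master hostname are illustrative - use whatever your deployment already does):

%post
# puppet and facter are already installed at this point

# tell facter we are inside kickstart
cat >> /usr/lib/ruby/site_ruby/1.8/facter/in_kickstart.rb << EOF
Facter.add("in_kickstart") do
  setcode do
    1
  end
end
EOF

# run puppet once inside the chroot
puppet agent --test --server puppet.example.com

# remove the fact so subsequent runs behave normally
rm /usr/lib/ruby/site_ruby/1.8/facter/in_kickstart.rb
%end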

Now, in puppet's DSL, you'd do something like the following to utilize it:

# do not actually start the ssh daemon when we are inside kickstart
if $::in_kickstart {
  service { "sshd":
    enable => true,
  }
} else {
  service { "sshd":
    ensure    => running,
    enable    => true,
    subscribe => File["/etc/ssh/sshd_config"],
  }
}

In this case you just need to get rid of the "ensure" option; in others it might be more complex.

I use Puppet Dashboard as my external node classifier.

Once you have a few hundred nodes, it can be very beneficial to use an ENC instead of maintaining node definitions directly in your puppet configuration. For one thing, it helps to visualize the status of nodes, and for another it is often useful to logically group nodes and visualize those groups. Most (although not all) of what an ENC can do is possible with flat files and clever includes or realize statements, but having a web UI like Dashboard is a huge win in legibility when managing groups of nodes.

I would describe Dashboard as being "pretty good." It's not the only ENC out there - and some of the others are highly recommended - but I have gravitated towards it since it is maintained by Puppet Labs rather than a third party. It's certainly got some rough edges, and some missing functionality, but it gets the job done and it's OSS.

My biggest issues with Dashboard have generally been related to performance. Over time, the thing starts running like a dog on the VMs on which I typically run my puppet master.

You might notice this yourself: Dashboard runs fine at first, but over time it becomes less and less responsive. Puppet runs themselves seem OK, but actual page loads in the UI are terrible.

The biggest reason for this is the (nifty) reporting tables, which store data about puppet runs. Unfortunately, barring your intervention, these tables will grow indefinitely; and data about e.g. year old puppet runs is arguably not very valuable. If you read the actual docs, you'll note that Dashboard includes a rake task which you should run via cron job to prune reports and keep the sizes of these tables sane:

rake RAILS_ENV=production reports:prune upto=7 unit=day

Obviously, you can keep more or less data depending on your hardware. Our hardware is virtual and scarce, and keeping things lean is crucial. We run at a rate of roughly 80,000 reports per week, and running this task nightly keeps us from ever greatly exceeding that.

Of course, there are some gotchas.

Until 2011, the Rake task did not prune resource_statuses, and you had to manually deal with that (thankfully, this functionality is now included in the rake task). If you're running an older puppet dashboard, and things are slowing to a crawl... time to upgrade, man! But barring that you should at least prune the resource_statuses table with your own SQL.

If you're running on meager hardware, you should really be using Ruby Enterprise Edition to make Rails performance suck a little bit less. It's even possible to use the RPM version of Dashboard (and of puppet itself) along with REE with a little tinkering.

Even doing this stuff, though, we had some trouble. In the web UI, loading node pages started taking longer and longer. And I saw this in our slow.log:

# Query_time: 7.419537 Lock_time: 0.000077 Rows_sent: 1 Rows_examined: 4766330
SELECT count(*) AS count_all FROM `timeline_events` WHERE ((subject_id = 398 AND subject_type = 'Node') OR (secondary_subject_id = 398 AND secondary_subject_type = 'Node'));

Well now, that's an expensive little query. And it's run on every node page load.

Honestly, I didn't even know what timeline_events were, but that's a hell of a lot of rows to examine. One thing you could do would be to add indexes:

ALTER TABLE timeline_events ADD INDEX indexsubjectid (subject_id, subject_type);
ALTER TABLE timeline_events ADD INDEX indexsecondary (secondary_subject_id, secondary_subject_type);

Which helps, but... do we really need all these entries in there anyway?

It turns out this table is used to track when node objects in Dashboard change. This can happen when you actually edit a node in the web UI, which is good to know (Hey, why did this node suddenly crap itself? Oh, looks like somebody jacked with it in the ENC and should be reprimanded! Thanks timeline_events!). It also happens whenever a node creates a report... which is redundant, since reports have their own table, which (in my case) I'm keeping very lean.

So timeline_events gets a new entry every time puppet runs, on every puppet node. And this table doesn't actually have any data about what happened - it just says "hey, something was updated" - which is, um, not very interesting, when we can get the complete skinny from the report table. I'll leave it to you to decide how much value this data has to your organization, but I personally decided "not much" and did this:

mysql> delete from timeline_events where created_at <= DATE_SUB(now(), INTERVAL 1 MONTH );

Boom. Fast again! Looks like we need another cron job...
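Concretely, that means a couple of cron entries along these lines (the install path, database name, and times are assumptions, and the mysql line presumes credentials come from somewhere like /root/.my.cnf):

# /etc/cron.d/puppet-dashboard-cleanup
# prune old reports with the bundled rake task
15 2 * * * puppet-dashboard cd /usr/share/puppet-dashboard && rake RAILS_ENV=production reports:prune upto=7 unit=day
# prune timeline_events older than a month
45 2 * * * root mysql puppet_dashboard -e "DELETE FROM timeline_events WHERE created_at <= DATE_SUB(NOW(), INTERVAL 1 MONTH);"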

I seem to be pushing up against the limitations of puppet lately. In this particular case, I wanted newly added nodes to automatically receive corresponding entries in /etc/exports on my file server.

Sounds pretty simple, right? Well, it's not.

Puppet doesn't give you much opportunity to collect data from other nodes directly. There's no way, for example, to express "give me a list of all nodes" in the DSL; your scope is intentionally restricted to the node on which you're running. The metadata the puppet master knows about other nodes simply isn't exposed.

In some cases, there's an easy workaround: collect / export. I have used this pattern successfully in the past, and I even posted a blog entry on how it helped me with check_mk. There is one rather large, unfortunate limitation of this pattern though: it only works for the case where your entry can be defined by each of the nodes doing the exporting of the resource.

That may not be entirely clear at first, but consider the case of /etc/exports. This is a file which has no native type to manage it, so the common approach to getting it done would be to write a template. Now, although I can create per-node exported resources, and collect them on the file server, what would they look like? I need all of those entries to end up in /etc/exports - and there's no way I can do that with per-node templates, since the file each node exports must be unique (there is not, for example, exports.d, as there was a conf.d for check_mk).

The closest thing you can currently do (short of writing your own exports provider - a frustrating problem in and of itself given the state of fileparser) is to use collect/export with the Augeas type. But this is limiting, too; you cannot automatically purge stale or removed hosts from the file when using Augeas. There are other, even more hacky solutions that smush multiple files together on the client, but that's fragile and annoying so I don't have any interest in it.

What I want - which seems simple - is a list of nodes in a variable. That's it.

And that got me thinking - what about... facts?

Converting collect/export data into facts

So, what if we create a collect/export generated conf.d style directory, populated with per-host files... and then have a custom fact that collates them into something on the file server?

Why, you know what? Then we would have a string that we could parse to get the data we need.

Here's the trick. First, on our nodes that we're exporting:

@@file { "/var/puppet/nfs_hosts/$::fqdn":
  content => "$::fqdn\n",
  tag     => "nfs_host",
}

This is the simplest scenario; all I care about is the FQDN. You can do other tricks here and make that content whatever you want - you could even put key/value pairs in there and split them out in the final ERB - but that's not what I needed in this case.

Now, the collection on the file server:

File <<| tag == 'nfs_host' |>>

Easy enough - so what do we have now? We have a directory full of files named after nodes, each of which contains the node name, on our file server.
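One extra piece I'd add on the file server (not in the snippets above, but it means hosts that stop exporting eventually fall out of the list): manage the collection directory itself and purge anything unmanaged:

file { "/var/puppet/nfs_hosts":
  ensure  => directory,
  recurse => true,
  purge   => true,
}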

How do we make it a fact? We write a fact and push it out to the file server. Hint: writing facts is easy:

    Facter.add("nfs_hosts") do

setcode do
path="/var/puppet/nfs_hosts"
if File.exists?(path) && File.directory?(path) && ! Dir[path + '/*'].empty?
output = Facter::Util::Resolution.exec('/bin/cat /var/puppet/nfs_hosts/*').split('\n').join(' ')
else
output = nil
end
output
end
end

Magic! Now we should get our fact $::nfs_hosts on the file server, which is a space delimited list of all the nodes that exported the resource.
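From there, here's a sketch of how the fact could actually drive /etc/exports (the export path, mount options, template location, and service name are all assumptions, and the ERB assumes the fact is populated):

# manifest on the file server
file { "/etc/exports":
  content => template("nfs/exports.erb"),
  notify  => Service["nfs"],    # or whatever your NFS service is called
}

# modules/nfs/templates/exports.erb
/export/home <%= scope.lookupvar('::nfs_hosts').split(' ').map { |h| "#{h}(rw,sync)" }.join(' ') %>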

So, this is obviously a hack, but in my estimation it's the least hacky hack of what we've got out there given how few LOC are involved and the fact that it's mostly contained within the DSL. There's one particular limit of this hack that you need to know:

It will take the file server an extra puppet run before the fact is updated.

Due to the... fact... that facts are generated prior to the collection of the exported resources, the $::nfs_hosts fact that the file server reports will not reflect changes made during that puppet run.

The workaround? Run puppet twice as frequently on this node, or be content to wait an extra run cycle, or run puppet manually twice to make sure the change happens more quickly.

Hey, I don't like this any more than you do. But at least you know that it's an option. And knowing is half the battle.