Git is a powerful tool, and one that I feel Ops folks could use more extensively. Unfortunately, although git has good documentation and excellent tutorials, it mostly assumes that you're working on a software project; the needs of operations are subtly (but vitally) different.
I suppose this is what "devops" is about, but I'm reluctant to use that buzzword. What I can tell you is this: if you take the time to integrate git into your processes, you will be rewarded for your patience.
There is tons of information about how git works and how to accomplish specific tasks in git. This isn't about that; this is a real-world example with every single git command included. I only barely hint at what git is capable of, but I hope that I give you enough information to get started with using git in your puppet environment. If you have any specific questions, don't worry, because help is out there.
A trivial workflow with git for accountability and rollbacks
I've long used git for puppet (and, indeed, any other text files) in a very basic way. My workflow has been, essentially:
1) navigate to a directory that contains text files that might change
2) turn it into a git repo with:
git init; git add *; git commit -a -m "initial import"
3) whenever I make changes, I do:
git add *; git commit -a -m "insert change reason here"
This simple procedure solves several problems at once: you get accountability for changes, the ability to roll back, and the ability to review differences between versions.
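For example, reviewing and rolling back look something like this (a minimal sketch; the commit hash and file path are hypothetical):
git log --oneline                           # who changed what, and when
git diff HEAD~1                             # review the most recent change
git checkout a1b2c3d -- manifests/site.pp   # restore one file from an older commit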
One very attractive feature of git (as compared to svn or cvs) is that one can create a git repo anywhere, and one needs no backing remote repository or infrastructure to do so; the overhead of running git is so minimal that there's no reason not to use it.
This workflow, though, does not scale. It's fine and dandy when you have one guy working on puppet at a time, but multiple people can easily step on each other's changes. Furthermore, although you have the ability to roll back, you're left with a very dangerous way to test changes: that is, you have to make them on the live puppet instance. You can roll your config back, but by the time you do so it might already be too late.
A simple workflow with git branches and puppet environments
It's time to move beyond the "yes, I use version control" stage into the "yes, I use version control, and I actually test changes before pushing them to production" stage.
Enter the puppet "environment" facility - and a git workflow that utilizes it. Puppet environments allow you to specify an alternate configuration location for a subset of your nodes, which provides an ideal way for us to verify our changes; instead of just tossing stuff into /etc/puppet and praying, we can create an independent directory structure for testing and couple that with a dedicated git branch. Once satisfied with the behavior in our test environment, we can then apply those changes to the production environment.
The general workflow
This workflow utilizes an authoritative git repository for the puppet config, with clones used for staging, production, and ad-hoc development. This git repository will contain multiple branches; of particular import will be a "production" branch (which will contain your honest-to-goodness production puppet configuration) and a "staging" branch (which is used to verify changes before they go live). Puppet will be configured to use two or more locations on the filesystem (say, /etc/puppet and /etc/puppet-staging) which will be cloned from the central repository and will correspond to branches therein. All changes to puppet should be verified by testing a subset of nodes against the configuration in /etc/puppet-staging (on the "staging" branch); once you're satisfied with the results, those changes are merged into the "production" branch and ultimately pulled into /etc/puppet.
Here's what it looks like:
/opt/puppet-git: authoritative git repository. I will refer to it by this location but in your deployment it could be anywhere (remote https, ssh, whatever). Contains at a minimum a "production" branch and a "staging" branch, but optionally may contain many additional feature branches. Filesystem permissions must allow read/write access for anybody who will contribute to your puppet configuration
/etc/puppet-staging: git repository cloned from /opt/puppet-git that always has the "staging" branch checked out. Filesystem permissions must allow read/write access for anybody who will push changes to staging (consider limiting this to team leads or senior SAs)
/etc/puppet: git repository cloned from /opt/puppet-git that always has the "production" branch checked out. Filesystem permissions must allow read/write access for anybody who will push changes from staging to production (again, consider limiting this to team leads or senior SAs)
The key element here is the authoritative repository (/opt/puppet-git). Changes to the production repository (in /etc/puppet) should never be made directly; rather, you will 'git pull' the production branch from the authoritative repository. The staging repository (and the "staging" branch) is where changes must occur first; when QA is satisfied, the changes from the staging branch will be merged into the production branch, the production branch will be pushed to the authoritative repository, and the production repository will pull those changes into it.
Why do I have three git repositories?
You might be saying to yourself: self, why do I have all of these repositories? Can't I just use the repository in /etc/puppet or /etc/puppet-staging as my authoritative repository? Why have the intermediary step?
There are a couple of reasons for this:
One, you can use filesystem permissions to prevent accidental (or intentional) modification directly to the /etc/puppet or /etc/puppet-staging directories. For example, the /etc/puppet repository may be writeable only by root, but the /etc/puppet-staging repository may be writeable by anybody in the puppet group. With this configuration anybody in the puppet group can mess with staging, but only somebody with root can promote those changes to production.
Two, some git operations (e.g. merging and rebasing) require that you do some branch switcheroo voodoo, and (at least in production) we can't have our repository in an inconsistent state while we're doing so. Furthermore, git warns against pushing to a branch that is currently checked out in a non-bare repository; by using a central repository, we don't have to deal with this issue.
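As a rough sketch of the permission setup described in the first point (the "puppet" group name here is just an example):
chgrp -R puppet /etc/puppet-staging
chmod -R g+w /etc/puppet-staging
cd /etc/puppet-staging; git config core.sharedRepository group
The core.sharedRepository setting tells git to keep newly created objects group-writable, so team members don't trip over each other's file ownership.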
Of course, one advantage of git is its sheer flexibility. You might decide that the staging repository would make a good authoritative source for your configuration, and that's totally fine. I only present my workflow as an option that you can use; it's up to you to determine which workflow fits best in your environment.
Initial prep work
Step 0: determine your filesystem layout and configure puppet for multiple environments
RPM versions of puppet default to using /etc/puppet/puppet.conf as their configuration file. If you've already been using puppet, you likely use /etc/puppet/manifests/site.pp and /etc/puppet/modules/ as the locations of your configuration. You may continue to use this as the default location if you wish.
In addition to the "production" configuration, we must specify additional puppet environments. Modify puppet.conf to include a section named for each "environment" you wish to use. For example, my puppet.conf is as follows:
[main]
... snip ...
manifest = /etc/puppet/manifests/site.pp
modulepath = /etc/puppet/modules
[staging]
manifest = /etc/puppet-staging/manifests/site.pp
modulepath = /etc/puppet-staging/modules
You may configure multiple, arbitrary environments in this manner. For example, you may have per-user environments in home directories:
[jeremy]
manifest = /home/jeremy/puppet/manifests/site.pp
modulepath = /home/jeremy/puppet/modules
It is also possible to use the $environment variable itself to allow for arbitrary puppet environments. If you have a great many puppet administrators, that may be preferable to specifying a repository for each administrator individually.
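For example, a sketch of that approach (the environments directory and its layout are an assumption, and each environment name would need its own checkout under it, including production and staging if you go this route):
[main]
manifest = /etc/puppet/environments/$environment/manifests/site.pp
modulepath = /etc/puppet/environments/$environment/modules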
Step 1: create your authoritative repository
If you're coming from the trivial usage of git that I outlined at the start of this post, you already have a repository in /etc/puppet.
If you're already using git for your puppet directory, just do the following:
cp -rp /etc/puppet /opt/puppet-git
If you aren't already using git, that's no problem; do the cp just the same, and then:
cd /opt/puppet-git; git init .; git add *; git commit -a -m "initial import"
Step 2: set up branches in your authoritative repository
Now we have our new central repository, but we need to add the branches we need:
cd /opt/puppet-git; git branch production; git branch staging
Do note that I didn't "check out" either of those branches here; I'm just leaving puppet-git on "master" (which in truth we'll never use).
NB: you might consider making this a "bare" repository, as it's never meant to be modified directly
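If you'd rather go that route, something like this would work in place of the cp and branch commands above (a sketch, assuming /etc/puppet is already a git repository):
git clone --bare /etc/puppet /opt/puppet-git
cd /opt/puppet-git; git branch production; git branch staging
A bare repository has no working copy to get out of sync, which also makes the "never push to a checked-out branch" concern moot.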
Step 3: set up your "staging" git repository
As configured in step 0, we have an environment where we can test changes in puppet, but right now there's no configuration there (in such a case, nodes in the "staging" environment will use the default puppet configuration). We need to populate this environment with our existing configuration; let's create a clone of our git repo:
git clone /opt/puppet-git /etc/puppet-staging
We now have our copy of the repository, including both of its branches. Let's switch to the "staging" branch:
cd /etc/puppet-staging; git checkout staging
Step 4: set up your "production" git repository
This is essentially the same as step 3, with one twist - we already have something in /etc/puppet. While it's possible to turn /etc/puppet into a git repository with the proper remote relationship to our authoritative repository, I find it's easiest to just mv it out of the way and do a new checkout. Be sure to stop your puppet master while you do this!
service puppetmaster stop; mv /etc/puppet /etc/puppet.orig; git clone /opt/puppet-git /etc/puppet;
cd /etc/puppet; git checkout production; service puppetmaster start
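To double-check that nothing was lost in the move, a quick (and admittedly blunt) comparison of the old and new trees doesn't hurt (a sketch; GNU diff assumed):
diff -r --exclude=.git /etc/puppet.orig /etc/puppet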
Workflow walkthrough
In this configuration, it is assumed that all changes must pass through the "staging" branch and be tested in the "staging" puppet environment. People must never directly edit files in /etc/puppet, or they will cause merge headaches. They should also never do any complex git operations from within /etc/puppet; instead, these things must be done either through per-user clones or through the staging clone, and then pushed up to the authoritative repository once complete.
This may sound confusing, but hopefully the step-by-step will make it clear.
Step 0: set up your own (user) git repository and branch
While optional for trivial changes, this step is highly recommended for complex changes, and almost required if you have multiple puppet administrators working at the same time. This gives you a clone of the puppet repository on which you are free to work without impacting anybody else.
First, create your own clone of the /opt/puppet-git repository:
git clone /opt/puppet-git /home/jeremy/puppet
Next, create and switch to your own branch:
cd ~/puppet/; git checkout -b jeremy
In the sample puppet.conf lines above, I've already enabled an environment that refers to this directory, so we can start testing nodes against our changes by setting their environment to "jeremy".
Step 1: update your local repository
This is not needed after a fresh clone, but it's a good idea to frequently track changes on your local puppet configuration to ensure a clean merge later on. To apply changes from the staging branch to your own branch, simply do the following:
cd ~/puppet/; git checkout staging; git pull; git checkout jeremy; git rebase staging
This will ensure that all changes made to the "staging" branch in the authoritative repository are reflected in your local repository.
NB: I like to use "rebase" on local-only branches, but you may prefer "merge." In any case that I mention "rebase", "merge" should work too
EDIT: I originally had a 'git pull --all' in this example; use 'git fetch --all' instead
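If you'd rather merge, the Step 1 update above looks much the same with the rebase swapped out:
cd ~/puppet/; git checkout staging; git pull; git checkout jeremy; git merge staging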
Step 2: make changes in your working copy and test
Now, you can make changes locally to your heart's content, being sure to use 'git add' and 'git commit' frequently (since this is a local branch, none of these changes will impact anybody else). Once you have made changes and you're ready to test, you can selectively point some of your development systems at your own puppet configuration; the easiest way to do so is to simply set the "environment" variable to (e.g.) "jeremy" in the node classifier. If you aren't using a node classifier, first, shame on you! Second, you can also make this change in puppet.conf.
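If you go the puppet.conf route, a minimal sketch on the test node would be (assuming a puppet version recent enough to use the [agent] section):
[agent]
environment = jeremy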
Step 3: ensure (again) that you're up to date, and commit your changes
If you're happy with the changes you've made, commit them to your own local branch:
cd ~/puppet; git add *; git commit -a -m "my local branch is ready"
Then, just like before, we want to make sure that we're current before we try to merge our changes to staging:
cd ~/puppet; git fetch --all; git checkout staging; git pull; git checkout jeremy; git rebase staging
BIG FRIGGIN WARNING: This is where stuff might go wrong if other people were changing the same things you changed, and committed them to staging before you. You need to be very certain that this all succeeds at this point.
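If the rebase does stop on a conflict, stay calm; roughly speaking (the conflicted file here is hypothetical):
git status                  # see which files are conflicted
# edit the conflicted file(s) to resolve the markers, then:
git add manifests/site.pp
git rebase --continue
# or, if you'd rather back out and rethink:
git rebase --abort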
Step 4: merge your branch into "staging"
Now that you have a valid puppet configuration in your local repository, you must apply this configuration to staging. In an environment with well defined processes, this step may require authorization by a project manager or team lead who will be the "gatekeeper" to the staging environment.
At the very least, let your team members know that staging is frozen while you test your changes. Changes to staging (and by extension production) must be done serially. The last thing you want is multiple people making multiple changes at the same time.
When you're ready, first merge your branch to the staging branch in your local repository:
cd ~/puppet; git checkout staging; git merge jeremy
Assuming that the merge is clean, push it back up to the central repository:
git push
Step 5: update puppet's staging configuration
Now the central repository has been updated with our latest changes and we're ready to test on all "staging" nodes. On the puppet server, go to your staging directory and update from the authoritative repo:
cd /etc/puppet-staging; git pull
Step 6: test changes on staging puppet clients
If you live in the world of cheap virtual machines, free clones, and a fully staffed IT department, you'll have a beautiful staging environment where your puppet configuration can be fully validated by your QA team. In the real world, you'll probably add one or two nodes to the "staging" environment and, if they don't explode within an hour or two, it's time to saddle up.
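A quick way to exercise those nodes is to run the agent by hand against the staging environment (a sketch; on older puppet versions the command is puppetd --test instead):
puppet agent --test --environment staging --noop   # show what would change without applying it
puppet agent --test --environment staging          # apply it for real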
If you have to make a minor change at this point, you may directly edit the files in /etc/puppet-staging, commit them with 'git commit -a', and then perform a 'git push' to put them in the authoritative repo; if you have a rigid change control procedure in place, you may need to roll back staging and go all the way back to step 1.
As a general rule: try to keep staging as close to production as possible and if possible only test one change in staging at a time. Don't let multiple changes pile up in staging; push to staging only when you're really ready. If a lot of people are waiting to make changes, they should confine them to their own branches until such a time as staging has "caught back up" to production.
Step 7: apply changes to production
Once staging has been verified, you need to merge into production. Again, this step may require authorization from a project manager or team lead who will sign off on the final changes.
Although it is possible to do this directly from the git repo in /etc/puppet-staging, I recommend that you use your own clone so as to leave staging in a consistent state throughout the process.
Start by again updating from the authoritative repo:
cd ~/puppet; git fetch --all; git checkout staging; git pull; git checkout production; git pull
At this point you should manually verify that there aren't any surprises lurking in staging that you don't expect to see:
git diff staging
If everything looks good, apply your changes and send them up to the repo:
git merge staging && git push
Step 8: final pull to production
If you're dead certain that nobody will ever monkey around directly in /etc/puppet, you can just pull down the changes and you're done:
cd /etc/puppet; git pull
That's fine and dandy if everybody follows process, but it may cause trouble if anybody has mucked around directly in /etc/puppet. To be sure that nothing unexpected is going on, you may want to use a different procedure to verify the diff first:
cd /etc/puppet; git fetch; git diff origin/production
Once satisfied, apply the changes to the live configuration:
git merge origin/production
That's great, but what's the catch?
Oh, there's always a catch.
I think the process outlined here is sound, but when you add the human element there will be problems. Somebody will eventually edit something in /etc/puppet and un-sync the repos. Somebody will check something crazy into staging when you're getting ready to push your change to production. Somebody will rebase when they should've merged and merge conflicts will blow up in your face. A million other tiny problems lurk around the corner.
The number one thing to remember with git: stay calm, because you can always go back in time. Even if something goes to hell, every step of the way has a checkpoint.
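When that day comes, the reflog is the usual way back (a sketch; the hash is hypothetical):
git reflog                  # find the last known-good state
git reset --hard a1b2c3d    # put the current branch back there (avoid on branches others have already pulled)
# or, to undo a bad commit without rewriting history:
git revert a1b2c3d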
If you follow the procedure (and I don't mean "this" procedure; I just mean "a" procedure), your chance of pain is greatly reduced. Git is version control without training wheels; it will let you shoot yourself in the foot. It's up to you to follow the procedure, because git will not save you from yourself.