’Twas the night before 1.0

’Twas the night before 1.0, and all through the cloud
Not an instance was flapping, reserved instances vowed
All the deployments were done via chef with great care
Idempotent deployments, so quick redeploys aren’t a bear
All the devs were home, nestled tight in their beds
while visions of stateless sessions danced in their heads
I with my maven and friends with their ant
Both waiting in earnest to give gradle a chance
When up on the office dashboard, came the red in a smatter
I sprang from my Aeron, and told pound-ops: “Hold the chatter!”
To the Puppetmaster, I SSH’d in a jiffy
sudo’ing my way to /var/log, searching for entries that were iffy
When, what to my wondering eyes should appear
But run-away processes, a load average of 80 on my gear
With some special tools on my old thumb stick
My debugging started lickety-split
With phone all a ringing the SMSes came
From PagerDuty dutifully calling my name
Now! ssh, now! tmux, now! grep and awk
On Python, on Perl. Ruby and gmock!
I desperately tailed log files, tailed them all
to find what had caused my servers to fall
When I looked at Graphite my consternation grew
My brand new database server’s gone down too!
And to my dismay load went through the roof
On every last node of my clustered Hadoop
Back through the changelog I looked to the sound
Of more and more alerts going off all around
Hardware failure? Software bug? Configuration kaput?
Until it was fixed I would have to stay put
A bundle exec rake capistrano deploy command, in fact
Were the few key presses I knew would rebuild my stack
Yet running the command failed causing much apprehension
ERROR: Failed to build gem native extension
Packet loss was rising now, 40 percent, oh no!
I was wondering how much worse it could go
Every cluster had problems, I gritted my teeth
Kernels were panicking, JVMs out of memory, no LOLcat relief
Surely our architecture wasn’t that smelly
For complex failures even beyond Machiavelli
My nerves shattered, I reached for the Scotch on the shelf
With panic almost overwhelming myself
I picked up the phone to wake the CTO from his bed
The product launch the next morning, must go ahead
He spoke not a word, as I told him “It’s borked”
But sighed down the line, as his mind went to work
“Ah ha, I have it! A foolproof plan”
“Have you tried turning it off and then on again?”
I did exactly what the great man suggested
And lo and behold, without needing to test it
Everything started working, Nagios said it’s alright
Happy launching tomorrow, and for me it’s good night.

 

Written by David Lutz, J. Paul Reed, Seth Thomas and Edward Ciramella for the Ship Show podcast.

 
