
GitHub Actions and Job Scheduling, Part 1: The Problem


This is Part 1 of a three-part series on migrating scheduled tasks and ad-hoc jobs to GitHub Actions with Docker-based self-hosted runners.

The Server Under Someone’s Desk

Every organization has one. Maybe it’s not literally under a desk anymore, but somewhere in your environment there’s a Linux server running cron jobs that nobody fully understands. The person who set them up left two years ago. The documentation is a sticky note that’s long gone. And the only way you find out a job failed is when someone downstream notices the data didn’t show up.

I recently started working with a customer in exactly this situation. They had scheduled tasks and ad-hoc manual jobs scattered across multiple older servers. No central place to see what was running, no version control on the scripts, and no real history of what succeeded or failed. The strategy was simple: set it and pray it works.

Sound familiar?

What We Found

When we started inventorying what was actually running, we uncovered the usual suspects:

  • Inactive user monitoring and cleanup — scripts that scanned Active Directory for stale accounts and disabled them on a schedule.
  • Automated reporting — nightly jobs that pulled data, crunched numbers, and dropped CSVs into shared drives.
  • Server reboots — scheduled restarts for servers that ā€œjust needed itā€ on a regular basis.
  • Packer image builds — building golden server images, kicked off manually whenever someone remembered.
  • Linux patching with Ansible — playbooks that ran from a specific server against inventory files that may or may not have been current.

None of this is unusual. These are the bread-and-butter operational tasks that keep an environment running. The problem wasn’t what they were doing — it was how they were doing it.

The Pain Points

Once we mapped everything out, the issues became clear.

No visibility. There was no dashboard, no log aggregation, no single place to answer ā€œwhat ran last night and did it work?ā€ If a cron job failed silently at 2 AM, nobody knew until the consequences showed up — sometimes days later.

No version control. Scripts lived on servers, not in repositories. Changes were made live, in place, with no history. If someone broke a script, there was no easy way to roll back. Worse, there was no way to review changes before they went into production.

The ā€œit only runs on Dave’s serverā€ problem. Specific jobs were tied to specific servers because of installed dependencies, network access, or just historical accident. If that server went down, the job didn’t run. If someone needed to modify it, they had to SSH into that particular box and hope they had the right permissions.

No consistency. Some jobs ran via cron. Some were kicked off manually over SSH. Some used Ansible, some used PowerShell, some were raw bash scripts. There was no standard way to create, schedule, or monitor a job.

Testing was manual and risky. Want to test a change to a scheduled script? You’d modify it on the server, maybe add some debug output, and wait for the next scheduled run. Or you’d run it manually and hope it didn’t step on production data.

Why This Matters

This isn’t just a ā€œmessy infrastructureā€ story. These pain points have real consequences:

  • Compliance gaps. When auditors asked ā€œcan you prove this cleanup job ran successfully every week for the past year?ā€ the answer was a shrug.
  • Operational risk. A single server failure could silently break multiple automated processes.
  • Onboarding friction. New team members had no way to discover what jobs existed, what they did, or how to modify them.
  • Wasted time. Engineers spent hours investigating failures that could have been caught immediately with proper logging and notifications.

The Goal

We needed a solution that would:

  1. Centralize all scheduled and ad-hoc jobs into one place.
  2. Version control every script and schedule definition.
  3. Provide visibility into job history, success, and failure.
  4. Isolate toolchains so PowerShell, Ansible, and Packer jobs wouldn’t conflict with each other.
  5. Stay on-premises because these jobs needed access to internal networks and resources.

The answer turned out to be GitHub Actions with Docker-based self-hosted runners. In Part 2, we’ll walk through the architecture we built and how GitHub’s scheduling capabilities replaced all those scattered jobs.
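To give a sense of where this is headed, here's a minimal sketch of the kind of workflow we'll build in Part 2. The file name, runner labels, and script path below are placeholders, not the actual setup; the real architecture, including the Docker-based runner image behind it, comes in the next post.

# .github/workflows/nightly-report.yml (illustrative sketch only)
name: Nightly report

on:
  schedule:
    - cron: "0 2 * * *"    # run every night at 02:00 UTC
  workflow_dispatch:        # allow ad-hoc manual runs from the UI or API

jobs:
  report:
    runs-on: [self-hosted, linux]    # target an on-prem runner by label
    steps:
      - uses: actions/checkout@v4    # the script now lives in version control
      - name: Run the nightly report
        run: ./scripts/nightly-report.sh

Even this tiny example touches several of the goals above: the schedule and the script live in a repository, every run shows up in the Actions history, the job can be triggered manually when needed, and it executes on an on-premises runner with access to internal resources.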

Happy automating!


Next up: Part 2 — The Solution: Docker Runners, Schedules, and Secrets