Every week, I was spending hours doing the same thing: write a post, format it, upload images, add tags, hit publish. Then I'd inevitably forget a tag, leave a broken <hr /> in the HTML, or upload the wrong image slug. Content publishing had become a bottleneck.

So I automated it. Here's exactly what I built, the tools I used, the data on what changed, and the lessons learned along the way.

The Problem

Running a tech blog in 2026 means publishing multiple times per week. Manual publishing through a CMS dashboard is fine for occasional posts, but at scale it breaks down:

Formatting drift: Every post needs consistent HTML — no stray <h1> tags (Ghost already renders the title), no broken <hr /> from --- separators, proper <table> structure instead of raw markdown tables that Ghost fails to parse
Image pipeline: Generate a feature image, upload it, link it into the post, verify it loads. Each step is easy to skip or get wrong
Tag consistency: Four to five tags per post, each needing the correct database ID. One wrong ID and the post goes live untagged
Validation gaps: Manual review misses subtle bugs — a stray > character that renders as a visible element, a code block that didn't convert, a duplicate H1 from the post title

Manual QA for a single post was taking 15-20 minutes. At 4-5 posts per week, that adds up to roughly 60-100 minutes of rote validation work.

What I Built

The pipeline has four stages, each running as an independent scripted step:

Stage 1: Content Creation

Research and outline in markdown
Write structured content following editorial templates (TL;DR, pros/cons table, FAQ, comparison tables)
Store posts as .md files in a posts/ directory with standardized frontmatter-lite conventions

Stage 2: HTML Conversion

A custom Python converter (md-to-html.py) that handles:
Headings (## → <h2>, ### → <h3>) — explicitly skipping H1 since Ghost renders it
Code blocks (``` fences → <pre><code class="language-xxx">)
Tables (| col | col | → <table><thead><tr><th>)
Bullet lists (- item → <ul><li>)
Bold/italic/inline code (**bold** → <strong>, `code` → <code>)
Raw HTML pass-through for FAQ sections (H3 tags for GEO optimization)
Critical: Skip --- separators entirely — no <hr /> generation
Pre-publish validation: automated grep checks for H1, <hr />, stray >, unconverted lists, broken table artifacts
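
The conversion rules above can be sketched in a few lines of Python. This is a minimal illustration, not the actual md-to-html.py: the function names `inline` and `convert` are hypothetical, and tables and fenced code blocks are left out to keep it short. It shows the two rules that matter most for Ghost: H1 lines and --- separators are dropped rather than converted.

```python
import html
import re


def inline(text: str) -> str:
    """Convert inline code, bold, and italic markers in one line of text."""
    text = html.escape(text, quote=False)  # a stray > becomes &gt;
    text = re.sub(r"`([^`]+)`", r"<code>\1</code>", text)
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    return text


def convert(markdown: str) -> str:
    """Convert a small markdown subset to HTML, one line at a time."""
    out = []
    in_list = False
    for raw in markdown.splitlines():
        line = raw.strip()
        is_item = line.startswith("- ")
        if in_list and not is_item:
            out.append("</ul>")  # any non-item line closes the open list
            in_list = False
        if not line or line == "---" or line.startswith("# "):
            continue  # skip blank lines, separators (no <hr />), and H1
        if is_item:
            if not in_list:
                out.append("<ul>")
                in_list = True
            out.append(f"<li>{inline(line[2:])}</li>")
        elif line.startswith("### "):
            out.append(f"<h3>{inline(line[4:])}</h3>")
        elif line.startswith("## "):
            out.append(f"<h2>{inline(line[3:])}</h2>")
        elif line.startswith("<"):
            out.append(line)  # raw HTML pass-through
        else:
            out.append(f"<p>{inline(line)}</p>")
    if in_list:
        out.append("</ul>")
    return "\n".join(out)
```

Skipping the unwanted constructs at conversion time means the later validation step only has to catch regressions, not clean up expected output.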

Stage 3: Image Generation and Upload

Generate a 16:9 feature image with Gemini Flash
Copy to Ghost content directory
Upload via Ghost Admin API (JWT-authenticated multipart upload)
Verify HTTP 200 on the image URL
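
The final verification step can be sketched as below. The post's uploader runs on Node.js; Python is used here only to stay consistent with the other examples. The helper names `http_status` and `image_is_live` are hypothetical, and the check is a plain HEAD request, which assumes the server answers HEAD the same way it answers GET.

```python
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a HEAD request to url."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx and 5xx responses arrive as exceptions
    # Connection failures (urllib.error.URLError) are left to propagate.


def image_is_live(url: str, fetch: Callable[[str], int] = http_status) -> bool:
    """True only when the image URL answers with HTTP 200."""
    return fetch(url) == 200
```

A call such as `image_is_live("http://localhost:2369/content/images/2026/05/slug.jpg")` would then gate the database insert. The `fetch` parameter exists so the check can be exercised without a running Ghost instance.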

Stage 4: Database Insertion and Restart

Direct SQLite insertion into Ghost's ghost.db — bypassing the Admin API's unreliable Lexical JSON format
Insert post record with full rendered HTML (not markdown), feature image path, tags, timestamps
Insert post-tag relationships with correct sort order
Restart Ghost via systemd
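
A minimal sketch of the insertion step follows. The `POSTS_SCHEMA` table is a simplified stand-in: the real posts table in ghost.db has many more required columns, so inspect your own schema before adapting this. The helper name `insert_post` and the 24-character hex id are assumptions made for illustration.

```python
import secrets
import sqlite3
from datetime import datetime, timezone
from typing import Optional

# Simplified stand-in for Ghost's posts table (assumption, not the real schema).
POSTS_SCHEMA = """
CREATE TABLE posts (
    id TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    slug TEXT NOT NULL UNIQUE,
    html TEXT,
    lexical TEXT,
    mobiledoc TEXT,
    feature_image TEXT,
    status TEXT NOT NULL,
    published_at TEXT
);
"""


def insert_post(conn: sqlite3.Connection, title: str, slug: str,
                html: str, feature_image: Optional[str]) -> str:
    """Insert one published post with rendered HTML and no editor JSON."""
    post_id = secrets.token_hex(12)  # 24 hex characters (assumed id format)
    now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "INSERT INTO posts (id, title, slug, html, lexical, mobiledoc,"
            " feature_image, status, published_at)"
            " VALUES (?, ?, ?, ?, NULL, NULL, ?, 'published', ?)",
            (post_id, title, slug, html, feature_image, now),
        )
    return post_id
```

Wrapping the insert in `with conn:` means a failure such as a duplicate slug leaves the table unchanged instead of half-written.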

Tools Used

Tool	Version	Role
Ghost CMS	6.35.0	Publishing platform, running on localhost:2369
SQLite3	3.x	Database backend — direct post insertion into ghost.db
Python	3.14.4	HTML converter, DB insert scripts
Node.js	25.9.0	Image uploader via Ghost Admin API
Gemini Flash	Preview	Feature image generation (16:9)
OpenClaw	Latest	Orchestration, research, and content generation
Cloudflare Tunnel	Latest	Secure tunnel from toolbrain.net → Ghost (port 2369)
systemd	User service	Ghost service management

Data and Results

Over the course of 10 posts published through this pipeline:

Metric	Before (Manual)	After (Automated)	Improvement
Time to format and publish one post	25-35 min	5-8 min	4x faster
Image pipeline (generate + upload + verify)	8-10 min	2-3 min	3x faster
Validation errors caught before publish	0 (caught after)	3-5 per post	N/A — manual had zero pre-publish checks
Posts requiring re-publish to fix errors	6 out of 10	0 out of 10	Eliminated
Time spent on post-publish fixes per week	45-60 min	0 min	Eliminated
Tag consistency (correct ID every time)	~80%	100%	+20 percentage points

The most impactful metric isn't speed — it's reliability. Before automation, roughly 60% of posts had at least one formatting or tagging error that required a fix after publishing. After implementing pre-publish validation, the error rate dropped to zero.

Lessons Learned

1. Raw HTML beats Lexical JSON for reliability

Ghost stores posts in a lexical column as Lexical JSON, the serialized state of its editor. The Admin API works natively in this format, and posts I sent as raw markdown came out silently corrupted. The most reliable path I found is to generate clean HTML, store it in the html column, and set lexical and mobiledoc to NULL. Ghost falls back to HTML rendering when Lexical data is absent.

2. Pre-publish validation is non-negotiable

The automated checks I added — grepping for <h1 (which Ghost renders from the title, creating a duplicate), <hr /> (from --- separators), <p>- (unconverted bullet lists), and stray > characters — caught errors on every single post in the first batch. Without them, every post would have gone live with visible rendering bugs.
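
These checks translate directly into a handful of regular expressions. The sketch below is one possible shape, not the exact grep commands used in the pipeline. The `CHECKS` table and `validate` function are hypothetical names, and the "stray >" pattern assumes the bug shows up as a line or paragraph that begins with > or &gt;.

```python
import re

# Check name -> pattern that must NOT appear in the rendered HTML.
CHECKS = {
    "duplicate H1": re.compile(r"<h1\b", re.IGNORECASE),
    "horizontal rule": re.compile(r"<hr\s*/?>", re.IGNORECASE),
    "unconverted bullet": re.compile(r"<p>\s*-\s"),
    # Assumed shape of the stray > bug: a line or paragraph starting with it.
    "stray >": re.compile(r"^\s*>|<p>\s*(?:>|&gt;)", re.MULTILINE),
}


def validate(html: str) -> list[str]:
    """Return the names of all checks that the HTML fails."""
    return [name for name, pattern in CHECKS.items() if pattern.search(html)]
```

An empty list means the post may proceed to the database step; anything else should stop the pipeline before Ghost ever sees the HTML.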

3. Ghost needs a restart after every DB change

Ghost aggressively caches its database reads. After inserting a post directly via SQLite, the Ghost process must be restarted via systemctl --user restart ghost-toolbrain.service before the new post appears. This adds ~15 seconds to the pipeline but is unavoidable with direct database access.
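
Scripting the restart keeps it from being forgotten. A minimal sketch, using the unit name from the command above; the function names are hypothetical, and `check=True` is there so a failed restart raises an exception instead of passing silently.

```python
import subprocess

SERVICE = "ghost-toolbrain.service"  # unit name taken from the post


def restart_command(service: str = SERVICE) -> list[str]:
    """Build the systemctl invocation as an argument list (no shell)."""
    return ["systemctl", "--user", "restart", service]


def restart_ghost(service: str = SERVICE) -> None:
    """Restart the Ghost user service; raises CalledProcessError on failure."""
    subprocess.run(restart_command(service), check=True)
```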

4. Feature image paths are relative

The feature_image column stores paths relative to the Ghost content root (/content/images/2026/05/slug.jpg), not absolute filesystem paths. Using absolute paths breaks image rendering.
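
One way to avoid the mistake is to derive the stored value from the file's location on disk. In this sketch the function name `feature_image_path` and the install location `/var/www/ghost` used in the usage note are hypothetical.

```python
from pathlib import PurePosixPath


def feature_image_path(file_path: str, content_root: str) -> str:
    """Map a file inside Ghost's content directory to its /content/... form.

    Raises ValueError if file_path is not inside content_root.
    """
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(content_root))
    return "/content/" + relative.as_posix()
```

For example, `feature_image_path("/var/www/ghost/content/images/2026/05/slug.jpg", "/var/www/ghost/content")` yields `/content/images/2026/05/slug.jpg`, which is the form the column expects.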

5. Tag sort order matters

Ghost's posts_tags table has a sort_order column. Tags with sort_order 0 appear first in the post display. If all tags have sort_order 0, Ghost appears to order them by internal ID, which may not match your preferred hierarchy. Explicitly setting sort_order (0 for primary tag, 1, 2, 3... for supporting tags) ensures consistent display order.
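
Setting sort_order explicitly is a one-liner with `enumerate`. The sketch below uses a simplified stand-in for the posts_tags table (the real one may carry additional columns or constraints), and the helper name `link_tags` and the 24-character hex ids are assumptions.

```python
import secrets
import sqlite3

# Simplified stand-in for Ghost's posts_tags table (assumption).
POSTS_TAGS_SCHEMA = """
CREATE TABLE posts_tags (
    id TEXT PRIMARY KEY,
    post_id TEXT NOT NULL,
    tag_id TEXT NOT NULL,
    sort_order INTEGER NOT NULL DEFAULT 0
);
"""


def link_tags(conn: sqlite3.Connection, post_id: str, tag_ids: list[str]) -> None:
    """Attach tags to a post; list position becomes sort_order (0 = primary)."""
    rows = [
        (secrets.token_hex(12), post_id, tag_id, order)
        for order, tag_id in enumerate(tag_ids)
    ]
    with conn:  # all links are written in a single transaction
        conn.executemany(
            "INSERT INTO posts_tags (id, post_id, tag_id, sort_order)"
            " VALUES (?, ?, ?, ?)",
            rows,
        )
```

Passing the tags as an ordered list makes the primary tag a property of the call site rather than something to remember per row.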

6. The Admin API JWT approach is fragile

Ghost Admin API authentication uses short-lived JWTs (5 minutes). The key rotation, audience validation, and algorithm requirements make it unreliable for the image upload step. A working implementation exists but fails silently on edge cases — always verify the HTTP response and image URL after upload.

What's Next

The current pipeline still requires manual research and writing. The next iteration will:

Add automated SEO meta-description generation from post content
Implement scheduled publishing (set published_at to future timestamps with status: 'scheduled')
Add batch image processing for gallery-style posts
Build a simple CLI wrapper that runs the entire pipeline with one command

This build log is itself published through the pipeline described here — including the image, tags, and validation checks.

Frequently Asked Questions

Why not use the Ghost Admin API for everything?

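The Admin API works natively with Lexical JSON, not raw markdown. It can accept HTML through the ?source=html query parameter, but Ghost then converts that HTML to Lexical, and the conversion can be lossy. Producing Lexical JSON directly is complex and fragile. Direct SQLite insertion with rendered HTML is more reliable and gives full control over the data.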

Is direct SQLite access safe in production?

Ghost uses SQLite as its default database for single-server setups. Direct writes are safe as long as you use transactions and Ghost is not actively writing to the same table. Always restart Ghost after changes.

Can this pipeline work with MySQL instead of SQLite?

Yes. Ghost supports MySQL in production. The SQL schema is the same — just use a MySQL client instead of SQLite3 for the insert operations. The HTML generation and validation steps are database-agnostic.

How are feature images stored and served?

Images are stored in content/images/2026/05/ and served by Ghost's static file handler. They must be uploaded via the Admin API (not just copied to disk) for Ghost to register them in its image cache and serve them correctly.

Build Log: Building an AI-Powered Blog Automation Pipeline with Ghost and Python