Example of Markdown Imported.

This is an example of how our plugin creates WordPress content from real Markdown files. The layout, CSS, and anchors are all generated by our 50 KB plugin.

The Markdown below documents how to build a web scraping automation. If you need help with that, or if this document reads like a foreign language, we are happy to discuss consulting.

The following is all generated from our plugin.

Playwright Multi-Purpose Automation

This document provides comprehensive guidance for using Playwright as an all-purpose automation tool for PDF generation, HTML saving, and Server-Side Rendering (SSR) cache with multi-architecture support.

Architecture Compatibility

⚠️ CRITICAL: All services must support both Apple Silicon (ARM64) and Intel (x86/x64) architectures across Node.js versions.

📖 See: ARCHITECTURE COMPATIBILITY.md for detailed multi-architecture requirements, Docker configurations, and testing procedures.

Node.js Dependencies

Core Requirements

Node.js Version: node:24-slim Docker image (required); the engines field below also accepts Node.js 18 or newer
Playwright: ^1.40.0 (1.40.0 or any newer 1.x release)
Architecture: Auto-detects and downloads correct Chromium binaries

Core Playwright Service Dependencies

{
  "dependencies": {
    "express": "^4.18.2",
    "bullmq": "^5.58.5",
    "redis": "^5.8.2",
    "playwright": "^1.40.0",
    "mime-types": "^2.1.35",
    "compression": "^1.7.4"
  },
  "devDependencies": {
    "nodemon": "^3.0.0"
  },
  "engines": {
    "node": ">=18.0.0"
  }
}
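
By default npm only warns when the running Node.js version falls outside the engines range, so a service can still start on an unsupported runtime. A minimal startup guard might look like the sketch below. The constant and function names are illustrative, and the minimum of 18 is taken from the engines field above.

```javascript
// Minimal startup guard (sketch). MIN_NODE_MAJOR mirrors the "engines" field;
// meetsMinimumNode is an illustrative helper, not part of any library.
const MIN_NODE_MAJOR = 18;

function meetsMinimumNode(version, minMajor = MIN_NODE_MAJOR) {
  // process.versions.node looks like "24.1.0"; only the major part is compared
  const major = Number.parseInt(String(version).split('.')[0], 10);
  return Number.isInteger(major) && major >= minMajor;
}

if (!meetsMinimumNode(process.versions.node)) {
  console.error(`Node.js ${MIN_NODE_MAJOR}+ is required, found ${process.versions.node}`);
  process.exitCode = 1; // signal failure without cutting off pending log output
}
```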

Installation Commands

# Install Node.js dependencies
npm install

# Install Playwright browsers (architecture auto-detected)
npx playwright install chromium
npx playwright install-deps chromium

Python Dependencies

Core Requirements

Python: 3.11+ recommended
Playwright: 1.40.0 (matches Node.js version)
Architecture: Auto-detects and downloads correct Chromium binaries

Core Python Service Dependencies

flask==2.3.3
playwright==1.40.0
gunicorn==21.2.0
nest-asyncio==1.5.8
redis==4.6.2

Installation Commands

# Install Python dependencies
pip install -r requirements.txt

# Install Playwright browsers (architecture auto-detected)
playwright install chromium

Core Use Cases

Playwright serves as a powerful all-purpose automation tool for:

1. PDF Generation

Web-to-PDF: Convert any webpage or HTML content to high-quality PDFs
Dynamic Content: Handle JavaScript-rendered content and interactive elements
Custom Formatting: Control page size, margins, headers, footers
Batch Processing: Generate multiple PDFs efficiently via queue system

2. HTML Saving & Caching

Complete Pages: Save fully rendered HTML with all assets loaded
Static Site Generation: Pre-render dynamic content for faster delivery
Content Archival: Preserve web content exactly as rendered
Offline Access: Create local copies of web applications

3. Server-Side Rendering (SSR) Cache

Performance Optimization: Pre-render pages for faster initial load
SEO Enhancement: Provide search engines with fully rendered content
Dynamic Caching: Cache complex, data-driven pages efficiently
Progressive Enhancement: Serve static content while JavaScript loads

Architecture Overview

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Client Apps   │───▶│  Job Processor  │───▶│  Playwright     │
│   (Web/API)     │    │    (BullMQ)     │    │   Workers       │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                              │                        │
                              ▼                        ▼
                       ┌─────────────────┐    ┌─────────────────┐
                       │  Redis Queue    │    │  File Storage   │
                       │   (Jobs)        │    │  (PDFs/HTML)    │
                       └─────────────────┘    └─────────────────┘

BullMQ Integration

Version Requirements

BullMQ: 5.58.5 (latest stable)
Redis: ^5.8.2 (official Node.js Redis client, used for application-level caching; BullMQ itself connects through ioredis, which it installs as its own dependency)
Redis Server: 7.0+ (server requirement)

Job Processor Implementation

Queue Configuration

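import { Queue, Worker } from 'bullmq';
import { createClient } from 'redis';

// Shared settings, parsed once so ports and database indexes are numbers
const redisHost = process.env.REDIS_HOST || 'localhost';
const redisPort = Number.parseInt(process.env.REDIS_PORT, 10) || 6379;
const redisDb = Number.parseInt(process.env.REDIS_DB, 10) || 0;

// BullMQ connection options. BullMQ opens its own ioredis connections
// from these options; it does not accept a node-redis client.
const connection = {
  host: redisHost,
  port: redisPort,
  password: process.env.REDIS_PASSWORD,
  db: redisDb,
};

// Application-level Redis client (official redis package), e.g. for the SSR cache
const redisConnection = createClient({
  socket: { host: redisHost, port: redisPort },
  password: process.env.REDIS_PASSWORD,
  database: redisDb,
});

// Register handlers before connecting so early errors are not missed
redisConnection.on('error', (err) => console.error('Redis Client Error', err));
redisConnection.on('connect', () => console.log('Connected to Redis'));
await redisConnection.connect();

// Create queue (the name must match the Worker below)
const playwrightQueue = new Queue('playwright-jobs', {
  connection,
  defaultJobOptions: {
    removeOnComplete: 100,
    removeOnFail: 50,
    attempts: 3,
    backoff: {
      type: 'exponential',
      delay: 2000,
    },
  },
});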
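
With attempts: 3 and an exponential backoff of 2000 ms, it helps to see the retry schedule those settings imply. The sketch below assumes the formula given in BullMQ's documentation for its built-in exponential strategy, 2^(attemptsMade - 1) * delay. The helper name is hypothetical and the function is not part of BullMQ.

```javascript
// Delay before each retry under the assumed formula 2^(attemptsMade - 1) * delay.
function exponentialBackoffDelays(attempts, delay) {
  const delays = [];
  // "attempts" counts the first try too, so there are attempts - 1 retries
  for (let attemptsMade = 1; attemptsMade < attempts; attemptsMade++) {
    delays.push(Math.pow(2, attemptsMade - 1) * delay);
  }
  return delays;
}

console.log(exponentialBackoffDelays(3, 2000)); // [ 2000, 4000 ]
```

Under that assumption a job that keeps failing is retried after 2 seconds and again after 4 seconds before it is marked as failed.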

Worker Implementation

import { chromium } from 'playwright';

// Multi-purpose Playwright worker
// CHROMIUM_PATH is set by detectBestChromiumPath() (see Critical Path Detection Functions)
const worker = new Worker('playwright-jobs', async (job) => {
  const { url, options = {} } = job.data;
  // The job type may be passed in the data or as the job name
  const type = job.data.type ?? job.name;
  
  // Launch Playwright browser with detected Chromium path
  const browser = await chromium.launch({
    headless: true,
    executablePath: CHROMIUM_PATH,
    args: ['--no-sandbox', '--disable-setuid-sandbox']
  });
  
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle' });
    
    switch (type) {
      case 'pdf': {
        const pdf = await page.pdf({
          format: options.format || 'A4',
          margin: options.margin || { top: '1in', right: '1in', bottom: '1in', left: '1in' },
          printBackground: true
        });
        return { success: true, pdf: pdf.toString('base64') };
      }
        
      case 'html': {
        const html = await page.content();
        return { success: true, html, title: await page.title() };
      }
        
      case 'cache': {
        const cachedHtml = await page.content();
        // Persist cachedHtml under options.cacheKey here (Redis, file system, etc.)
        return { success: true, cached: true, key: options.cacheKey };
      }
        
      default:
        throw new Error(`Unknown job type: ${type}`);
    }
  } finally {
    await browser.close();
  }
}, { 
  connection,
  // BullMQ expects a number; environment variables are always strings
  concurrency: Number.parseInt(process.env.WORKER_CONCURRENCY, 10) || 5,
});
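
A worker that opens arbitrary URLs benefits from checking each payload before a browser is launched. The sketch below is one way to do that. validateJob and KNOWN_TYPES are hypothetical names, and the job shape { name, data: { type, url, options } } is an assumption based on the examples in this document.

```javascript
const KNOWN_TYPES = new Set(['pdf', 'html', 'cache']);

// Returns a normalized { type, url, options } or throws a descriptive error.
function validateJob(job) {
  const data = job.data || {};
  const type = data.type ?? job.name; // accept the type from data or from the job name
  if (!KNOWN_TYPES.has(type)) {
    throw new Error(`Unknown job type: ${type}`);
  }
  let parsed;
  try {
    parsed = new URL(data.url);
  } catch {
    throw new Error(`Invalid URL: ${data.url}`);
  }
  // Reject file:, data:, and other schemes a headless browser would happily open
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  return { type, url: parsed.href, options: data.options || {} };
}
```

Calling this at the top of the worker callback lets malformed jobs fail fast, before any Chromium process is started.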

Job Types and Examples

// Core job types for all-purpose automation
const JobTypes = {
  PDF_GENERATION: 'pdf',
  HTML_SAVING: 'html', 
  SSR_CACHE: 'cache'
};

// PDF Generation
await playwrightQueue.add(JobTypes.PDF_GENERATION, {
  url: 'https://example.com/report',
  options: {
    format: 'A4',
    margin: { top: '0.5in', right: '0.5in', bottom: '0.5in', left: '0.5in' },
    displayHeaderFooter: true,
    headerTemplate: '<div style="font-size:10px;">Report - <span class="date"></span></div>'
  }
});

// HTML Saving
await playwrightQueue.add(JobTypes.HTML_SAVING, {
  url: 'https://dynamic-app.com/dashboard',
  options: {
    waitFor: 'networkidle',
    includeAssets: true
  }
});

// SSR Cache
await playwrightQueue.add(JobTypes.SSR_CACHE, {
  url: 'https://spa-app.com/product/123',
  options: {
    cacheKey: 'product_123_rendered',
    ttl: 3600 // 1 hour cache
  }
});

Python Integration with BullMQ

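# BullMQ is Node.js-native. Python services can:
# 1. Hand jobs to the Node.js job processor over an HTTP API
# 2. Push jobs onto a plain Redis list that a Node.js bridge drains into BullMQ
# 3. Use python-rq as a separate, Python-only queue

import json

import redis

# Redis client for job communication (Python redis package)
redis_client = redis.Redis(host='localhost', port=6379, db=0)

# NOTE: BullMQ stores each job as a Redis hash and manages its lists through
# Lua scripts, so a raw LPUSH of JSON is not picked up by a BullMQ Worker.
# This helper writes to a separate handoff list (the "handoff:" key prefix is
# an assumption); a Node.js bridge must read it and call queue.add().
def add_job_to_queue(queue_name, job_data):
    job = {
        'data': job_data,
        'opts': {'attempts': 3}
    }
    redis_client.lpush(f"handoff:{queue_name}", json.dumps(job))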

Migration from IORedis to Official Redis Client

Why Migrate?

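Official Support: The redis package is maintained by Redis
Security: Security fixes arrive through the official release channel
Active Maintenance: Regular releases and long-term support
Smaller Bundle: Reduced dependency footprint for services that do not also use BullMQ
Scope: This migration covers application code that talks to Redis directly. BullMQ depends on ioredis and keeps using it for its own connections.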

Breaking Changes

// OLD: IORedis
import IORedis from 'ioredis';
const redis = new IORedis({
  host: 'localhost',
  port: 6379,
  maxRetriesPerRequest: 3,
});

// NEW: Official Redis Client
import { createClient } from 'redis';
const redis = createClient({
  socket: {
    host: 'localhost',
    port: 6379,
  },
});
await redis.connect(); // Required explicit connection

Configuration Differences

// IORedis configuration
const ioRedisConfig = {
  host: 'localhost',
  port: 6379,
  password: 'secret',
  db: 0,
  retryDelayOnFailover: 100,
  maxRetriesPerRequest: 3,
};

// Official Redis client configuration
const redisConfig = {
  socket: {
    host: 'localhost',
    port: 6379,
    connectTimeout: 5000,
    // commandTimeout is an ioredis option and is not a documented node-redis socket option
  },
  password: 'secret',
  database: 0,
};
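
The field-by-field differences above can be captured in a small conversion helper. The sketch below maps only the options that have an obvious counterpart (host, port, password, db) and reports the rest for manual review. toNodeRedisConfig is a hypothetical name, not part of either library.

```javascript
// Maps ioredis-style fields onto the node-redis option shape.
// Options without a direct counterpart are listed in "unmapped" instead of guessed.
function toNodeRedisConfig(ioConfig) {
  const { host = 'localhost', port = 6379, password, db = 0, ...rest } = ioConfig;
  const config = {
    socket: { host, port: Number(port) },
    database: Number(db),
  };
  if (password !== undefined) config.password = password;
  return { config, unmapped: Object.keys(rest) };
}

const { config, unmapped } = toNodeRedisConfig({
  host: 'localhost',
  port: 6379,
  password: 'secret',
  db: 0,
  retryDelayOnFailover: 100,
  maxRetriesPerRequest: 3,
});
console.log(config, unmapped);
```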

BullMQ Connection Update

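// Update your existing BullMQ setup
import { Queue, Worker } from 'bullmq';
import { createClient } from 'redis';

const redisHost = process.env.REDIS_HOST || 'localhost';
const redisPort = Number.parseInt(process.env.REDIS_PORT, 10) || 6379;
const redisDb = Number.parseInt(process.env.REDIS_DB, 10) || 0;

// Connection options for BullMQ, which opens its own ioredis connections
// and does not accept a node-redis client
const bullConnection = {
  host: redisHost,
  port: redisPort,
  password: process.env.REDIS_PASSWORD,
  db: redisDb,
};

// Separate node-redis client for application code (caching, lookups)
async function createRedisConnection() {
  const client = createClient({
    socket: { host: redisHost, port: redisPort },
    password: process.env.REDIS_PASSWORD,
    database: redisDb,
  });
  client.on('error', (err) => console.error('Redis Client Error', err));
  await client.connect();
  return client;
}

// Use with BullMQ
const queue = new Queue('jobs', { connection: bullConnection });
const worker = new Worker('jobs', async (job) => {
  // Process job
}, { connection: bullConnection });

// Use the official client everywhere else
const redisConnection = await createRedisConnection();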

Development Setup

Local Development Environment

# 1. Start Redis server
docker run -d --name redis -p 6379:6379 redis:7-alpine

# 2. Install Node.js dependencies
cd job-processor
npm install
npx playwright install chromium

# 3. Install Python dependencies
cd ../scraper-api
pip install -r requirements.txt
playwright install chromium

# 4. Start services (each in its own terminal)
(cd ../job-processor && npm run dev)  # Node.js services
python app.py  # Python services (run from scraper-api)

Docker Development

# Build multi-architecture images
docker buildx build --platform linux/amd64,linux/arm64 -t job-processor:latest .
docker buildx build --platform linux/amd64,linux/arm64 -t scraper-api:latest .

# Run with docker-compose
docker-compose up -d

Environment Variables

# Redis Configuration
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD=
REDIS_DB=0

# Job Processor
WORKER_CONCURRENCY=5
MAX_RETRIES=3

# Playwright Configuration
# Wildcards in environment values are not expanded automatically; resolve the pattern in code
PLAYWRIGHT_BROWSERS_PATH=/ms-playwright
PLAYWRIGHT_CHROMIUM_EXECUTABLE_PATH=/ms-playwright/chromium-*/chrome-linux/chrome

# Architecture Detection
NODE_ENV=production
PLATFORM_ARCH=auto

Performance Considerations

Resource Management

Memory: 2GB minimum per worker process
CPU: Optimal at 2-4 cores per worker
Disk: 1GB for Chromium binaries per architecture

Scaling Guidelines

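import os from 'node:os';

// Horizontal scaling configuration (BullMQ Worker options)
const workerConfig = {
  concurrency: Math.max(1, Math.floor(os.cpus().length / 2)),
  maxStalledCount: 1,
  stalledInterval: 30 * 1000,
};

// The eviction policy is a Redis server setting, not a Worker option.
// BullMQ expects maxmemory-policy noeviction; allkeys-lru can evict queue keys.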
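
The CPU-based formula above ignores the memory guideline from Resource Management. A sketch that takes the smaller of the two limits is shown below. It treats the 2 GB figure as a budget per concurrent browser, which is an assumption, and recommendedConcurrency is a hypothetical helper. The snippet uses CommonJS so it runs as a standalone script.

```javascript
const os = require('node:os');

const GIB = 1024 ** 3;

// Picks the tighter of the CPU limit (half the cores) and the memory limit.
function recommendedConcurrency(cpuCount, totalMemBytes, memPerWorkerBytes = 2 * GIB) {
  const byCpu = Math.floor(cpuCount / 2);
  const byMemory = Math.floor(totalMemBytes / memPerWorkerBytes);
  return Math.max(1, Math.min(byCpu, byMemory)); // never drop below one job at a time
}

console.log(recommendedConcurrency(os.cpus().length, os.totalmem()));
```

On an 8-core host with 4 GiB of RAM this yields 2 rather than 4, because memory runs out before CPU does.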

Monitoring and Health Checks

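// Health check endpoint (requires `import os from 'node:os'`)
app.get('/health', async (req, res) => {
  try {
    const queueHealth = await playwrightQueue.getJobCounts();
    const workerHealth = worker.isRunning();
    
    // Report 503 when the worker is stopped so load balancers can react
    res.status(workerHealth ? 200 : 503).json({
      status: workerHealth ? 'healthy' : 'unhealthy',
      queue: queueHealth,
      worker: { running: workerHealth },
      architecture: os.arch(),
      nodeVersion: process.version
    });
  } catch (err) {
    // Redis is unreachable or the queue call failed
    res.status(503).json({ status: 'unhealthy', error: err.message });
  }
});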
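
A health endpoint is more useful when its status is derived from queue state rather than hard-coded. The sketch below is one possible rule set. deriveHealth and the waiting-job threshold of 1000 are assumptions, and counts is expected to look like the object returned by getJobCounts().

```javascript
// Turns job counts and worker state into a status plus an HTTP status code.
function deriveHealth(counts, workerRunning, maxWaiting = 1000) {
  if (!workerRunning) {
    return { status: 'unhealthy', httpStatus: 503 };
  }
  // A large backlog is worth flagging but does not mean the service is down
  if ((counts.waiting ?? 0) > maxWaiting) {
    return { status: 'degraded', httpStatus: 200 };
  }
  return { status: 'healthy', httpStatus: 200 };
}

console.log(deriveHealth({ waiting: 12, active: 3 }, true));
```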

Testing and Validation

Required Test Coverage

✅ Multi-architecture builds (ARM64 + x86/x64)
✅ Node.js version compatibility (18 + 24)
✅ BullMQ job processing reliability
✅ Playwright browser launch consistency
✅ Memory leak prevention
✅ Error handling and retry logic

Integration Testing

# Run comprehensive test suite
npm test -- --coverage
pytest --cov=src tests/

# Architecture-specific testing
docker buildx build --platform linux/arm64 -t test:arm64 .
docker buildx build --platform linux/amd64 -t test:amd64 .

Comprehensive Implementation Examples

PDF Generation Service

// Advanced PDF generation with custom options
async function generatePDF(url, options = {}) {
  const browser = await chromium.launch({
    headless: true,
    executablePath: CHROMIUM_PATH,
    args: ['--no-sandbox', '--disable-setuid-sandbox']
  });
  
  try {
    const page = await browser.newPage();
    
    // Set viewport for consistent rendering
    await page.setViewportSize({ width: 1200, height: 800 });
    
    // Navigate and wait for content
    await page.goto(url, { waitUntil: 'networkidle' });
    
    // Wait for specific elements if needed
    if (options.waitForSelector) {
      await page.waitForSelector(options.waitForSelector);
    }
    
    const pdf = await page.pdf({
      format: options.format || 'A4',
      margin: options.margin || { 
        top: '1in', 
        right: '1in', 
        bottom: '1in', 
        left: '1in' 
      },
      printBackground: true,
      displayHeaderFooter: options.displayHeaderFooter || false,
      headerTemplate: options.headerTemplate || '',
      footerTemplate: options.footerTemplate || '',
      landscape: options.landscape || false,
      scale: options.scale || 1
    });
    
    return pdf;
  } finally {
    await browser.close();
  }
}

// Usage examples
const reportPDF = await generatePDF('https://app.com/report', {
  format: 'A4',
  displayHeaderFooter: true,
  headerTemplate: '<div style="font-size:10px; width:100%; text-align:center;">Monthly Report</div>',
  footerTemplate: '<div style="font-size:10px; width:100%; text-align:center;">Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>'
});
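
The `||` defaults in generatePDF replace a whole margin object when only one side is supplied, and they cannot distinguish an explicit false from a missing value. The sketch below builds the options with `??` and a merged margin instead. buildPdfOptions is a hypothetical helper, and the 0.1 to 2 range for scale follows Playwright's documented limits for page.pdf().

```javascript
const DEFAULT_MARGIN = { top: '1in', right: '1in', bottom: '1in', left: '1in' };

// Merges caller options with defaults and validates scale before Playwright sees it.
function buildPdfOptions(options = {}) {
  const scale = options.scale ?? 1;
  if (typeof scale !== 'number' || scale < 0.1 || scale > 2) {
    throw new RangeError(`scale must be a number between 0.1 and 2, got ${scale}`);
  }
  return {
    format: options.format ?? 'A4',
    margin: { ...DEFAULT_MARGIN, ...options.margin }, // override individual sides
    printBackground: options.printBackground ?? true,
    displayHeaderFooter: options.displayHeaderFooter ?? false,
    headerTemplate: options.headerTemplate ?? '',
    footerTemplate: options.footerTemplate ?? '',
    landscape: options.landscape ?? false,
    scale,
  };
}

console.log(buildPdfOptions({ margin: { top: '0.5in' }, landscape: true }));
```

The returned object can be passed directly to page.pdf().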

HTML Saving & Archival Service

// Complete HTML saving with assets
async function saveCompleteHTML(url, options = {}) {
  const browser = await chromium.launch({
    headless: true,
    executablePath: CHROMIUM_PATH,
    args: ['--no-sandbox', '--disable-setuid-sandbox']
  });
  
  try {
    const page = await browser.newPage();
    
    // Record metadata for every requested resource (response bodies are not saved)
    const resources = [];
    if (options.saveAssets) {
      await page.route('**/*', (route) => {
        const request = route.request();
        resources.push({
          url: request.url(),
          method: request.method(),
          headers: request.headers()
        });
        route.continue();
      });
    }
    
    await page.goto(url, { waitUntil: 'networkidle' });
    
    // Wait for dynamic content
    if (options.waitTime) {
      await page.waitForTimeout(options.waitTime);
    }
    
    const html = await page.content();
    const title = await page.title();
    const screenshot = options.includeScreenshot ? 
      await page.screenshot({ fullPage: true }) : null;
    
    return {
      html,
      title,
      url,
      timestamp: new Date().toISOString(),
      resources: options.saveAssets ? resources : [],
      screenshot: screenshot ? screenshot.toString('base64') : null
    };
  } finally {
    await browser.close();
  }
}

// Usage
const savedPage = await saveCompleteHTML('https://dynamic-app.com/dashboard', {
  saveAssets: true,
  includeScreenshot: true,
  waitTime: 2000
});
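
The object returned by saveCompleteHTML still has to be written somewhere for archival. The sketch below stores the HTML on disk under a name built from the host, a short hash of the URL, and the timestamp. archiveFileName and writeArchive are hypothetical helpers, the naming scheme is an assumption, and the snippet uses CommonJS so it runs as a standalone script.

```javascript
const fs = require('node:fs');
const path = require('node:path');
const crypto = require('node:crypto');

// Builds a file-system-safe name: <host>_<12 hex chars of sha256(url)>_<timestamp>.html
function archiveFileName(url, timestamp) {
  const { hostname } = new URL(url);
  const hash = crypto.createHash('sha256').update(url).digest('hex').slice(0, 12);
  const stamp = timestamp.replace(/[:.]/g, '-'); // colons are not allowed on some file systems
  return `${hostname}_${hash}_${stamp}.html`;
}

// Writes savedPage.html into dir and returns the full path of the new file.
function writeArchive(dir, savedPage) {
  fs.mkdirSync(dir, { recursive: true });
  const file = path.join(dir, archiveFileName(savedPage.url, savedPage.timestamp));
  fs.writeFileSync(file, savedPage.html, 'utf8');
  return file;
}
```

A call such as writeArchive('./archive', savedPage) would then persist the result of the usage example above.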

SSR Cache Implementation

// Server-Side Rendering cache with Redis
// redisClient must be a connected client from the official redis package
import { chromium } from 'playwright';

class SSRCache {
  constructor(redisClient) {
    this.redis = redisClient;
    this.defaultTTL = 3600; // 1 hour
  }
  
  async renderAndCache(url, cacheKey, options = {}) {
    // Check cache first
    const cached = await this.redis.get(cacheKey);
    if (cached && !options.forceRefresh) {
      return JSON.parse(cached);
    }
    
    // Render with Playwright
    const browser = await chromium.launch({
      headless: true,
      executablePath: CHROMIUM_PATH,
      args: ['--no-sandbox', '--disable-setuid-sandbox']
    });
    
    try {
      // Set user agent and viewport for consistent rendering.
      // Playwright takes these as page/context options; there is no page.setUserAgent().
      const page = await browser.newPage({
        userAgent: 'Mozilla/5.0 (compatible; SSRCache/1.0)',
        viewport: { width: 1200, height: 800 }
      });
      
      await page.goto(url, { waitUntil: 'networkidle' });
      
      // Wait for SPA to fully load
      if (options.waitForSelector) {
        await page.waitForSelector(options.waitForSelector);
      }
      
      const html = await page.content();
      const title = await page.title();
      const meta = await page.evaluate(() => {
        const metaTags = {};
        document.querySelectorAll('meta').forEach(meta => {
          if (meta.name) metaTags[meta.name] = meta.content;
          if (meta.property) metaTags[meta.property] = meta.content;
        });
        return metaTags;
      });
      
      const result = {
        html,
        title,
        meta,
        url,
        cached: true,
        timestamp: new Date().toISOString()
      };
      
      // Cache the result
      const ttl = options.ttl || this.defaultTTL;
      await this.redis.setEx(cacheKey, ttl, JSON.stringify(result));
      
      return result;
    } finally {
      await browser.close();
    }
  }
  
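  async invalidateCache(pattern) {
    // KEYS walks the whole keyspace and blocks Redis while it runs;
    // prefer SCAN-based iteration on large datasets
    const keys = await this.redis.keys(pattern);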
    if (keys.length > 0) {
      await this.redis.del(keys);
    }
    return keys.length;
  }
}

// Usage
const ssrCache = new SSRCache(redisConnection);

// Cache SPA pages for SEO
const cachedPage = await ssrCache.renderAndCache(
  'https://spa-app.com/product/123',
  'product_page_123',
  {
    ttl: 1800, // 30 minutes
    waitForSelector: '.product-details'
  }
);

// Serve cached HTML
app.get('/product/:id', async (req, res) => {
  const cacheKey = `product_page_${req.params.id}`;
  const cached = await ssrCache.renderAndCache(
    `https://spa-app.com/product/${req.params.id}`,
    cacheKey
  );
  
  res.send(cached.html);
});
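
Hand-written keys such as product_page_123 work for a single route, but a general cache needs keys derived from the URL so that equivalent requests share one entry. The sketch below normalizes a URL into a key. cacheKeyForUrl, the ssr: prefix, and the list of ignored tracking parameters are all assumptions.

```javascript
// Parameters that do not change the rendered page (illustrative list)
const isTrackingParam = (name) =>
  name.startsWith('utm_') || name === 'fbclid' || name === 'gclid';

function cacheKeyForUrl(rawUrl, prefix = 'ssr:') {
  const url = new URL(rawUrl);
  url.hash = ''; // fragments are never sent to the server
  const kept = [...url.searchParams.entries()]
    .filter(([name]) => !isTrackingParam(name))
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0)); // stable order for equal queries
  url.search = new URLSearchParams(kept).toString();
  return prefix + url.origin + url.pathname + url.search;
}

console.log(cacheKeyForUrl('https://spa-app.com/product/123?utm_source=x&b=2&a=1#reviews'));
// ssr:https://spa-app.com/product/123?a=1&b=2
```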

Critical Path Detection Functions

Both Node.js and Python services include essential functions for detecting the correct Chromium path across architectures. These functions are mandatory for reliable cross-platform operation.

Node.js Implementation

import os from 'node:os';
import fs from 'node:fs';
import { globSync } from 'glob'; // third-party (v9+); not in the dependency list above

// Smart browser detection for cross-platform compatibility
function detectBestChromiumPath() {
  const arch = os.arch();
  const platform = os.platform();
  const home = os.homedir(); // safer than process.env.HOME, which may be unset
  
  console.log(`🔍 Detecting browser: platform=${platform}, arch=${arch}`);
  
  // Try Playwright's downloaded browsers first (works best on x64)
  const playwrightPaths = [
    home + '/.cache/ms-playwright/chromium-*/chrome-linux/chrome',
    '/root/.cache/ms-playwright/chromium-*/chrome-linux/chrome',
    home + '/.cache/ms-playwright/chromium_headless_shell-*/chrome-linux/headless_shell',
    '/root/.cache/ms-playwright/chromium_headless_shell-*/chrome-linux/headless_shell'
  ];
  
  // System browsers (fallback, especially for ARM64)
  const systemPaths = [
    '/usr/bin/chromium',
    '/usr/bin/chromium-browser', 
    '/usr/bin/google-chrome',
    '/usr/bin/google-chrome-stable'
  ];
  
  // For x64, prefer Playwright browsers; for ARM64, prefer system browsers.
  // os.arch() reports 'x64' on Intel/AMD machines (never 'x86_64').
  const pathsToTry = (arch === 'x64') 
    ? [...playwrightPaths, ...systemPaths]
    : [...systemPaths, ...playwrightPaths];
  
  for (const pathPattern of pathsToTry) {
    if (pathPattern.includes('*')) {
      // Handle glob patterns for Playwright paths
      try {
        const matches = globSync(pathPattern);
        for (const match of matches) {
          if (fs.existsSync(match)) {
            console.log(`✅ Found Playwright browser: ${match}`);
            return match;
          }
        }
      } catch (error) {
        continue;
      }
    } else {
      if (fs.existsSync(pathPattern)) {
        console.log(`✅ Found system browser: ${pathPattern}`);
        return pathPattern;
      }
    }
  }
  
  console.log(`⚠️ No browser found, using default system path: /usr/bin/chromium`);
  return '/usr/bin/chromium';
}

// Usage at startup
const CHROMIUM_PATH = detectBestChromiumPath();
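
The wildcard in paths such as chromium-*/chrome-linux/chrome only ever covers one directory segment, so it can be resolved with the standard library alone. The sketch below does that and prefers the highest revision number. findVersionedBinary and revisionOf are hypothetical helpers, and the snippet uses CommonJS so it runs as a standalone script.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Numeric revision after the prefix, e.g. "chromium-1134" -> 1134 (-1 if absent)
function revisionOf(name, prefix) {
  const n = Number.parseInt(name.slice(prefix.length), 10);
  return Number.isNaN(n) ? -1 : n;
}

// Expands <baseDir>/<dirPrefix>*/<relativeBinary> without a glob library.
function findVersionedBinary(baseDir, dirPrefix, relativeBinary) {
  let entries;
  try {
    entries = fs.readdirSync(baseDir, { withFileTypes: true });
  } catch {
    return null; // base directory is missing or unreadable
  }
  const candidates = entries
    .filter((entry) => entry.isDirectory() && entry.name.startsWith(dirPrefix))
    .map((entry) => entry.name)
    .sort((a, b) => revisionOf(b, dirPrefix) - revisionOf(a, dirPrefix)); // newest first
  for (const name of candidates) {
    const candidate = path.join(baseDir, name, relativeBinary);
    if (fs.existsSync(candidate)) return candidate;
  }
  return null;
}

const found = findVersionedBinary(
  path.join(os.homedir(), '.cache', 'ms-playwright'),
  'chromium-',
  path.join('chrome-linux', 'chrome')
);
console.log(found ?? 'no Playwright Chromium found');
```

Sorting by revision makes the choice deterministic when several Chromium builds are installed side by side.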

Python Implementation

import os
import glob
import platform
import logging

def detect_best_chromium_path():
    arch = platform.machine().lower()
    system = platform.system().lower()
    
    logging.info(f"🔍 Detecting browser: platform={system}, arch={arch}")
    
    # Try Playwright's downloaded browsers first (works best on x64)
    playwright_paths = [
        os.path.expanduser('~/.cache/ms-playwright/chromium-*/chrome-linux/chrome'),
        '/root/.cache/ms-playwright/chromium-*/chrome-linux/chrome',
        os.path.expanduser('~/.cache/ms-playwright/chromium_headless_shell-*/chrome-linux/headless_shell'),
        '/root/.cache/ms-playwright/chromium_headless_shell-*/chrome-linux/headless_shell'
    ]
    
    # System browsers (fallback, especially for ARM64)
    system_paths = [
        '/usr/bin/chromium',
        '/usr/bin/chromium-browser', 
        '/usr/bin/google-chrome',
        '/usr/bin/google-chrome-stable'
    ]
    
    # For x64, prefer Playwright browsers; for ARM64, prefer system browsers
    paths_to_try = playwright_paths + system_paths if 'x86_64' in arch or 'amd64' in arch else system_paths + playwright_paths
    
    for path_pattern in paths_to_try:
        if '*' in path_pattern:
            # Handle glob patterns for Playwright paths
            try:
                matches = glob.glob(path_pattern)
                for match in matches:
                    if os.path.exists(match) and os.access(match, os.X_OK):
                        logging.info(f"✅ Found Playwright browser: {match}")
                        return match
            except Exception:
                continue
        else:
            if os.path.exists(path_pattern) and os.access(path_pattern, os.X_OK):
                logging.info(f"✅ Found system browser: {path_pattern}")
                return path_pattern
    
    logging.warning(f"⚠️ No browser found, using default system path: /usr/bin/chromium")
    return '/usr/bin/chromium'

# Usage at startup
CHROMIUM_PATH = detect_best_chromium_path()

Integration with Playwright Launch

// Node.js usage
const browser = await chromium.launch({
  headless: true,
  executablePath: CHROMIUM_PATH,
  args: ['--no-sandbox', '--disable-setuid-sandbox']
});

# Python usage
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=True,
        executable_path=CHROMIUM_PATH,
        args=['--no-sandbox', '--disable-setuid-sandbox']
    )

Troubleshooting

Common Issues

Chromium path mismatch: Use detectBestChromiumPath() functions above
Architecture incompatibility: Check Docker TARGETPLATFORM
BullMQ connection errors: Validate Redis connectivity
Memory exhaustion: Adjust worker concurrency
Version conflicts: Ensure consistent Playwright versions

Debugging Commands

# Check Playwright installation
npx playwright --version
playwright --version

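# Verify Chromium binary (prints the path Playwright would launch, then inspects it)
file "$(node -e "console.log(require('playwright').chromium.executablePath())")"

# Test the Redis connection that BullMQ depends on
node -e "import('redis').then(async (r) => { const c = r.createClient(); await c.connect(); console.log(await c.ping()); await c.quit(); })"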

# Architecture verification
uname -m && node -e "console.log(os.arch())" && python -c "import platform; print(platform.machine())"
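
The three commands above can print different names for the same architecture: uname -m and Python's platform.machine() report values such as x86_64, aarch64, or arm64, while Node's os.arch() reports x64 or arm64. A small normalizer makes the outputs comparable. normalizeArch and its alias table are a sketch, not an exhaustive mapping.

```javascript
// Maps common architecture names onto the values Node's os.arch() uses.
const ARCH_ALIASES = new Map([
  ['x86_64', 'x64'],
  ['amd64', 'x64'],
  ['x64', 'x64'],
  ['aarch64', 'arm64'],
  ['arm64', 'arm64'],
]);

function normalizeArch(name) {
  // Returns null for anything outside the table so callers can fail loudly
  return ARCH_ALIASES.get(String(name).trim().toLowerCase()) ?? null;
}

console.log(normalizeArch('x86_64'), normalizeArch('aarch64'));
// x64 arm64
```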

Next Steps: Review the Architecture Compatibility Guide for detailed implementation requirements and testing procedures.
