HTML Linter in Rust

Building a powerful HTML linter in Rust

Introduction

Modern web development teams face a constant challenge: maintaining consistent HTML structure across large applications. While JavaScript and CSS have robust linting ecosystems, HTML often relies on manual review or basic validation. This gap becomes particularly problematic when dealing with accessibility requirements, SEO optimization, and organizational coding standards.

This is where html-linter comes in - a Rust-based library that brings systematic HTML validation to your development workflow. By leveraging Rust's performance characteristics and safety guarantees, it can process complex HTML documents in milliseconds while providing detailed, actionable feedback.

Core Concepts

At its heart, html-linter operates on a rule-based system that combines flexibility with precision. Each rule represents a specific HTML requirement, such as "all images must have alt attributes" or "heading levels must not skip numbers." These rules can target anything from basic attribute presence to complex semantic relationships between elements.
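To make the "heading levels must not skip numbers" rule concrete, here is a minimal, dependency-free sketch of the check itself. It is not html-linter's implementation: the function name `skipped_heading_indices` is hypothetical, and it assumes the heading levels (1 to 6) have already been extracted from the document in source order.

```rust
/// Hypothetical helper, not part of html-linter.
/// Returns the index of every heading that jumps more than one level
/// deeper than the heading before it (for example an h4 right after an h2).
fn skipped_heading_indices(levels: &[u8]) -> Vec<usize> {
    levels
        .windows(2)
        .enumerate()
        // pair[0] is the previous heading, pair[1] the current one.
        .filter(|(_, pair)| pair[1] > pair[0] + 1)
        // windows(2) starts at index 0, so the current heading is at i + 1.
        .map(|(i, _)| i + 1)
        .collect()
}

fn main() {
    // h1, h2, h4, h2, h3: the h4 skips h3.
    let levels: [u8; 5] = [1, 2, 4, 2, 3];
    for i in skipped_heading_indices(&levels) {
        println!(
            "heading #{} jumps from h{} to h{}",
            i,
            levels[i - 1],
            levels[i]
        );
    }
}
```

Moving back up the hierarchy (h4 to h2) is allowed in this sketch; only jumps to a deeper level are flagged.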

Rules are categorized into types that determine their behavior. For instance, an AttributePresence rule type efficiently checks for required ARIA attributes, while a Semantic rule type can ensure proper HTML5 structural elements are used instead of generic divs. This categorization allows the linter to optimize its parsing strategy based on what it's looking for.

Each rule carries a severity level - Error, Warning, or Info - allowing teams to distinguish between critical accessibility violations and stylistic preferences. This granular control means you can gradually introduce stricter standards to your codebase, starting with critical errors and progressively addressing warnings.
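One way to introduce stricter standards gradually is a severity threshold that decides when a build should fail. The sketch below illustrates that idea with local stand-in types: `Severity` here mirrors the three levels named above but is defined in the example, and `Finding` and `should_fail` are hypothetical names rather than part of html-linter's API.

```rust
// Stand-in types for illustration only; not the crate's definitions.
// Variants are declared from least to most severe so that the derived
// ordering gives Info < Warning < Error.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    Info,
    Warning,
    Error,
}

struct Finding {
    rule: &'static str,
    severity: Severity,
}

/// Returns true when at least one finding is at or above `threshold`.
fn should_fail(findings: &[Finding], threshold: Severity) -> bool {
    findings.iter().any(|f| f.severity >= threshold)
}

fn main() {
    let findings = [
        Finding { rule: "img-alt", severity: Severity::Error },
        Finding { rule: "heading-order", severity: Severity::Warning },
        Finding { rule: "prefer-section", severity: Severity::Info },
    ];

    for f in &findings {
        println!("{:?}: {}", f.severity, f.rule);
    }

    // Start by failing only on errors; lower the threshold later.
    if should_fail(&findings, Severity::Error) {
        println!("build would fail at the Error threshold");
    }
}
```

A team could start with the threshold at `Error` and lower it to `Warning` once the existing warnings have been cleaned up.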

Getting Started

Let's examine a practical example: enforcing alt attributes on images, a crucial accessibility requirement. This example demonstrates how html-linter combines powerful validation with straightforward implementation:

main.rs
use html_linter::{HtmlLinter, Rule, RuleType, Severity};
use std::collections::HashMap;

fn main() {
    // Define an accessibility rule for image alt attributes
    let rules = vec![
        Rule {
            name: "img-alt".to_string(),
            rule_type: RuleType::AttributePresence,
            severity: Severity::Error,
            selector: "img".to_string(),
            condition: "alt-missing".to_string(),
            message: "Images must have alt attributes".to_string(),
            options: HashMap::new(),
        },
    ];

    let linter = HtmlLinter::new(rules, None);
    let html = r#"<html><body><img src="test.jpg"></body></html>"#;

    match linter.lint(html) {
        Ok(results) => {
            for result in results {
                println!(
                    "Rule: {}, Severity: {:?}, Message: {}",
                    result.rule,
                    result.severity,
                    result.message
                );
            }
        }
        Err(e) => eprintln!("Linter error: {}", e),
    }
}
  

This code demonstrates several key features: CSS-style selectors for targeting elements, clear error messaging, and a typed API in which the rule type and severity are enum values checked by the compiler (the selector and condition remain plain strings, so mistakes there surface at runtime). The rule definition is explicit and self-documenting, making it easy for team members to understand and modify.

Configuration Flexibility

While programmatic rule definition offers maximum control, many teams need to adjust linting rules without recompiling code. html-linter solves this through its JSON configuration system, which uses the same fields as the Rust `Rule` struct (name, rule type, severity, selector, condition, message, options) while allowing runtime configuration changes:

rules.json
{
  "name": "meta-description",
  "rule_type": "ElementContent",
  "severity": "Error",
  "selector": "head",
  "condition": "meta-tags",
  "message": "Meta description must be present and descriptive",
  "options": {
    "required_meta_tags": [
      {
        "name": "description",
        "pattern": {
          "type": "MinLength",
          "value": 50
        },
        "required": true
      }
    ]
  }
}
  

This configuration approach enables teams to version control their HTML standards alongside their code. The JSON schema is strongly typed, providing IDE autocompletion and validation while maintaining the flexibility to adjust rules per project or environment.

Advanced Features

Real-world HTML validation often requires checking multiple conditions simultaneously. html-linter's compound rules system handles these complex scenarios elegantly:

compound-rule.json
{
  "name": "accessible-button",
  "rule_type": "Compound",
  "severity": "Error",
  "selector": "button",
  "condition": "compound",
  "message": "Button must meet accessibility requirements",
  "options": {
    "check_mode": "all",
    "conditions": [
      {
        "type": "AttributeValue",
        "attribute": "aria-label",
        "pattern": ".+"
      },
      {
        "type": "AttributeValue",
        "attribute": "role",
        "pattern": "button"
      }
    ]
  }
}
  

This compound rule system allows you to express complex accessibility requirements as a single, maintainable rule. The check_mode option provides fine-grained control over how conditions combine, supporting scenarios from "all must match" to "at least one must match."
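The difference between "all must match" and "at least one must match" can be sketched in a few lines of plain Rust. This is an illustration of the combining logic only, not html-linter's code: `CheckMode`, `Condition`, `element_passes`, and the attribute map are hypothetical names, and a simple non-empty check stands in for the `.+` pattern from the JSON above.

```rust
use std::collections::HashMap;

/// Hypothetical model of an element: just its attribute names and values.
type Attributes = HashMap<&'static str, &'static str>;

#[derive(Clone, Copy)]
enum CheckMode {
    All,
    Any,
}

/// A condition holds when the attribute exists and `accept` returns true.
struct Condition {
    attribute: &'static str,
    accept: fn(&str) -> bool,
}

// Approximates the ".+" pattern: any non-empty value.
fn non_empty(value: &str) -> bool {
    !value.is_empty()
}

// Approximates the "button" pattern with an exact comparison.
fn is_button(value: &str) -> bool {
    value == "button"
}

fn element_passes(attrs: &Attributes, conditions: &[Condition], mode: CheckMode) -> bool {
    let holds = |c: &Condition| attrs.get(c.attribute).map_or(false, |v| (c.accept)(v));
    match mode {
        CheckMode::All => conditions.iter().all(holds),
        CheckMode::Any => conditions.iter().any(holds),
    }
}

fn main() {
    // A button with an aria-label but no explicit role attribute.
    let mut attrs = Attributes::new();
    attrs.insert("aria-label", "Close dialog");

    let conditions = [
        Condition { attribute: "aria-label", accept: non_empty },
        Condition { attribute: "role", accept: is_button },
    ];

    println!("all: {}", element_passes(&attrs, &conditions, CheckMode::All));
    println!("any: {}", element_passes(&attrs, &conditions, CheckMode::Any));
}
```

One edge case worth deciding explicitly in any such design: with an empty condition list, `all` passes vacuously while `any` fails.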

Performance and Architecture

Performance isn't just a feature - it's a requirement for modern development workflows. When linting runs as part of continuous integration or hot-reload development, every millisecond counts. This drove our choice of Rust as the implementation language, allowing us to build a linter that processes complex HTML documents with negligible overhead.

The architecture leverages several key innovations from the Rust ecosystem. html5ever, the HTML parser developed for the Servo browser engine, provides the foundation with its zero-copy parsing architecture, while the selectors crate (also from Servo) brings the same CSS selector engine used in Firefox. For parallel processing, Rayon enables automatic parallelization of rule evaluation, scaling efficiently with document size and CPU cores.
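Rule evaluation parallelizes well because each rule only reads the document. The sketch below shows that shape using the standard library's scoped threads instead of Rayon, purely to stay dependency-free; with Rayon the same loop would typically be written over `rules.par_iter()`. The `Rule` struct, the two string-based checks, and `violated_rules` are hypothetical placeholders, not html-linter's internals, and a real check would work on the parsed DOM rather than raw text.

```rust
use std::thread;

/// Hypothetical rule: a name plus a check over the raw document text.
struct Rule {
    name: &'static str,
    /// Returns true when the document violates the rule.
    check: fn(&str) -> bool,
}

// Crude placeholder checks; real rules would inspect the parsed tree.
fn has_img_without_alt(doc: &str) -> bool {
    doc.contains("<img") && !doc.contains("alt=")
}

fn missing_title(doc: &str) -> bool {
    !doc.contains("<title>")
}

/// Runs every rule on its own scoped thread and returns the names of the
/// violated rules, in the same order as `rules`.
fn violated_rules(doc: &str, rules: &[Rule]) -> Vec<&'static str> {
    thread::scope(|s| {
        // Scoped threads may borrow `doc` and `rules` without cloning them.
        let handles: Vec<_> = rules
            .iter()
            .map(|rule| s.spawn(move || (rule.check)(doc).then_some(rule.name)))
            .collect();

        handles
            .into_iter()
            .filter_map(|h| h.join().expect("rule check panicked"))
            .collect()
    })
}

fn main() {
    let rules = [
        Rule { name: "img-alt", check: has_img_without_alt },
        Rule { name: "title-present", check: missing_title },
    ];
    let doc = r#"<html><body><img src="test.jpg"></body></html>"#;

    for name in violated_rules(doc, &rules) {
        println!("violated: {}", name);
    }
}
```

Spawning one thread per rule is only reasonable for a handful of rules. A work-stealing pool such as Rayon's avoids that overhead, which is one reason to prefer it once the rule set grows.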

These architectural choices pay off in real-world usage: html-linter can process a typical webpage (~100KB) in under 50ms, including rule loading and evaluation. This performance means you can run comprehensive HTML validation as part of your development feedback loop, catching issues before they reach production.

Conclusion

HTML linting is no longer a nice-to-have - it's essential for maintaining quality in modern web applications. html-linter brings the rigor of traditional linting tools to HTML, backed by Rust's performance and safety guarantees. Whether you're enforcing accessibility standards, maintaining consistent markup, or implementing organization-specific HTML rules, html-linter provides the foundation for systematic HTML validation.