DuckDB YAML Extension

DuckDB Community Extension

The YAML extension for DuckDB enables seamless reading, writing, and querying of YAML data directly within SQL queries. It provides native YAML type support, comprehensive extraction functions, and automatic type detection.

Features

  • Read YAML Files

    Read single files, glob patterns, or lists of YAML files directly into DuckDB tables with automatic schema detection. (Docs: Reading YAML)

  • Frontmatter Extraction

    Extract YAML frontmatter from Markdown, MDX, Astro, and other text files for blog and documentation analysis. (Docs: Frontmatter)

  • Native YAML Type

    Store and manipulate YAML data with a native type that converts between YAML, JSON, and VARCHAR. (Docs: YAML Type)

  • Query & Extract

    Use path expressions to extract, filter, and transform YAML data with the extraction functions. (Docs: Extraction Functions)

  • Smart Type Detection

    Automatic detection of dates, times, timestamps, booleans, and optimal numeric types from YAML data. (Docs: Type Detection)

  • Write YAML Files

    Export query results to YAML files with customizable formatting using COPY TO statements. (Docs: Writing YAML)
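
As a rough sketch of the three input shapes named in the first feature, the queries below use the extension's read_yaml table function. The file paths are placeholders rather than files shipped with the extension, and passing a list of paths is assumed to work as the feature description suggests.

```sql
LOAD yaml;

-- Single file (hypothetical path)
SELECT * FROM read_yaml('data/config.yaml');

-- Glob pattern matching several files
SELECT * FROM read_yaml('data/*.yaml');

-- Explicit list of files (assumed to accept a LIST of paths)
SELECT * FROM read_yaml(['data/dev.yaml', 'data/prod.yaml']);
```

The function form is useful when reader options need to be passed; the bare quoted-path form shown in the Quick Example is shorthand for the same read.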

Quick Example

-- Load the extension
LOAD yaml;

-- Query YAML files directly
SELECT * FROM 'data/config.yaml';
SELECT * FROM 'data/*.yml' WHERE active = true;

-- Create a table with YAML column
CREATE TABLE configs(id INTEGER, config YAML);

-- Insert YAML data
-- (E'...' turns \n into a real newline; a plain '...' literal keeps the backslash)
INSERT INTO configs VALUES
    (1, E'environment: prod\nport: 8080'),
    (2, '{environment: dev, port: 3000}');

-- Query YAML data using extraction functions
SELECT
    id,
    yaml_extract_string(config, '$.environment') AS env,
    yaml_extract(config, '$.port') AS port
FROM configs;
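
The COPY TO export mentioned under Features could be sketched as follows, reusing the configs table from the example. The output path is a placeholder, and FORMAT yaml is assumed to be the format name the extension registers; the Writing YAML page is the place to confirm the exact options.

```sql
-- Export a query result to a YAML file (hypothetical output path)
COPY (
    SELECT
        id,
        yaml_extract_string(config, '$.environment') AS env
    FROM configs
) TO 'configs_export.yaml' (FORMAT yaml);
```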

Installation

From the community repository:

INSTALL yaml FROM community;
LOAD yaml;

From a GitHub release:

INSTALL yaml FROM 'https://github.com/teaguesterling/duckdb_yaml/releases/download/v0.1.0/yaml.duckdb_extension';
LOAD yaml;

From source:

git clone https://github.com/teaguesterling/duckdb_yaml
cd duckdb_yaml
make

AI-Written Extension

Claude.ai wrote 99% of the code in this project as an experiment. The original working version was written over the course of a weekend and refined periodically until a production-ready state was reached.

What's Next?