htmlentities.zig
The bundled entities.json is sourced from https://www.w3.org/TR/html5/entities.json.
Modelled on Philip Jackson's entities crate for Rust.
Overview
The core datatypes are:
pub const Entity = struct {
    entity: []u8,
    codepoints: Codepoints,
    characters: []u8,
};

pub const Codepoints = union(enum) {
    Single: u32,
    Double: [2]u32,
};
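As a rough illustration of consuming the Codepoints union, the sketch below turns either variant into UTF-8 bytes. It declares its own copy of Codepoints so it compiles on its own, and encodeCodepoints is a hypothetical helper, not part of this library.

```zig
const std = @import("std");

// Local copy of the union shown above, so this sketch is self-contained.
const Codepoints = union(enum) {
    Single: u32,
    Double: [2]u32,
};

/// Hypothetical helper (not provided by htmlentities.zig): encodes the
/// codepoints as UTF-8 into `buf` and returns the slice that was written.
/// Two codepoints of at most 4 bytes each always fit in 8 bytes.
fn encodeCodepoints(cps: Codepoints, buf: *[8]u8) ![]const u8 {
    var len: usize = 0;
    switch (cps) {
        .Single => |cp| {
            const c = std.math.cast(u21, cp) orelse return error.CodepointTooLarge;
            len += try std.unicode.utf8Encode(c, buf[len..]);
        },
        .Double => |pair| {
            for (pair) |cp| {
                const c = std.math.cast(u21, cp) orelse return error.CodepointTooLarge;
                len += try std.unicode.utf8Encode(c, buf[len..]);
            }
        },
    }
    return buf[0..len];
}

pub fn main() !void {
    var buf: [8]u8 = undefined;
    // 233 is U+00E9, the codepoint behind &eacute;.
    const bytes = try encodeCodepoints(.{ .Single = 233 }, &buf);
    std.debug.print("{s}\n", .{bytes});
}
```

In practice the characters field already carries the UTF-8 form, so a helper like this is mostly useful when you only have the codepoints to hand.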
The list of entities is directly exposed, as well as a binary search function:
pub const ENTITIES: [_]Entity
pub fn lookup(entity: []const u8) ?Entity
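For a feel of what a binary search over a sorted entity table looks like, here is a minimal sketch. It is not the library's actual implementation: Entry, TABLE and lookupSketch are made-up stand-ins, and the three-row table replaces the full generated ENTITIES list.

```zig
const std = @import("std");

// Trimmed-down stand-in for an entity record (hypothetical, for illustration).
const Entry = struct {
    entity: []const u8,
    characters: []const u8,
};

// Must be sorted bytewise by `entity` for the binary search to work.
const TABLE = [_]Entry{
    .{ .entity = "&amp;", .characters = "&" },
    .{ .entity = "&eacute;", .characters = "\u{e9}" },
    .{ .entity = "&lt;", .characters = "<" },
};

/// Classic half-open binary search; returns null when nothing matches.
fn lookupSketch(entity: []const u8) ?Entry {
    var lo: usize = 0;
    var hi: usize = TABLE.len;
    while (lo < hi) {
        const mid = lo + (hi - lo) / 2;
        switch (std.mem.order(u8, entity, TABLE[mid].entity)) {
            .eq => return TABLE[mid],
            .lt => hi = mid,
            .gt => lo = mid + 1,
        }
    }
    return null;
}

pub fn main() void {
    if (lookupSketch("&eacute;")) |e| {
        std.debug.print("{s} -> {s}\n", .{ e.entity, e.characters });
    }
}
```

Because the result is an optional, callers can use `if (...) |e|` or `orelse` to handle unknown names rather than unwrapping with `.?`.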
Serving suggestion
Add it to your build.zig.zon:
zig fetch --save https://github.com/kivikakk/htmlentities.zig/archive/bd5d569a245c7c8e83812eadcb5761b7ba76ef04.tar.gz
In your build.zig:
const htmlentities_dep = b.dependency("htmlentities.zig", .{ .target = target, .optimize = optimize });
exe.root_module.addImport("htmlentities", htmlentities_dep.module("htmlentities"));
In your main.zig:
const std = @import("std");
const htmlentities = @import("htmlentities");
pub fn main() !void {
    const eacute = htmlentities.lookup("&eacute;").?;
    std.debug.print("eacute: {}\n", .{eacute});
}
Output:
eacute: Entity{ .entity = &eacute;, .codepoints = Codepoints{ .Single = 233 }, .characters = é }
Help wanted
Ideally we'd do the JSON parsing and struct creation at comptime. The std JSON
tokeniser uses ~80GB of RAM and millions of backtracks to handle the whole
entities.json at comptime, so it's not gonna happen yet. Maybe once we get a
comptime allocator we can use the regular parser.
As it is, we do codegen. Ideally we'd piece together an AST and render that
instead of just writing Zig directly. I did try it with a "template" input
string (see some broken wip at 63b9393), but it's hard to do since
std.zig.render expects all tokens, including string literals, to be available
in the originally parsed source. At the moment we parse our generated source
and format it so we can at least validate it syntactically in the build step.
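The syntactic check described above could look something like the sketch below. It assumes a Zig version where std.zig.Ast.parse takes an allocator, a null-terminated source and a mode (0.11 or later, as far as I know). parsesCleanly is a hypothetical name, not the function used in this repo's build step, and the formatting half is left out.

```zig
const std = @import("std");

/// Hypothetical helper: returns true when `source` parses as Zig without
/// any syntax errors. It only parses; it does not render or type-check.
fn parsesCleanly(gpa: std.mem.Allocator, source: [:0]const u8) !bool {
    var tree = try std.zig.Ast.parse(gpa, source, .zig);
    defer tree.deinit(gpa);
    return tree.errors.len == 0;
}

pub fn main() !void {
    var gpa_state = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa_state.deinit();
    const gpa = gpa_state.allocator();

    // A tiny piece of "generated" source standing in for the real output.
    const generated = "pub const ENTITIES = [_]u8{ 1, 2, 3 };";
    std.debug.print("valid: {}\n", .{try parsesCleanly(gpa, generated)});
}
```

This catches malformed output early, but since it stops at parsing, semantic mistakes in the generated file would still only surface when it is compiled.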