A Fourth Cut in Sunder

Last week I read Alex Kladov's Three Different Cuts discussing the implementation of cut in three different languages: Rust, Go, and Zig. The cut function takes a string and a pattern, and splits the string around the first occurrence of the pattern:

cut("life", "if") = ("l", "e")

cut("key=value", "=") = ("key", "value")

cut("hello", "foo") = no result since "hello" does not contain "foo"
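As a quick sanity check of these semantics, here is a minimal sketch in Rust, one of the three languages the article covers. It assumes the standard library's `str::split_once` is an acceptable stand-in for cut; the wrapper name `cut` is only for illustration and is not part of Rust's standard library.

```rust
/// Illustrative wrapper (hypothetical name): splits `s` around the first
/// occurrence of `sep`, returning `None` when `sep` does not occur in `s`.
fn cut<'a>(s: &'a str, sep: &str) -> Option<(&'a str, &'a str)> {
    // `str::split_once` splits on the first match of the pattern only.
    s.split_once(sep)
}
```

With this sketch, `cut("life", "if")` yields `Some(("l", "e"))`, and `cut("hello", "foo")` yields `None`.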

The article provides a hypothetical signature for cut, were it to be added to Zig's standard library:

pub fn cut(
    s: []const u8,
    sep: []const u8
) ?struct { prefix: []const u8, suffix: []const u8 } {
    ...
}

I really like this function signature; it is short, easy to read, and uses an anonymous struct as a named tuple to clearly describe the bundled data returned from the function. After reading Kladov's article, I knew that I wanted to add cut to Sunder's standard library, and I chose to use the Zig function signature as the basis for my Sunder implementation.

The Sunder version of the cut signature takes the form:

func cut(
    str: []byte,
    separator: []byte
) std::optional[[struct { var prefix: []byte; var suffix: []byte; }]] {
    ...
}

Sunder supports byte slices via the type []byte¹, and supports optional values with the user-defined std::optional type. Coincidentally, anonymous structs were added to Sunder on the same day that Kladov's article was published², so the Sunder implementation can use the same named tuple type seen in the Zig signature.

The Sunder standard library already supported a find function on byte slices, so cut only ended up taking a handful of lines to implement:

func cut(
    str: []byte,
    separator: []byte
) std::optional[[struct { var prefix: []byte; var suffix: []byte; }]] {
    alias T = struct { var prefix: []byte; var suffix: []byte; };

    # Holds the optional index of the first occurrence of the separator.
    var found = std::str::find(str, separator);
    if found.is_empty() {
        return std::optional[[T]]::EMPTY;
    }

    var index = found.value();
    return std::optional[[T]]::init_value((:T){
        .prefix = str[0:index],
        .suffix = str[index+countof(separator):countof(str)],
    });
}
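For readers more familiar with Rust than Sunder, the same find-then-slice structure can be sketched over byte slices. This is only an analogue under stated assumptions: the `find` helper and the `Cut` struct are hypothetical names introduced here, and treating an empty separator as a match at index 0 is an assumption of this sketch, not a statement about how std::str::find behaves.

```rust
/// Hypothetical stand-in for a byte-slice find: index of the first
/// occurrence of `needle` in `haystack`, if any.
fn find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    // Assumption of this sketch: an empty needle matches at index 0.
    // (`windows(0)` would panic, so this case must be handled first.)
    if needle.is_empty() {
        return Some(0);
    }
    haystack.windows(needle.len()).position(|w| w == needle)
}

/// Named pair playing the role of the anonymous struct in the Sunder version.
#[derive(Debug, PartialEq)]
struct Cut<'a> {
    prefix: &'a [u8],
    suffix: &'a [u8],
}

/// Splits `s` around the first occurrence of `sep`.
fn cut<'a>(s: &'a [u8], sep: &[u8]) -> Option<Cut<'a>> {
    // Early return with `None` when the separator is absent.
    let index = find(s, sep)?;
    Some(Cut {
        prefix: &s[..index],
        suffix: &s[index + sep.len()..],
    })
}
```

The suffix starts at `index + sep.len()`, mirroring `index+countof(separator)` in the Sunder code, so the separator itself appears in neither half.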

This implementation of cut was added to Sunder's standard library as std::str::cut in commit 4b0296c. Shortly after the addition, I was able to replace a bunch of std::str::split and std::str::split_with_allocator calls with equivalent std::str::cut calls across my scratch repo. One instance was particularly satisfying to replace. When the code was originally written, I was frustrated at how inelegant the call to std::str::split_with_allocator was when I knew the result of the split operation fit into a fixed-size list of two elements. Looking at that code now, it is obvious that cut was the right tool for the operation!
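The shape of that kind of replacement can be illustrated with a rough Rust analogue. This is not the code from the scratch repo; both function names are hypothetical, and the `key=value` input format is assumed purely for illustration.

```rust
/// Split-based version: collects the pieces into a heap-allocated list,
/// then checks that exactly two pieces came back.
fn parse_with_split(line: &str) -> Option<(&str, &str)> {
    let parts: Vec<&str> = line.splitn(2, '=').collect();
    if parts.len() != 2 {
        return None;
    }
    Some((parts[0], parts[1]))
}

/// Cut-based version: the two-element result is encoded in the return
/// type, so no intermediate list or length check is needed.
fn parse_with_cut(line: &str) -> Option<(&str, &str)> {
    line.split_once('=')
}
```

Both functions return the same results for the same input, but the cut-based version states the "exactly two pieces" expectation in its type rather than in a runtime check.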

Footnotes

1. Sunder does have a u8 type, but u8 and byte are distinct. The two types have identical size and alignment, but arithmetic operators such as addition and subtraction will produce a compile-time error when used on byte values. The type []u8 means "slice of unsigned 8-bit integers" whereas the type []byte means "slice of bytes" or "byte string".

2. Anonymous structs and unions were originally added to Sunder in commits 203d581 and d94fc90 to simplify type definitions, with the inlining of std::_result_union into std::result in commit 8bccfb1 as a motivating example. It was only after reading Three Different Cuts that I realized anonymous structs and unions could also be useful in defining one-off named tuples for function return types.