Sunday, January 31, 2021

Writing a custom trait deriver in Rust

Introduction

Recently I have been experimenting with metaprogramming in Rust (procedural macros), to write a deserializer that turns key=value pairs into structs. I usually find Rust documentation awesome, so I was surprised that I could not find detailed documentation on this topic. The example in The Rust Programming Language book is very basic and only explains how to get started. The Little Book of Rust Macros has more information, but again I could not find clear examples of writing a custom deserializer. Searching the net turned up some examples, but they were not clear enough for me, and most of them used old versions of the proc-macro and quote crates that no longer work with recent versions. So I decided to write the deserializer myself to figure out the puzzle.


 

The key=value deserializer

I am working on a project that needs to interface with wpa_supplicant. There is a wpactrl crate that handles the connection to the daemon and allows making requests and getting the results. But this crate does not parse the results output by wpa_supplicant, which are provided as a string with key=value formatting, one pair per line. I could have parsed every line by hand, matching the keys I want and assigning each value to the corresponding struct member, but this looked like the perfect opportunity to write an automatic deserializer. You just write the struct with the fields you want to obtain, tell the compiler to derive the key-value extraction code, and profit. Before I start, note that you can check the code I wrote in GitLab.

So, what I want is to take the following string:
num=42
txt=Hello
choice=OneDoesNotMatch
choice2=Two
And automatically derive the code that parses it and writes the corresponding values to this structure (also containing an enum):
enum TestEnum {
    Def,
    One,
    Two,
}

struct Test {
    num: u32,
    num2: u8,
    txt: String,
    txt2: String,
    choice: TestEnum,
    choice2: TestEnum,
}
The keys available in the string but not available in the struct shall be ignored, and the members of the struct not available in the input string must be filled with default values. Execution of the test program must output this after filling the struct Test:
Test {
    num: 42,
    num2: 0,
    txt: "Hello",
    txt2: "",
    choice: Def,
    choice2: Two,
}

The kv-extract crate 

We already know what we want to achieve, so let's get to work! I will not explain how to set up the Cargo.toml files. I am sure you are familiar with them, and the example from the Rust book I linked above explains this perfectly, so if you have problems with Cargo, please read that example and check the complete code in my GitLab repository.
 
First we have to create the crate for the key-value deserializer. I have named it kv-extract. This crate just defines the trait with the function to deserialize data, taking the input string and returning the filled structure:
pub trait KvExtract {
    fn kv_extract(input: &str) -> Self;
}
For technical reasons, Rust currently requires the code implementing derive macros to live in its own crate (this restriction might be lifted in the future), so we are done with kv-extract. That was a short crate!
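Before moving on to the derive crate, it may help to see what implementing this trait by hand could look like. The following is only a minimal sketch: the Status struct and its two fields are hypothetical names used for illustration, and the matching logic is one possible approach, not the code the derive macro will generate.

```rust
// The trait, repeated here so the sketch is self-contained.
pub trait KvExtract {
    fn kv_extract(input: &str) -> Self;
}

// `Status` is a hypothetical struct, used only for this illustration.
#[derive(Debug, Default, PartialEq)]
pub struct Status {
    pub num: u32,
    pub txt: String,
}

impl KvExtract for Status {
    fn kv_extract(input: &str) -> Self {
        // Start from default values, so missing keys need no special handling.
        let mut result = Status::default();

        for line in input.lines() {
            // Split at the first '=' only; lines without '=' are skipped.
            if let Some((key, value)) = line.split_once('=') {
                match key {
                    "num" => {
                        // A value that fails to parse leaves the default in place.
                        if let Ok(n) = value.parse() {
                            result.num = n;
                        }
                    }
                    "txt" => result.txt = value.to_string(),
                    // Keys that are not struct members are ignored.
                    _ => {}
                }
            }
        }

        result
    }
}
```

Calling Status::kv_extract("num=42\ntxt=Hello\n") fills both members. Every new field needs another match arm, and that repetition is exactly what the derive macro is meant to remove.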

The kv-extract-derive crate

Time to create the kv-extract-derive crate with the derive macro code. The convention is to place the derive crate inside the directory of the crate we are deriving code for (so we create kv-extract-derive inside the kv-extract crate).

For the derive code, the basic program flow is as follows:
  1. We use proc_macro to assist in the code derivation process when the user writes the #[derive(KvExtract)] attribute over a struct.
  2. Using the syn crate, we parse the input tokens into the abstract syntax tree (AST) corresponding to the struct we want to derive the deserializer code for.
  3. We iterate over the AST data to extract the tokens useful for code generation (in this case, the struct name and the struct members).
  4. Finally we use the quote! macro to generate code tokens that use the data extracted from the AST to build the derived sources.

Extracting the abstract syntax tree

Our entry point in the derive crate is defined using the proc_macro_derive attribute. The first step is easy: we obtain the AST corresponding to the structure and pass it down to the function implementing the derive code. That function has to return the generated code as a proc_macro::TokenStream.
use proc_macro;
use quote::quote;
use syn::{ Data, Field, Fields, punctuated::Punctuated, token::Comma };

const PARSE_ERR_MSG: &str = "#[derive(KvExtract)]: struct parsing failed. Is this a struct?";

#[proc_macro_derive(KvExtract)]
pub fn kv_extract_derive(input: proc_macro::TokenStream) -> proc_macro::TokenStream {
    let ast = syn::parse(input).unwrap();

    impl_kv_extract(&ast)
}
The impl_kv_extract function will have to extract the following data:
  1. The name of the struct we are deriving.
  2. A vector with the data for each structure field.

I found no documentation about the AST and how to traverse it. If you know where to find it, please let me know. Fortunately I was able to figure out the puzzle by printing the AST debug info (this requires enabling the extra-traits feature of the syn crate, which provides the Debug implementations). First we get a reference to the structure name stored in ast.ident. To reach the structure fields, we first have to destructure ast.data as data_struct. Then we destructure data_struct.fields as fields_named, and finally we have the fields in fields_named.named, which we assign to the fields variable through a reference:

fn impl_kv_extract(ast: &syn::DeriveInput) -> proc_macro::TokenStream {
    let name = &ast.ident;

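    // Only structs with named fields are supported; anything else aborts
    // the derive with an error message.
    let fields = if let Data::Struct(ref data_struct) = ast.data {
        if let Fields::Named(ref fields_named) = data_struct.fields {
            &fields_named.named
        } else {
            // A format string literal is required here: panic!(PARSE_ERR_MSG)
            // is rejected by the 2021 edition.
            panic!("{}", PARSE_ERR_MSG);
        }
    } else {
        panic!("{}", PARSE_ERR_MSG);
    };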
    // [...]

Generating code

Now the fun is about to begin: we have to start generating code. The general idea behind code generation in the form of a proc_macro::TokenStream is to enclose the code snippets we want to build inside a quote! macro. This macro will help us with two tasks:

  1. Convert the enclosed code snippets into a proc_macro2::TokenStream.
  2. Expand tokens (related to the fields we just collected) to build code.

Note that this macro returns a proc_macro2::TokenStream, which is a different type from proc_macro::TokenStream, but converting from the former to the latter is as simple as invoking the into() method.

We expand the code using two quote! blocks. The first one, which we will see later, runs once for each struct member to initialize it: each resulting code snippet, in the form of a proc_macro2::TokenStream, is added to a vector, which the kv_tokens() function returns into the tokens variable. The second quote! block generates the skeleton of the derived code and expands the tokens variable to complete this skeleton. The resulting derived code is returned as a proc_macro::TokenStream using the into() method:

    let tokens = kv_tokens(fields);

    let gen = quote! {
        fn kv_split(text: &str) -> Vec<(String, String)> {
            text.split("\n")
                .map(|line| line.splitn(2, "=").collect::<Vec<_>>())
                .filter(|elem| elem.len() == 2)
                .map(|elem| (elem[0].to_string(), elem[1].replace("\"", "")))
                .collect()
        }

        impl KvExtract for #name {
            fn kv_extract(input: &str) -> #name {
                let kv_in = kv_split(input);
                let mut result = #name::default();

                #(#tokens)*

                result
            }
        }
    };
    gen.into()
}

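In the code above, inside the quote! block, we generate the kv_split() function, which returns a vector of (key, value) tuples obtained from the input string. Then we generate the implementation of the KvExtract trait for the structure (referenced using #name). Note that kv_split() is emitted as a free function next to the struct, so deriving KvExtract for two structs in the same module would define it twice and fail to compile.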
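To see what kv_split() does with real input, the function can be copied out of the quote! block and run on its own. The body below is the same as the generated one; only the sample input in the comments and the usage note is made up for illustration.

```rust
// Same body as the generated kv_split(), copied here so it can be run directly.
fn kv_split(text: &str) -> Vec<(String, String)> {
    text.split("\n")
        // Split each line at the first '=' only, so values may contain '='.
        .map(|line| line.splitn(2, "=").collect::<Vec<_>>())
        // Lines without any '=' (including empty lines) are dropped.
        .filter(|elem| elem.len() == 2)
        // Double quotes are stripped from the value.
        .map(|elem| (elem[0].to_string(), elem[1].replace("\"", "")))
        .collect()
}
```

For example, feeding it the (made up) text ssid="my=net" followed by a line reading noequals and a line reading num=42 yields the pairs ("ssid", "my=net") and ("num", "42"): the value keeps its inner '=', the quotes are removed, and the line without '=' is skipped.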

The trait implementation first obtains the key-value pairs from the input string and then creates the result variable with default values. This means the struct must implement the Default trait, and if we derive it, every member of the struct must implement Default too, or the code will not compile. Then we expand the tokens vector with the code assigning the struct members using the #(#tokens)* syntax, and finally return the result.

The only thing we are still missing is how the tokens vector is generated in the kv_tokens() function:

fn kv_tokens(fields: &Punctuated<Field, Comma>) -> Vec<proc_macro2::TokenStream> {
    let mut tokens = Vec::new();

    for field in fields {
        let member = &field.ident;

        tokens.push(
            quote! {
                kv_in.iter().filter(|(key, _)| key == stringify!(#member))
                    .take(1)
                    .for_each(|(_, value)| {
                        if let Ok(data) = value.parse() {
                            result.#member = data;
                        }
                    });
            });
    }

    tokens
}

The code above adds one block of code, in the form of a proc_macro2::TokenStream, to the tokens vector for each struct member. Each block takes a specific struct member and iterates over the key-value tuples obtained from the input string to see if any key matches the member name. Only the first matching key is considered (because of take(1)). When a key matches, its value is converted using the parse() string method and assigned to the struct member; if the conversion fails, the member keeps its default value. The parse() method requires the FromStr trait to be implemented for the target type, so if we use custom enums or structs as struct members, we will have to implement that trait ourselves (in addition to Default, as explained earlier). But if we place inside the struct a type already implementing the Default and FromStr traits (for example the MacAddress struct from the eui48 crate), it will be beautifully deserialized without us having to write a single new line of code. Nice!
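To make the expansion more concrete, here is roughly what the generated assignments look like when written out by hand for a struct with two numeric members. This is a sketch and not the actual compiler output: the extract() function is a hypothetical stand-in for the body of the derived kv_extract(), and it takes the already split pairs as a parameter to keep the example short.

```rust
#[derive(Debug, Default, PartialEq)]
struct Test {
    num: u32,
    num2: u8,
}

// Hypothetical stand-in for the body of the derived kv_extract().
fn extract(kv_in: &[(String, String)]) -> Test {
    let mut result = Test::default();

    // Expansion of the per-field quote! block with #member = num.
    kv_in.iter().filter(|(key, _)| key == stringify!(num))
        .take(1)
        .for_each(|(_, value)| {
            // The target type of parse() is inferred from the field (u32).
            if let Ok(data) = value.parse() {
                result.num = data;
            }
        });

    // Expansion of the per-field quote! block with #member = num2.
    kv_in.iter().filter(|(key, _)| key == stringify!(num2))
        .take(1)
        .for_each(|(_, value)| {
            // "300" does not fit in a u8, so parse() fails and num2 stays 0.
            if let Ok(data) = value.parse() {
                result.num2 = data;
            }
        });

    result
}
```

With the pairs ("num", "42"), ("num", "7") and ("num2", "300"), the result has num set to 42 (the second num entry is never looked at) and num2 left at its default of 0.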

Testing the derive macro

The only thing remaining is to test this works with the following program:

use kv_extract::KvExtract;
use kv_extract_derive::KvExtract;
use std::str::FromStr;

#[derive(Debug)]
enum TestEnum {
    Def,
    One,
    Two,
}

impl Default for TestEnum {
    fn default() -> TestEnum {
        TestEnum::Def
    }
}

impl FromStr for TestEnum {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "Def" => Ok(TestEnum::Def),
            "One" => Ok(TestEnum::One),
            "Two" => Ok(TestEnum::Two),
            unknown => Err(format!("\"{}\" does not match TestEnum", unknown))
        }
    }
}

#[derive(KvExtract, Default, Debug)]
struct Test {
    num: u32,
    num2: u8,
    txt: String,
    txt2: String,
    choice: TestEnum,
    choice2: TestEnum,
}

fn main() {
    let data = "num=42\n\
                txt=Hello\n\
                choice=OneDoesNotMatch\n\
                choice2=Two\n\
                \n";

    println!("{:#?}", Test::kv_extract(data));
}

We had to provide our own implementations of the FromStr and Default traits, but everything works great: execution outputs exactly the data we expected.
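As a side note, on newer toolchains (Rust 1.62 or later, as far as I know) the hand-written Default implementation for the enum can be replaced by a derive with the #[default] attribute on the chosen variant. The sketch below assumes such a toolchain; the FromStr implementation is unchanged and still has to be written by hand.

```rust
use std::str::FromStr;

// Assumes Rust 1.62 or newer, where #[default] is allowed on enum variants.
#[derive(Debug, Default, PartialEq)]
enum TestEnum {
    #[default]
    Def,
    One,
    Two,
}

// FromStr still has to be implemented manually, exactly as before.
impl FromStr for TestEnum {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "Def" => Ok(TestEnum::Def),
            "One" => Ok(TestEnum::One),
            "Two" => Ok(TestEnum::Two),
            unknown => Err(format!("\"{}\" does not match TestEnum", unknown)),
        }
    }
}
```

The behavior is the same as in the test program: TestEnum::default() is Def, "Two" parses to Two, and "OneDoesNotMatch" produces an error, so the corresponding struct member keeps its default.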

I hope you enjoyed this entry. I had a hard time writing it because I found the documentation a bit scarce, but derive macros sure are powerful and beautiful!