Introduction
Recently I have been experimenting with metaprogramming in Rust, using procedural macros to write a deserializer that turns key=value pairs into structs. I usually find Rust documentation awesome, so I was surprised not to find detailed documentation on this topic. The example in The Rust Programming Language book is very basic and only explains how to get started. The Little Book of Rust Macros has more information, but again I could not find clear examples of writing a custom deserializer. Searching the web turned up some examples, but they were not clear enough for me, and they usually relied on old versions of the proc-macro2 and quote crates that did not work with recent releases. So I decided to write the deserializer myself to figure out the puzzle.
The key=value deserializer
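The goal is to deserialize text made of key=value lines, such as:
num=42
txt=Hello
choice=OneDoesNotMatch
choice2=Two
into a struct like this one:
enum TestEnum {
    Def,
    One,
    Two,
}
struct Test {
    num: u32,
    num2: u8,
    txt: String,
    txt2: String,
    choice: TestEnum,
    choice2: TestEnum,
}
For the input above, the expected result is the following. num2 and txt2 have no matching key, and OneDoesNotMatch is not a valid TestEnum variant, so those members keep their default values:
Test {
    num: 42,
    num2: 0,
    txt: "Hello",
    txt2: "",
    choice: Def,
    choice2: Two,
}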
The kv-extract crate
This crate only defines the trait that the derive macro implements:
pub trait KvExtract {
    fn kv_extract(input: &str) -> Self;
}
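To see what the derive macro will save us from writing, here is a hand-written implementation of the trait for a small, hypothetical Config struct (the struct and its fields are illustrative assumptions, not part of the crates). It is a simplified sketch: it uses str::split_once (Rust 1.52+), does not strip double quotes the way the generated code does, and lets a repeated key take its last value. Like the derived code, it leaves a field at its default value when its key is missing or its value fails to parse.

```rust
/// The trait from the kv-extract crate, repeated so the example compiles on its own.
pub trait KvExtract {
    fn kv_extract(input: &str) -> Self;
}

/// Hypothetical struct used only for this illustration.
#[derive(Debug, Default, PartialEq)]
pub struct Config {
    pub num: u32,
    pub txt: String,
}

impl KvExtract for Config {
    fn kv_extract(input: &str) -> Config {
        // Start from the defaults; keys found in the input override them.
        let mut result = Config::default();
        for line in input.lines() {
            // Split on the first '=' only, so values may contain '='.
            if let Some((key, value)) = line.split_once('=') {
                match key {
                    // A value that fails to parse leaves the default in place.
                    "num" => {
                        if let Ok(v) = value.parse::<u32>() {
                            result.num = v;
                        }
                    }
                    "txt" => result.txt = value.to_string(),
                    // Unknown keys are ignored.
                    _ => {}
                }
            }
        }
        result
    }
}
```

Writing one such match arm per field for every struct is exactly the boilerplate the derive macro generates for us.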
The kv-extract-derive crate
Deriving the trait works in four steps:
- We use the proc_macro crate to hook into code generation when the user writes #[derive(KvExtract)] on a struct.
- Using the syn crate, we parse the tokens of the struct we want to derive the deserializer for into an abstract syntax tree (AST).
- We iterate over the AST data to extract the pieces useful for code generation (in this case, the struct name and the struct members).
- Finally, we use the quote! macro to generate code tokens that use the data extracted from the AST to build the derived implementation.
Extracting the abstract syntax tree
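The entry point of the derive macro parses the input token stream into a syn::DeriveInput and passes it to impl_kv_extract():
use proc_macro;
use quote::quote;
use syn::{ Data, Field, Fields, punctuated::Punctuated, token::Comma };
const PARSE_ERR_MSG: &str = "#[derive(KvExtract)]: struct parsing failed. Is this a struct?";
#[proc_macro_derive(KvExtract)]
pub fn kv_extract_derive(input: proc_macro::TokenStream) -> proc_macro::TokenStream {
    // Parse the tokens of the item the derive is attached to
    let ast: syn::DeriveInput = syn::parse(input).unwrap();
    impl_kv_extract(&ast)
}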
From the AST we need to extract two things:
- The name of the struct we are deriving.
- The list of struct fields (a Punctuated<Field, Comma>), holding the data for each field.
I found no documentation about the AST and how to traverse it; if you know where to find it, please let me know. Fortunately, I was able to figure out the puzzle by printing the AST debug info (this requires enabling the extra-traits feature of the syn crate, which provides the Debug implementations). First we get a reference to the struct name, stored in ast.ident. To reach the struct fields, we first destructure ast.data as Data::Struct(data_struct), then destructure data_struct.fields as Fields::Named(fields_named). The fields are then in fields_named.named, which we assign to the fields variable through a reference:
fn impl_kv_extract(ast: &syn::DeriveInput) -> proc_macro::TokenStream {
    // Name of the struct we are deriving KvExtract for
    let name = &ast.ident;
    // Only structs with named fields are supported
    let fields = if let Data::Struct(ref data_struct) = ast.data {
        if let Fields::Named(ref fields_named) = data_struct.fields {
            &fields_named.named
        } else {
            panic!("{}", PARSE_ERR_MSG);
        }
    } else {
        panic!("{}", PARSE_ERR_MSG);
    };
    // [...]
Generating code
Now the fun begins: we have to start generating code. The general idea behind generating code in the form of a proc_macro::TokenStream is to enclose the code snippets we want to build inside a quote! macro. This macro helps us with two tasks:
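- Convert the enclosed code snippets into a proc_macro2::TokenStream.
- Interpolate variables (such as the struct name and fields we just collected) into that code: #var inserts a single value, and #(#var)* repeats over every element of an iterator.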
Note that this macro returns a proc_macro2::TokenStream, which is a different type from proc_macro::TokenStream, but converting from the former to the latter is as simple as calling the into() method.
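quote! builds token streams, not strings, but its two features listed above can be pictured with a plain-string analogy: #var pastes one value in, and #(#var)* pastes every element of an iterator one after the other. The function below, expand_impl(), is a hypothetical illustration only (it is not how quote! works internally, and its output is just text). Given a struct name and a list of field names, it produces roughly the trait implementation this macro generates, leaving out kv_split().

```rust
/// Hypothetical, string-based stand-in for the two quote! blocks.
/// quote! produces token streams, not strings; this only mimics the
/// shape of the expansion.
pub fn expand_impl(name: &str, fields: &[&str]) -> String {
    // One snippet per field, like the per-member quote! block in kv_tokens().
    let tokens: Vec<String> = fields
        .iter()
        .map(|member| {
            format!(
                "kv_in.iter().filter(|(key, _)| key == \"{m}\").take(1)\
                 .for_each(|(_, value)| {{ if let Ok(data) = value.parse() \
                 {{ result.{m} = data; }} }});",
                m = member
            )
        })
        .collect();
    // The skeleton: {name} plays the role of #name, and the joined
    // snippets play the role of #(#tokens)*.
    format!(
        "impl KvExtract for {name} {{ fn kv_extract(input: &str) -> {name} {{ \
         let kv_in = kv_split(input); let mut result = {name}::default(); \
         {body} result }} }}",
        name = name,
        body = tokens.join(" ")
    )
}
```

For example, expand_impl("Test", &["num", "txt"]) returns a single string that starts with impl KvExtract for Test { and contains one assignment per field, such as result.num = data;.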
We generate the code using two quote! blocks. The first one, which we will see later, runs once for each struct member to generate the code that initializes it; each resulting snippet (a proc_macro2::TokenStream) is added to a vector, which the kv_tokens() function returns into the tokens variable. The second quote! block generates the skeleton of the derived code and expands the tokens variable to complete that skeleton. The resulting derived code is returned as a proc_macro::TokenStream using the into() method:
    let tokens = kv_tokens(fields);
    // Named "expanded" rather than "gen", which is reserved in Rust 2024
    let expanded = quote! {
        impl KvExtract for #name {
            fn kv_extract(input: &str) -> #name {
                // Defined inside kv_extract() so that deriving KvExtract for
                // several structs in the same module does not define
                // kv_split() twice
                fn kv_split(text: &str) -> Vec<(String, String)> {
                    text.split("\n")
                        .map(|line| line.splitn(2, "=").collect::<Vec<_>>())
                        .filter(|elem| elem.len() == 2)
                        .map(|elem| (elem[0].to_string(), elem[1].replace("\"", "")))
                        .collect()
                }
                let kv_in = kv_split(input);
                let mut result = #name::default();
                #(#tokens)*
                result
            }
        }
    };
    expanded.into()
}
In the code above, the quote! block generates the kv_split() helper function, which returns a vector of (key, value) tuples obtained from the input string, and the implementation of the KvExtract trait for the struct (referenced using #name).
The trait implementation first obtains the key-value pairs from the input string and then creates the result variable with default values. This means the struct must implement the Default trait or the code will not compile; when Default is derived, every member of the struct must implement Default as well. Then the #(#tokens)* syntax expands the tokens vector into the code that assigns the struct members, and finally result is returned.
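Since kv_split() is ordinary Rust, its behavior can be checked outside the macro. The sketch below is the same function copied as a standalone item, with comments added. It shows a few edge cases: a value may itself contain '=', because splitn(2, "=") splits only at the first one; lines without '=' are dropped; and every double quote is removed from the value, not only surrounding ones.

```rust
/// Standalone copy of the kv_split() helper generated by the macro.
pub fn kv_split(text: &str) -> Vec<(String, String)> {
    text.split("\n")
        // Split each line at the first '=' only.
        .map(|line| line.splitn(2, "=").collect::<Vec<_>>())
        // Keep only lines that actually contained '='.
        .filter(|elem| elem.len() == 2)
        // Remove every double quote from the value.
        .map(|elem| (elem[0].to_string(), elem[1].replace("\"", "")))
        .collect()
}
```

Keys and values are not trimmed, so a line such as num = 42 produces the key "num " (with a trailing space), which does not match the num member. Likewise, with Windows line endings the trailing \r stays in the value, so num=42 would fail to parse as a number; str::lines() strips a trailing \r and would avoid that.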
The only thing we are still missing is how the tokens vector is generated in the kv_tokens() function:
fn kv_tokens(fields: &Punctuated<Field, Comma>) -> Vec<proc_macro2::TokenStream> {
    let mut tokens = Vec::new();
    for field in fields {
        // Named fields always carry an identifier
        let member = &field.ident;
        tokens.push(quote! {
            // Use the first pair whose key matches the member name
            kv_in.iter().filter(|(key, _)| key == stringify!(#member))
                .take(1)
                .for_each(|(_, value)| {
                    // The member's type selects the FromStr impl; on error keep the default
                    if let Ok(data) = value.parse() {
                        result.#member = data;
                    }
                });
        });
    }
    tokens
}
The code above adds a block of code, in the form of a proc_macro2::TokenStream, to the tokens vector for each struct member. Each block takes a specific struct member and iterates over the key-value tuples obtained from the input string, looking for a key that matches the member name. The first matching value is converted using the str::parse() method and assigned to that member; if parsing fails, the member keeps its default value. parse() requires the FromStr trait to be implemented for the target type, so if we use custom enums or structs as struct members, we have to implement that trait ourselves (in addition to Default, as explained earlier). But if the struct contains a type that already implements both Default and FromStr (for example the MacAddress struct from the eui48 crate), it will be beautifully deserialized without us writing a single new line of code. Nice!
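The assignment result.#member = data is what makes this work: it fixes the type of data, so the compiler picks the matching FromStr implementation for value.parse(). The sketch below reproduces that pattern outside the macro with a hypothetical generic helper, assign(), and a hypothetical Percent newtype standing in for a user-defined field type. A value that parses overwrites the field; one that does not leaves the default untouched.

```rust
use std::str::FromStr;

/// Hypothetical user-defined field type: a percentage from 0 to 100.
/// Default is derived (Percent(0)); FromStr is written by hand.
#[derive(Debug, Default, PartialEq)]
pub struct Percent(pub u8);

impl FromStr for Percent {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.parse::<u8>() {
            Ok(n) if n <= 100 => Ok(Percent(n)),
            _ => Err(format!("\"{}\" is not a percentage", s)),
        }
    }
}

/// Hypothetical helper doing what each generated per-field snippet does:
/// the type of `slot` decides which FromStr implementation parse() uses,
/// and a parse error leaves the current value in place.
pub fn assign<T: FromStr>(slot: &mut T, value: &str) {
    if let Ok(data) = value.parse() {
        *slot = data;
    }
}
```

The same helper works unchanged for built-in types such as u8, because they implement FromStr too; only the target type changes.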
Testing the derive macro
All that remains is to check that this works, using the following program:
use kv_extract::KvExtract;
use kv_extract_derive::KvExtract;
use std::str::FromStr;
#[derive(Debug)]
enum TestEnum {
    Def,
    One,
    Two,
}
impl Default for TestEnum {
    fn default() -> TestEnum {
        TestEnum::Def
    }
}
impl FromStr for TestEnum {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "Def" => Ok(TestEnum::Def),
            "One" => Ok(TestEnum::One),
            "Two" => Ok(TestEnum::Two),
            unknown => Err(format!("\"{}\" does not match TestEnum", unknown)),
        }
    }
}
#[derive(KvExtract, Default, Debug)]
struct Test {
    num: u32,
    num2: u8,
    txt: String,
    txt2: String,
    choice: TestEnum,
    choice2: TestEnum,
}
fn main() {
    let data = "num=42\n\
                txt=Hello\n\
                choice=OneDoesNotMatch\n\
                choice2=Two\n\
                \n";
    println!("{:#?}", Test::kv_extract(data));
}
We had to provide our own implementations of the FromStr and Default traits for TestEnum, but everything works great: the program prints exactly the expected result shown earlier.
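As a side note, on Rust 1.62 or newer the Default implementation for TestEnum no longer has to be written by hand: #[derive(Default)] works on enums when one unit variant is marked with #[default]. A sketch of the enum written that way, keeping the same FromStr implementation (PartialEq is added only so the behavior is easy to check):

```rust
use std::str::FromStr;

#[derive(Debug, Default, PartialEq)]
enum TestEnum {
    // Variant returned by TestEnum::default() (requires Rust 1.62+)
    #[default]
    Def,
    One,
    Two,
}

impl FromStr for TestEnum {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "Def" => Ok(TestEnum::Def),
            "One" => Ok(TestEnum::One),
            "Two" => Ok(TestEnum::Two),
            unknown => Err(format!("\"{}\" does not match TestEnum", unknown)),
        }
    }
}
```

With this version, only FromStr still needs a manual implementation before TestEnum can be used as a member of a struct deriving KvExtract.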
I hope you enjoyed this entry. I had a hard time writing it because I found the documentation a bit scarce, but derive macros are certainly powerful and beautiful!