Rust serde deserialization of an enum variant

Intro

For a program I'm working on I have this data structure:

pub enum State {
    None,
    Expansion,
    War,
    CivilWar,
    // ...
}

This same data structure is returned by several external JSON APIs, each with slightly different formatting. I'm using serde and serde_json for deserialization. Without any special processing, the following program will deserialize "CivilWar" to State::CivilWar:

#[macro_use]
extern crate serde_derive;
extern crate serde_json;

#[derive(Debug, Deserialize)]
pub enum State {
    None,
    Expansion,
    War,
    CivilWar,
    // ...
}

fn main() {
    let s = r#" "CivilWar" "#;
    let c: State = serde_json::from_str(s).unwrap();
    println!("input: {} output: State::{:?}", s, c);
}

This will output: input: "CivilWar" output: State::CivilWar.

Lowercase

The JSON format I'm deserializing from actually specifies the state in lowercase. This is easily accommodated by adding the annotation #[serde(rename_all = "lowercase")] to the enum:

#[derive(Debug, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum State {
    None,
    Expansion,
    War,
    CivilWar,
    // ...
}

Now "civilwar" will be deserialized as State::CivilWar. Of course "CivilWar" won't deserialize anymore.

Space

However, some files contain "civil war" with a space in between. This is still not mapped correctly. Since there are now multiple possible inputs for a single variant, a simple rename no longer suffices.

A custom implementation of Deserialize works, but requires a lot of boilerplate code:

use serde::de::{self, Deserialize, Deserializer};

#[derive(Debug)]
pub enum State {
    None,
    Expansion,
    War,
    CivilWar,
    // ...
}

impl<'de> Deserialize<'de> for State {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        let s = String::deserialize(deserializer)?.to_lowercase();
        let state = match s.as_str() {
            "none" => State::None,
            "expansion" => State::Expansion,
            "war" => State::War,
            "civil war" | "civilwar" => State::CivilWar,
            // ...
            other => { return Err(de::Error::custom(format!("Invalid state '{}'", other))); },
        };
        Ok(state)
    }
}
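
The string matching at the heart of this implementation does not depend on serde at all. As a minimal sketch (not part of the original code), it could live in a std-only FromStr implementation. The enum below is shortened to four variants, and the String error type is an assumption made purely for illustration:

```rust
use std::str::FromStr;

// Shortened copy of the enum; the real one has more variants.
#[derive(Debug, PartialEq)]
pub enum State {
    None,
    Expansion,
    War,
    CivilWar,
}

impl FromStr for State {
    // Assumption: a plain String is enough as an error type for this sketch.
    type Err = String;

    fn from_str(input: &str) -> Result<Self, Self::Err> {
        // Lowercase once so that "CivilWar", "civilwar" and "Civil War" all match.
        match input.to_lowercase().as_str() {
            "none" => Ok(State::None),
            "expansion" => Ok(State::Expansion),
            "war" => Ok(State::War),
            // Both spellings map to the same variant.
            "civil war" | "civilwar" => Ok(State::CivilWar),
            other => Err(format!("Invalid state '{}'", other)),
        }
    }
}
```

With this in place, the body of deserialize could presumably shrink to reading a String and calling s.parse().map_err(de::Error::custom), which keeps the serde-specific part down to a few lines. The match itself stays just as long, so this only moves the boilerplate rather than removing it.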

Variant deserialize_with

In principle it should be possible to make a custom deserialization function only for the offending variants (State::CivilWar and State::CivilUnrest) by introducing a variant annotation like this:

#[derive(Debug, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum State {
    None,
    Expansion,
    War,
    #[serde(deserialize_with = "de_civilwar")]
    CivilWar,
    Election,
    Boom,
    Bust,
    CivilUnrest,
    Famine,
    Outbreak,
    Lockdown,
    Investment,
    Retreat,
}

fn de_civilwar<'de, D>(deserializer: D) -> Result<(), D::Error>
where
    D: Deserializer<'de>,
{
    let s = String::deserialize(deserializer)?.to_lowercase();
    println!("found: {}", s);
    if s.as_str() == "civilwar" || s.as_str() == "civil war" {
        Ok(())
    } else {
        Err(
            de::Error::invalid_value(
                Unexpected::Str(&s),
                &r#""civil war" or "civilwar""#
            )
        )
    }
}

However, using this fails with the error: invalid type: unit variant, expected newtype variant. At this point it is unclear to me why this doesn't work, as it matches the documentation. To narrow it down I implemented a variant of the problem based on the test contained in serde:

#[macro_use]
extern crate serde_derive;
extern crate serde_json;
extern crate serde;

use serde::de::{self, Deserialize, Deserializer, Unexpected};

#[derive(Debug, PartialEq, Serialize, Deserialize)]
enum WithVariant {
    #[serde(deserialize_with = "deserialize_u8_as_unit_variant")]
    Unit,
}

fn deserialize_u8_as_unit_variant<'de, D>(deserializer: D) -> Result<(), D::Error>
where
    D: Deserializer<'de>,
{
    let n = u8::deserialize(deserializer)?;
    if n == 0 {
        Ok(())
    } else {
        Err(de::Error::invalid_value(Unexpected::Unsigned(n as u64), &"0"))
    }
}

fn main() {
    let s1 = "0";
    let i: u8 = serde_json::from_str(s1).unwrap();
    println!("i: {}", i);

    let s = "0";
    let c: WithVariant = serde_json::from_str(s).unwrap();
    println!("input: {} output: {:?}", s, c);
}

This fails in a different way, with the error: ExpectedSomeValue, line: 1, column: 1.

Either I'm overlooking something or there is a bug in the libraries.

Update

After some help from David Tolnay, one of the authors of serde, it turns out that the enum variant deserialize_with feature is meant to be used in a different way.

For the test-case example above, this works:

    let s = r#"{ "Unit": 0 }"#;
    let c: WithVariant = serde_json::from_str(s).unwrap();
    println!("input: {} output: {:?}", s, c);

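meaning the variant needs to be contained in another structure: an object keyed by the variant name. The deserialize_with function only handles the content of the variant (here the 0), not the matching of the variant name itself. This also explains the earlier error: a bare string such as "civil war" is a unit variant, whereas a variant carrying a value was expected.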

Finally, David offered the following elegant alternative:

use serde::de::{Deserialize, Deserializer, IntoDeserializer};

#[derive(Debug, Deserialize)]
#[serde(rename_all = "lowercase")]
#[serde(remote = "State")]
pub enum State {
    Expansion,
    CivilWar,
    /* ... */
}

impl<'de> Deserialize<'de> for State {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
        where D: Deserializer<'de>
    {
        let s = String::deserialize(deserializer)?;
        if s == "civil war" {
            Ok(State::CivilWar)
        } else {
            State::deserialize(s.into_deserializer())
        }
    }
}

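This provides the special handling but avoids the boilerplate for the common cases. It works because #[serde(remote = "State")] makes the derive generate an inherent State::deserialize function instead of a Deserialize trait implementation, so the hand-written implementation can delegate to it.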
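
If more variants turn up with a space in them (State::CivilUnrest is a likely candidate), the single comparison could be generalized. The helper below is a hypothetical sketch using only the standard library; the name normalize is my own and not part of serde. It lowercases the input and removes whitespace, so that the result matches the names produced by rename_all = "lowercase":

```rust
/// Hypothetical helper: lowercases the input and drops all whitespace,
/// so that "Civil War" and "civil war" both become "civilwar".
fn normalize(raw: &str) -> String {
    raw.chars()
        // Remove spaces, tabs and any other whitespace.
        .filter(|c| !c.is_whitespace())
        // to_lowercase() yields an iterator, since one char may map to several.
        .flat_map(|c| c.to_lowercase())
        .collect()
}
```

Inside the hand-written deserialize, State::deserialize(normalize(&s).into_deserializer()) would then take the place of the if/else. Note that this is more lenient than the original: it also accepts input such as "Civil War", which may or may not be desirable.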

All the example code used in this blog post can be found here.