Josh Choo's blog

How actix-web's application state and Data extractor work internally

24 October 2021

When developing web servers, we sometimes need a mechanism to share application state, such as configuration or database connections. actix-web lets us inject shared data into the application state with the app_data method, and then retrieve that data and pass it to route handlers with the Data extractor.

Let's explore just enough to understand how it works under the hood!

Note: The examples are based on zero2prod and actix-web v4 source code.

The following is an example of an actix-web server that exposes a /subscriptions route. subscribeHandler accepts a Postgres database connection (PgConnection), which we added to the application state via the app_data method.

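use std::net::TcpListener;

use actix_web::{dev::Server, web, App, HttpResponse, HttpServer};
use sqlx::PgConnection; // zero2prod uses sqlx for Postgres access

pub fn run(listener: TcpListener, connection: PgConnection) -> Result<Server, std::io::Error> {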
    let connection = web::Data::new(connection);
    let server = HttpServer::new(move || {
        App::new()
            .route("/subscriptions", web::post().to(subscribeHandler))
            // vvvvv Store PgConnection app data here
            .app_data(connection.clone())
    })
    .listen(listener)?
    .run();
    Ok(server)
}

...

// Extract PgConnection from application state
pub async fn subscribeHandler(_connection: web::Data<PgConnection>) -> HttpResponse {
    //...
}

Question: Since it is possible to store other data types, how does actix-web know to retrieve and pass a PgConnection value to subscribeHandler instead of a different type?

Firstly, let's take a look at the web::post().to(...) implementation:

// in actix-web's src/web.rs

pub fn to<F, I, R>(handler: F) -> Route
where
    F: Handler<I, R>,
    I: FromRequest + 'static, // <--- Notice this!
    R: Future + 'static,
    R::Output: Responder + 'static,
{
    Route::new().to(handler)
}

We could dive deeper into the function calls, but that gets complex quickly and isn't needed to follow the rest of this post. Instead, take note of the FromRequest bound on I, the handler's argument type: whatever a handler accepts as a parameter must implement FromRequest.
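To make the role of that bound concrete, here is a minimal sketch of trait-driven extraction using only the standard library. Everything in it (Request, Path, UserAgent, dispatch, and this simplified synchronous FromRequest) is a hypothetical stand-in for illustration, not actix-web's real API, which is asynchronous and considerably more involved.

```rust
// Hypothetical stand-in for an incoming request; not actix-web's HttpRequest.
struct Request {
    path: String,
    user_agent: String,
}

// Simplified, synchronous analogue of FromRequest: a type that knows how
// to build itself from a request.
trait FromRequest: Sized {
    fn from_request(req: &Request) -> Result<Self, String>;
}

struct Path(String);
struct UserAgent(String);

impl FromRequest for Path {
    fn from_request(req: &Request) -> Result<Self, String> {
        Ok(Path(req.path.clone()))
    }
}

impl FromRequest for UserAgent {
    fn from_request(req: &Request) -> Result<Self, String> {
        if req.user_agent.is_empty() {
            Err("missing user agent".to_string())
        } else {
            Ok(UserAgent(req.user_agent.clone()))
        }
    }
}

// The "framework" side: the handler's parameter type `I` decides which
// `from_request` implementation runs before the handler is called.
fn dispatch<I, F>(req: &Request, handler: F) -> String
where
    I: FromRequest,
    F: Fn(I) -> String,
{
    match I::from_request(req) {
        Ok(arg) => handler(arg),
        Err(e) => format!("500: {}", e),
    }
}

fn path_handler(p: Path) -> String {
    format!("path={}", p.0)
}

fn agent_handler(ua: UserAgent) -> String {
    format!("agent={}", ua.0)
}

fn main() {
    let req = Request {
        path: "/subscriptions".to_string(),
        user_agent: "curl".to_string(),
    };
    println!("{}", dispatch(&req, path_handler)); // path=/subscriptions
    println!("{}", dispatch(&req, agent_handler)); // agent=curl
}
```

Note that dispatch never names Path or UserAgent. The compiler infers I from the handler's signature, which is the same idea that lets a handler ask for web::Data<PgConnection> simply by declaring it as a parameter.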

Next, we shall take a look at web::Data<T>, which subscribeHandler takes as a parameter, and see that it implements the FromRequest trait with the from_request function:

// in actix-web's src/data.rs

pub struct Data<T: ?Sized>(Arc<T>);

impl<T: ?Sized + 'static> FromRequest for Data<T> {
    type Config = ();
    type Error = Error;
    type Future = Ready<Result<Self, Error>>;

    #[inline]
    fn from_request(req: &HttpRequest, _: &mut Payload) -> Self::Future {
        if let Some(st) = req.app_data::<Data<T>>() {
            ok(st.clone())
        } else {
            log::debug!(
                "Failed to construct App-level Data extractor. \
                 Request path: {:?} (type: {})",
                req.path(),
                type_name::<T>(),
            );
            err(ErrorInternalServerError(
                "App data is not configured, to configure use App::data()",
            ))
        }
    }
}

from_request calls req.app_data::<Data<T>>(), which returns an Option. If the data is found, from_request returns a clone of it (st.clone()). Otherwise, it logs a debug message and returns an Internal Server Error.
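Because Data<T> wraps an Arc<T>, cloning it on every request copies a pointer and bumps a reference count rather than duplicating the underlying value. The sketch below illustrates this with a hypothetical stand-in for Data; the helper methods handle_count and same_allocation are invented for this example and are not part of actix-web.

```rust
use std::sync::Arc;

// Hypothetical stand-in for web::Data<T>: a thin wrapper around Arc<T>.
struct Data<T: ?Sized>(Arc<T>);

impl<T: ?Sized> Clone for Data<T> {
    fn clone(&self) -> Self {
        // Only the pointer is copied; the inner value is shared.
        Data(Arc::clone(&self.0))
    }
}

impl<T> Data<T> {
    fn new(value: T) -> Self {
        Data(Arc::new(value))
    }
}

impl<T: ?Sized> Data<T> {
    // Illustration-only helpers, not part of actix-web.
    fn handle_count(&self) -> usize {
        Arc::strong_count(&self.0)
    }

    fn same_allocation(&self, other: &Self) -> bool {
        Arc::ptr_eq(&self.0, &other.0)
    }
}

fn main() {
    let config = Data::new(String::from("postgres://localhost"));
    let per_request = config.clone();

    println!("handles: {}", config.handle_count()); // handles: 2
    println!("shared: {}", config.same_allocation(&per_request)); // shared: true
}
```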

So, how exactly does calling req.app_data::<Data<T>> return the correct PgConnection data that we specified in the generic type parameter?

Let's look at the app_data method implementation:

// in actix-web's src/request.rs

pub struct HttpRequest {
    pub(crate) inner: Rc<HttpRequestInner>,
}

pub(crate) struct HttpRequestInner {
    pub(crate) app_data: SmallVec<[Rc<Extensions>; 4]>,
}

impl HttpRequest {
    pub fn app_data<T: 'static>(&self) -> Option<&T> {
        // container has type `Extensions`
        for container in self.inner.app_data.iter().rev() {
            if let Some(data) = container.get::<T>() {
                return Some(data);
            }
        }

        None
    }
}

The incoming HttpRequest holds a list of app_data containers. The method iterates through them in reverse order, calls container.get::<T>() on each one, and returns the first match. Here is what get looks like:
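The reverse iteration means that the container added last is consulted first. Here is a small standard-library sketch of that lookup order. The Container alias and the container_with and lookup helpers are hypothetical names for this example, and the comment about which layer is pushed first is an assumption rather than something shown in the source above.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

// Hypothetical stand-in for a single `Extensions` container.
type Container = HashMap<TypeId, Box<dyn Any>>;

// Builds a container holding exactly one value, keyed by its type.
fn container_with<T: 'static>(value: T) -> Container {
    let mut container = Container::new();
    container.insert(TypeId::of::<T>(), Box::new(value));
    container
}

// Mirrors the loop in HttpRequest::app_data: search the most recently
// pushed container first and return the first match.
fn lookup<T: 'static>(layers: &[Container]) -> Option<&T> {
    for container in layers.iter().rev() {
        let found = container
            .get(&TypeId::of::<T>())
            .and_then(|boxed| boxed.downcast_ref::<T>());
        if let Some(data) = found {
            return Some(data);
        }
    }
    None
}

fn main() {
    // Assumption: the outer (app-level) layer is pushed first and a more
    // specific layer is pushed after it.
    let layers = vec![container_with(10u32), container_with(20u32)];

    println!("{:?}", lookup::<u32>(&layers)); // Some(20)
    println!("{:?}", lookup::<String>(&layers)); // None
}
```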

// in actix-http's src/extensions.rs

pub struct Extensions {
    map: AHashMap<TypeId, Box<dyn Any>>,
}

impl Extensions {
    pub fn get<T: 'static>(&self) -> Option<&T> {
        self.map
            .get(&TypeId::of::<T>())
            .and_then(|boxed| boxed.downcast_ref())
    }
}

get::<T>() tries to get the value at the key TypeId::of::<T>() from the hashmap, map. What does TypeId::of::<T>() do?

/// A `TypeId` represents a globally unique identifier for a type.

pub struct TypeId {
    t: u64,
}

impl TypeId {
    /// Returns the `TypeId` of the type this generic function has been
    /// instantiated with.
    ///
    /// # Examples
    ///
    /// ```
    /// use std::any::{Any, TypeId};
    ///
    /// fn is_string<T: ?Sized + Any>(_s: &T) -> bool {
    ///     TypeId::of::<String>() == TypeId::of::<T>()
    /// }
    ///
    /// assert_eq!(is_string(&0), false);
    /// assert_eq!(is_string(&"cookie monster".to_string()), true);
    /// ```
    pub const fn of<T: ?Sized + 'static>() -> TypeId {
        TypeId { t: intrinsics::type_id::<T>() }
    }
}

Aha! Given a type, such as PgConnection, Rust allows us to obtain a unique u64 value that identifies it! In our example, the type stored in the application state is web::Data<PgConnection>, so calling TypeId::of::<web::Data<PgConnection>>() gives us a value that we can use as a key for the hashmap inside of app_data.
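As a quick illustration of using a TypeId as a hashmap key, here is a minimal sketch with the standard library. PgConnection and AppConfig are empty placeholder structs, and registry and describe are hypothetical helpers invented for this example.

```rust
use std::any::TypeId;
use std::collections::HashMap;

// Hypothetical placeholder types; the real PgConnection lives in a
// database crate.
struct PgConnection;
struct AppConfig;

// Maps each registered type's TypeId to a human-readable label.
fn registry() -> HashMap<TypeId, &'static str> {
    let mut names = HashMap::new();
    names.insert(TypeId::of::<PgConnection>(), "database connection");
    names.insert(TypeId::of::<AppConfig>(), "configuration");
    names
}

// Looks up the label for `T` using nothing but its TypeId.
fn describe<T: 'static>(names: &HashMap<TypeId, &'static str>) -> Option<&'static str> {
    names.get(&TypeId::of::<T>()).copied()
}

fn main() {
    let names = registry();
    println!("{:?}", describe::<PgConnection>(&names)); // Some("database connection")
    println!("{:?}", describe::<AppConfig>(&names)); // Some("configuration")
    println!("{:?}", describe::<String>(&names)); // None
}
```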

But wait a moment! Inside these functions, T is just a generic type parameter. So, at runtime, how does Rust ensure that calling TypeId::of::<T>() gives us the value associated with our PgConnection data and not some other type?

Rust generates different copies of generic functions during compilation for each concrete type needed. This process is known as monomorphization. Given that we use web::Data<PgConnection>, Rust will compile copies of the from_request, app_data, get and TypeId::of functions in which the type parameter has been replaced by the concrete type (PgConnection for from_request, and web::Data<PgConnection> for the other three).

At runtime, when a request hits the /subscriptions route, that type-specific version of the TypeId::of function will return the concrete TypeId value associated with web::Data<PgConnection>. This value is then used to search app_data's internal hashmap and return the stored PgConnection data that we added to the application state.
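The sketch below shows one generic function being instantiated for two concrete types. It also highlights a practical consequence: a wrapper type has its own TypeId, distinct from the type it wraps, so a value stored under one will not be found under the other. PgConnection, Data and key_for are hypothetical stand-ins for this example.

```rust
use std::any::TypeId;
use std::sync::Arc;

// Hypothetical placeholder for a database connection type.
#[allow(dead_code)]
struct PgConnection;

// Hypothetical stand-in for web::Data<T>.
#[allow(dead_code)]
struct Data<T>(Arc<T>);

// One generic function in the source. The compiler emits a separate copy
// for each concrete `T` it is called with, each returning a fixed TypeId.
fn key_for<T: 'static>() -> TypeId {
    TypeId::of::<T>()
}

fn main() {
    let bare = key_for::<PgConnection>();
    let wrapped = key_for::<Data<PgConnection>>();

    // The same instantiation always yields the same key.
    println!("stable key: {}", key_for::<PgConnection>() == bare); // stable key: true

    // The wrapper is a different type, so it gets a different key.
    println!("wrapper differs: {}", wrapped != bare); // wrapper differs: true
}
```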

So, we've figured out how actix-web retrieves PgConnection from the application state. Now, to complete our understanding, let's look at how actix-web stores data in the application state:

// Server code

App::new()
    .route("/subscriptions", web::post().to(subscribeHandler))
    // vvvvv Store PgConnection app data here
    .app_data(connection.clone())

Digging into the app_data method's implementation:

// in actix-web's src/app.rs

impl App {
    pub fn app_data<U: 'static>(mut self, ext: U) -> Self {
        self.extensions.insert(ext);
        self
    }
}

App inserts the value (in our case, a web::Data<PgConnection>) into its own extensions, which wraps a hashmap (remember from earlier):

// in actix-http's src/extensions.rs

impl Extensions {
    pub fn insert<T: 'static>(&mut self, val: T) -> Option<T> {
        self.map
            .insert(TypeId::of::<T>(), Box::new(val))
            .and_then(downcast_owned)
    }
}

Hey! It's the same Extensions type! The method inserts the value under the key TypeId::of::<T>(), which, for our web::Data<PgConnection> value, resolves to the same key that the Data extractor looks up later.
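Putting insert and get side by side, here is a minimal sketch of a type-keyed map built on the standard library. TypeMap is a hypothetical name for this example, and it uses std's HashMap rather than the AHashMap that the real Extensions uses.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

// Hypothetical, simplified analogue of `Extensions`.
#[derive(Default)]
struct TypeMap {
    map: HashMap<TypeId, Box<dyn Any>>,
}

impl TypeMap {
    // Stores `val` under its type's TypeId and returns the value that was
    // previously stored for that type, if any.
    fn insert<T: 'static>(&mut self, val: T) -> Option<T> {
        self.map
            .insert(TypeId::of::<T>(), Box::new(val))
            .and_then(|old| old.downcast::<T>().ok())
            .map(|boxed| *boxed)
    }

    // Finds the entry keyed by T's TypeId and downcasts it back to `&T`.
    fn get<T: 'static>(&self) -> Option<&T> {
        self.map
            .get(&TypeId::of::<T>())
            .and_then(|boxed| boxed.downcast_ref::<T>())
    }
}

fn main() {
    let mut state = TypeMap::default();

    println!("{:?}", state.insert(1u32)); // None
    println!("{:?}", state.insert(String::from("config"))); // None

    // At most one value per type: a second u32 replaces the first.
    println!("{:?}", state.insert(2u32)); // Some(1)

    println!("{:?}", state.get::<u32>()); // Some(2)
    println!("{:?}", state.get::<String>()); // Some("config")
    println!("{:?}", state.get::<bool>()); // None
}
```

One design consequence worth noting: because the key is derived from the type alone, such a map can hold only one value per type.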

There we have it! actix-web manages application state by inserting data into, and retrieving it from, a hashmap using keys derived from TypeId::of::<T>(), which returns a unique identifier for each type. And this is made possible by monomorphization in Rust!