title: Marshalling Python Dataclasses
date: 2024-01-20T22:52:50-05:00
draft: false
tags: Python
math: false
medium_enabled: false

Recently I wanted a way to transfer structured messages between two Python applications over a Unix domain socket. The cleanest and simplest way I've found so far is to use the dataclasses and json modules from the standard library.

We'll consider the following message for the rest of the post:

from dataclasses import dataclass

@dataclass
class QueryUserMessage:
    auth_key: str
    username: str

Marshalling

Let's say we have a message we want to send:

message = QueryUserMessage("lkajdfsas", "brozek")

We first need to get its dictionary representation. Luckily, the standard library has us covered:

from dataclasses import asdict

message_dict = asdict(message)

Then we can use the json module to give us a string representation:

import json

message_str = json.dumps(message_dict)

Finally, we can encode it into bytes and send it away:

# Default encoding is "utf-8"
message_bytes = message_str.encode()
# Assuming connection is defined...
connection.sendall(message_bytes)

To make this easier for myself, I created a custom JSON encoder and a function that uses the connection to send off the message:

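from dataclasses import asdict, is_dataclass

class DataclassEncoder(json.JSONEncoder):
    def default(self, o):
        # Convert dataclass instances; let the base class reject anything else
        if is_dataclass(o) and not isinstance(o, type):
            return asdict(o)
        return super().default(o)

def send_message(connection, message_dataclass):
    contents = json.dumps(message_dataclass, cls=DataclassEncoder).encode()
    connection.sendall(contents)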
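To see the sending side work without a second application, here is a minimal sketch that uses `socket.socketpair()` as a stand-in for the real connection. The socket pair is my assumption for demonstration purposes only; in practice the connection would come from your own Unix domain socket setup.

```python
import json
import socket
from dataclasses import asdict, dataclass, is_dataclass


@dataclass
class QueryUserMessage:
    auth_key: str
    username: str


class DataclassEncoder(json.JSONEncoder):
    def default(self, o):
        # Only dataclass instances are converted; anything else falls
        # through to the base class, which raises TypeError.
        if is_dataclass(o) and not isinstance(o, type):
            return asdict(o)
        return super().default(o)


def send_message(connection, message_dataclass):
    contents = json.dumps(message_dataclass, cls=DataclassEncoder).encode()
    connection.sendall(contents)


# socketpair() returns two already-connected sockets, standing in for
# the two applications (an assumption made for this demo).
sender, receiver = socket.socketpair()
try:
    send_message(sender, QueryUserMessage("lkajdfsas", "brozek"))
    received = receiver.recv(1024)
finally:
    sender.close()
    receiver.close()

print(received)
# b'{"auth_key": "lkajdfsas", "username": "brozek"}'
```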

Un-marshalling

On the other end, let us receive the bytes and decode them into a string:

MESSAGE_BUFFER_LEN = 1024
message_bytes = connection.recv(MESSAGE_BUFFER_LEN)
message_str = message_bytes.decode()

We can use the json module to turn it into a Python dictionary:

message_dict = json.loads(message_str)

Since this post only deals with one message class, we can pass the fields of the dictionary straight to its constructor. With more than one message type, you would either need to rely on some protocol or agree on the message type ahead of time.

message = QueryUserMessage(**message_dict)
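If more than one message type were in play, one possible protocol is to wrap the fields in an envelope that names the type. The sketch below is only an illustration: `DeleteUserMessage`, the `"type"` and `"fields"` keys, and the `marshal`/`unmarshal` helpers are all names I made up for this example.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class QueryUserMessage:
    auth_key: str
    username: str


# Hypothetical second message type, added only for illustration.
@dataclass
class DeleteUserMessage:
    auth_key: str
    username: str


# Both applications would keep the same mapping from tag to class.
MESSAGE_TYPES = {
    "query_user": QueryUserMessage,
    "delete_user": DeleteUserMessage,
}
TYPE_TAGS = {cls: tag for tag, cls in MESSAGE_TYPES.items()}


def marshal(message):
    # Wrap the fields in an envelope that names the message type.
    envelope = {"type": TYPE_TAGS[type(message)], "fields": asdict(message)}
    return json.dumps(envelope).encode()


def unmarshal(message_bytes):
    # Look up the class named by the envelope, then build the instance.
    envelope = json.loads(message_bytes.decode())
    message_class = MESSAGE_TYPES[envelope["type"]]
    return message_class(**envelope["fields"])


original = DeleteUserMessage("lkajdfsas", "brozek")
restored = unmarshal(marshal(original))
print(restored)
# DeleteUserMessage(auth_key='lkajdfsas', username='brozek')
```

Because the receiver only instantiates classes listed in `MESSAGE_TYPES`, an unknown tag fails with a `KeyError` rather than constructing something unexpected.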

Conclusion

In production use cases, we'll need to introduce a whole gamut of error handling to catch failures in JSON deserialization and class instantiation. I hope, however, that this serves as a good starting point.
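As a rough idea of what that error handling might look like, here is a sketch that funnels the likely failures into a single exception. `MessageError` and `parse_query_user_message` are hypothetical names of my own, and real applications will probably want more validation than this (dataclasses do not check field types, for instance).

```python
import json
from dataclasses import dataclass


@dataclass
class QueryUserMessage:
    auth_key: str
    username: str


class MessageError(Exception):
    """Hypothetical exception for any message that cannot be un-marshalled."""


def parse_query_user_message(message_bytes):
    try:
        message_dict = json.loads(message_bytes.decode())
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise MessageError("message is not valid UTF-8 encoded JSON") from exc

    # Valid JSON can still be a list, number, or string.
    if not isinstance(message_dict, dict):
        raise MessageError("message is not a JSON object")

    try:
        return QueryUserMessage(**message_dict)
    except TypeError as exc:
        # Raised when fields are missing or unexpected fields are present.
        raise MessageError(f"unexpected message fields: {exc}") from exc


message = parse_query_user_message(
    b'{"auth_key": "lkajdfsas", "username": "brozek"}'
)
print(message)
# QueryUserMessage(auth_key='lkajdfsas', username='brozek')
```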

Some things to consider:

  1. If you have multiple types of messages, consider including a field in the dictionary whose string value identifies the message type. Both applications can then maintain a map between these strings and the class constructors.
  2. If it's possible to have messages larger than the buffer length, then consider either setting it higher or sending the size of the message beforehand.
  3. Using a standard HTTP library ;)
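For the second point, sending the size beforehand could look something like the sketch below. The 4-byte big-endian length header and the helper names (`send_framed`, `recv_exactly`, `recv_framed`) are assumptions of mine rather than any standard, and `socket.socketpair()` again stands in for the real connection.

```python
import socket
import struct

# Assumed header layout: a 4-byte unsigned big-endian length.
HEADER = struct.Struct("!I")


def send_framed(connection, payload):
    # Prefix the payload with its length so the receiver knows when to stop.
    connection.sendall(HEADER.pack(len(payload)) + payload)


def recv_exactly(connection, count):
    # recv() may return fewer bytes than requested, so loop until done.
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = connection.recv(remaining)
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_framed(connection):
    # Read the fixed-size header first, then exactly that many payload bytes.
    (length,) = HEADER.unpack(recv_exactly(connection, HEADER.size))
    return recv_exactly(connection, length)


sender, receiver = socket.socketpair()
try:
    payload = b"x" * 4096  # larger than the 1024-byte buffer used earlier
    send_framed(sender, payload)
    received = recv_framed(receiver)
finally:
    sender.close()
    receiver.close()

print(len(received))
# 4096
```

The payload here is just filler bytes; in the context of this post it would be the encoded JSON string.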