website/content/blog/avoid-mutating-set-items-python.md

148 lines
4.5 KiB
Markdown
Raw Permalink Normal View History

2023-05-17 21:14:26 -04:00
---
title: "Avoid Mutating Items within a Set (Python Edition)"
date: 2023-05-17T20:46:33-04:00
draft: false
tags: ["Python"]
math: false
medium_enabled: false
---
Whenever we create a set of objects, there's an implicit assumption that those objects don't change. What happens when we break this implicit contract and mutate items within a set? If negative properties occur, how can we get around this issue? We'll explore this through the lens of set membership and talk about how the hash method and equality method are used within sets.
**Example**
We start by defining a Person class that holds a name
```python
class Person:
def __init__(self, name):
self.name = name
```
In order to put it within a set, we need to define a hash method.
```python
class Person:
def __init__(self, name):
self.name = name
def __hash__(self):
return hash(("Person", self.name))
people = {
Person("Brandon"),
Person("Clare"),
Person("James"),
Person("John")
}
```
The property we'll focus on in this post is set membership. In other words, does the set contain a specified object?
```python
print(Person("Brandon") in people) # False
```
This might seem strange since we can clearly see a `Person("Brandon")` in the set! The problem is that we didn't override the `__eq__` method within the class. Currently it only considers two `Person` objects the same if they share the same *reference*.
To test `Person` equivalence in a more intuitive way, we'll say that two `Person` objects are equivalent if they share the same name.
```python
class Person:
def __init__(self, name):
self.name = name
def __hash__(self):
return hash(("Person", self.name))
def __eq__(self, other):
return isinstance(other, Person) and self.name == other.name
```
```python
print(Person("Brandon") in people) # True
```
Now let's mutate the names so they all start with `Awesome `
```python
for p in people:
p.name = "Awesome " + p.name
```
Let's test for membership again
```python
print(Person("Awesome Brandon") in people) # False
print(Person("Brandon") in people) # False
```
What happened? Why is both `Awesome Brandon` and `Brandon` not in the set? This has to do with the way that the sets keep track of the objects within them.
When an object is inserted, it's placed into bins according to their hash. Therefore, when checking for set membership, we look to see if there are items within that hash bin.
Originally when we added `Person("Brandon")` the following hash is computed:
```python
print(hash(("Person", "Brandon"))) # 8879556234447129293
```
The object `Person("Brandon")` is then placed in bin 8879556234447129293.
When we mutate the object, we don't remove the item from the set and add it back in with a recomputed hash. Instead the mutated object stays within the same hash bin.
So what happens then when we ask if `Person("Awesome Brandon")` is within the set? First, it checks to see if there are any non-empty bins with the same hash.
```python
print(hash(("Person", "Awesome Brandon"))) # 8677222781184413080
```
Our set contains no items within the 8677222781184413080 hash bin. Therefore, the set membership will immediately fail disregarding completely the `__eq__` method we define.
**Aside:** Another implicit contract we have is that for a given hash method, two equivalent objects always produce the same hash. What happens when that's not the case?
```python
import random
class Person:
def __init__(self, name):
self.name = name
def __hash__(self):
# Add randomness!
return hash(("Person", self.name, random.randint(0, 5)))
def __eq__(self, other):
return isinstance(other, Person) and self.name == other.name
def __str__(self):
return f"Person(name={self.name})"
people = {
Person("Brandon"),
Person("Clare"),
Person("James"),
Person("John")
}
print(Person("Brandon") in people) # False
```
Again, the set membership returns false disregarding our `__eq__` method since the `__hash__` outputs don't match.
**Workaround:** If we can't mutate the items in a set directly, what should we do instead? The solution here is to remove the old items and insert newly created items with the modified data.
```python
people = {
Person("Brandon"),
Person("Clare"),
Person("James"),
Person("John")
}
to_remove = set()
to_add = set()
for p in people:
to_remove.add(p)
to_add.add(Person("Awesome " + p.name))
people = (people - to_remove).union(to_add)
print(Person("Awesome Brandon") in people) # True
```