diff --git a/content/blog/avoid-mutating-set-items-python.md b/content/blog/avoid-mutating-set-items-python.md new file mode 100644 index 0000000..cade3df --- /dev/null +++ b/content/blog/avoid-mutating-set-items-python.md @@ -0,0 +1,147 @@ +--- +title: "Avoid Mutating Items within a Set (Python Edition)" +date: 2023-05-17T20:46:33-04:00 +draft: false +tags: ["Python"] +math: false +medium_enabled: false +--- + +Whenever we create a set of objects, there's an implicit assumption that those objects don't change. What happens when we break this implicit contract and mutate items within a set? If negative properties occur, how can we get around this issue? We'll explore this through the lens of set membership and talk about how the hash method and equality method are used within sets. + +**Example** + +We start by defining a Person class that holds a name + +```python +class Person: + def __init__(self, name): + self.name = name +``` + +In order to put it within a set, we need to define a hash method. + +```python +class Person: + def __init__(self, name): + self.name = name + def __hash__(self): + return hash(("Person", self.name)) + +people = { + Person("Brandon"), + Person("Clare"), + Person("James"), + Person("John") +} +``` + +The property we'll focus on in this post is set membership. In other words, does the set contain a specified object? + +```python +print(Person("Brandon") in people) # False +``` + +This might seem strange since we can clearly see a `Person("Brandon")` in the set! The problem is that we didn't override the `__eq__` method within the class. Currently it only considers two `Person` objects the same if they share the same *reference*. + +To test `Person` equivalence in a more intuitive way, we'll say that two `Person` objects are equivalent if they share the same name. + +```python +class Person: + def __init__(self, name): + self.name = name + def __hash__(self): + return hash(("Person", self.name)) + def __eq__(self, other): + return isinstance(other, Person) and self.name == other.name +``` + +```python +print(Person("Brandon") in people) # True +``` + +Now let's mutate the names so they all start with `Awesome ` + +```python +for p in people: + p.name = "Awesome " + p.name +``` + +Let's test for membership again + +```python +print(Person("Awesome Brandon") in people) # False +print(Person("Brandon") in people) # False +``` + +What happened? Why is both `Awesome Brandon` and `Brandon` not in the set? This has to do with the way that the sets keep track of the objects within them. + +When an object is inserted, it's placed into bins according to their hash. Therefore, when checking for set membership, we look to see if there are items within that hash bin. + +Originally when we added `Person("Brandon")` the following hash is computed: + +```python +print(hash(("Person", "Brandon"))) # 8879556234447129293 +``` + +The object `Person("Brandon")` is then placed in bin 8879556234447129293. + +When we mutate the object, we don't remove the item from the set and add it back in with a recomputed hash. Instead the mutated object stays within the same hash bin. + +So what happens then when we ask if `Person("Awesome Brandon")` is within the set? First, it checks to see if there are any non-empty bins with the same hash. + +```python +print(hash(("Person", "Awesome Brandon"))) # 8677222781184413080 +``` + +Our set contains no items within the 8677222781184413080 hash bin. Therefore, the set membership will immediately fail disregarding completely the `__eq__` method we define. + +**Aside:** Another implicit contract we have is that for a given hash method, two equivalent objects always produce the same hash. What happens when that's not the case? + +```python +import random + +class Person: + def __init__(self, name): + self.name = name + def __hash__(self): + # Add randomness! + return hash(("Person", self.name, random.randint(0, 5))) + def __eq__(self, other): + return isinstance(other, Person) and self.name == other.name + def __str__(self): + return f"Person(name={self.name})" + + +people = { + Person("Brandon"), + Person("Clare"), + Person("James"), + Person("John") +} + +print(Person("Brandon") in people) # False +``` + +Again, the set membership returns false disregarding our `__eq__` method since the `__hash__` outputs don't match. + +**Workaround:** If we can't mutate the items in a set directly, what should we do instead? The solution here is to remove the old items and insert newly created items with the modified data. + +```python +people = { + Person("Brandon"), + Person("Clare"), + Person("James"), + Person("John") +} + +to_remove = set() +to_add = set() +for p in people: + to_remove.add(p) + to_add.add(Person("Awesome " + p.name)) + +people = (people - to_remove).union(to_add) + +print(Person("Awesome Brandon") in people) # True +```