In Python, there are four built-in data types that we can use to store collections of data. With different qualities and characteristics, these built-in data types are List (list), Tuple (tuple), Set (set), and Dictionary (dict).
In this article, we are going to dig into the rabbit holes of List, Tuple, and Set in Python. We will go through their differences and when to use these data types.
As Dictionary associates keys with their respective values, which is a very different use case compared to List, Tuple, and Set (which simply contain values), it won’t be part of this discussion.
For the sake of simplicity, I will use Set and Dictionary interchangeably, as they are based on Hash Table (or Hash Map).
TL;DR
- If you need to store duplicates, go for List or Tuple.
- For List vs. Tuple, if you do not intend to mutate, go for Tuple.
- If you do not need to store duplicates, always go for Set or Dictionary, as they are significantly faster when it comes to determining if an object is present in the Set (e.g. x in set_or_dict).
Why do we care?
For the most part, these data types can be used interchangeably within an application without much trouble.
Yet, imagine if we were given a task to check if a needle exists in a sizable haystack. What would be the most efficient way in terms of speed and memory to do so?
Should the haystack be a List? What about a Tuple? Or why not always use a Set (or a Dictionary)? What are the caveats that we should look out for?
Let’s dig in!
Differences between List, Tuple, and Set
Duplicates
If I were to explain this, List and Tuple are like siblings in Python. Set (or Dictionary), on the other hand, is like a cousin to both of them.
Unlike List or Tuple, a Set cannot contain duplicates. In other words, the elements in a Set are unique.
set_example = {1, 1, 2, 3, 3, 3}
# {1, 2, 3}
fruit_set = {'apple', 'apple', 'banana', 'orange', 'orange', 'orange'}
# {'apple', 'banana', 'orange'} (display order may vary)
With this knowledge in mind, we now know that Set can be used to remove duplicates from a list too!
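As a minimal sketch of that idea (the `fruits` list is made-up sample data, not from the original article): converting to a set drops the duplicates but makes no promise about the resulting order, while `dict.fromkeys` is one common way to deduplicate and keep first-seen order on Python 3.7+.

```python
# Hypothetical sample data, for illustration only.
fruits = ["apple", "banana", "apple", "orange", "banana"]

# A set keeps one copy of each value; its iteration order is not guaranteed.
unique_unordered = set(fruits)

# Dict keys are unique and (Python 3.7+) keep insertion order,
# so this deduplicates while preserving first-seen order.
unique_ordered = list(dict.fromkeys(fruits))

print(unique_unordered)
print(unique_ordered)  # ['apple', 'banana', 'orange']
```

Which of the two to pick depends on whether the original order matters to you.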
Order
You might have heard the statement “Set and Dictionary are not ordered in Python.” Well, that is only half the truth today: it still holds for Set, but whether it holds for Dictionary depends on which version of Python you are using.
Before Python 3.6, Dictionaries did not keep their insertion order. Here’s an example if you try it out in Python 3.5:
# Example in Python 3.5
>>> fruit_size = {}
>>> fruit_size['apple'] = 12
>>> fruit_size['banana'] = 16
>>> fruit_size['orange'] = 20
>>> fruit_size
{'apple': 12, 'orange': 20, 'banana': 16}
# The order shown is arbitrary and may differ between runs
You can easily switch to different versions of Python using pyenv. Try it out!
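Today, that statement is out of date by a couple of years as far as Dictionary is concerned. Dictionaries preserve insertion order as an implementation detail of CPython 3.6 and as an official language guarantee starting from Python 3.7. Sets, however, remain unordered: their iteration order is not guaranteed and should not be relied upon.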
Anyway, in case you wondered, List and Tuple are ordered sequences of objects.
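A quick way to see how order is treated by each type is sketched below. It assumes Python 3.7 or newer, and the keys and values are placeholders chosen for illustration.

```python
# Hypothetical keys and values, for illustration only.
fruit_size = {}
fruit_size["apple"] = 12
fruit_size["banana"] = 16
fruit_size["orange"] = 20

# On Python 3.7+, dict iteration follows insertion order.
dict_keys_in_order = list(fruit_size)
print(dict_keys_in_order)  # ['apple', 'banana', 'orange']

# Sets compare equal no matter how their elements were written down,
# and their iteration order should not be relied upon.
sets_equal = {"apple", "banana"} == {"banana", "apple"}
print(sets_equal)  # True

# Lists (and tuples) are sequences, so order is part of their value.
lists_equal = ["apple", "banana"] == ["banana", "apple"]
print(lists_equal)  # False
```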
Mutability
When you describe an object as mutable, it’s simply a fancy way of saying the internal state of the object can be changed.
The key difference here is that Tuple is immutable (not changeable), whereas List and Set are mutable.
Although Sets are mutable, we cannot access or change an individual element of a Set via indexing or slicing. We can add elements (add, update) and remove them (remove, discard, pop), but to "change" an element we have to remove it and add a new one.
Do note that a Set's update method simply adds multiple elements at once; it does not modify existing ones.
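The mutability rules above can be sketched as follows. The variable names and values are placeholders; the methods used (`add`, `update`, `discard`) are standard `set` methods.

```python
animal_tuple = ("cat", "dog", "bird")
try:
    animal_tuple[0] = "fish"  # tuples do not support item assignment
except TypeError as err:
    print(f"TypeError: {err}")

vehicle_set = {"car", "bus"}
vehicle_set.add("bike")               # add a single element
vehicle_set.update(["train", "car"])  # add several; the duplicate "car" is ignored
vehicle_set.discard("bus")            # remove if present, no error if absent
print(sorted(vehicle_set))  # ['bike', 'car', 'train']
```

`sorted` is used only to get a predictable display order, since a set does not provide one.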
Indexing
Both Tuple and List support indexing and slicing, while Set does not.
fruit_list = ['apple', 'banana', 'orange']
fruit_list[1]
# 'banana'
animal_tuple = ('cat', 'dog', 'bird')
animal_tuple[2]
# 'bird'
vehicle_set = {'car', 'bus', 'bike'}
vehicle_set[0]
# TypeError: 'set' object is not subscriptable
When to use List vs. Tuple?
As we mentioned earlier, Tuples are immutable, whereas Lists are mutable. By the same token, Tuples are fixed size in nature, whereas Lists are dynamic.
a_tuple = tuple(range(1000))
a_list = list(range(1000))
a_tuple.__sizeof__()  # 8024 bytes
a_list.__sizeof__()  # 9088 bytes
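To see the "dynamic" side of a List, here is a rough sketch that records the reported size of a list as items are appended. It assumes CPython; the exact byte counts are implementation details and vary by version and platform, so none are shown.

```python
import sys

# Track how the reported size of a list changes as items are appended.
growing = []
sizes = []
for i in range(20):
    growing.append(i)
    sizes.append(sys.getsizeof(growing))

print(sizes)
# The size jumps in steps rather than on every append,
# because the list reserves spare capacity when it grows.
print(len(set(sizes)), "distinct sizes across", len(sizes), "appends")

# A tuple with the same items is sized exactly once, at creation.
tuple_size = sys.getsizeof(tuple(growing))
print(tuple_size)
```

That spare capacity is part of why a List usually occupies more memory than a Tuple holding the same items.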
Use List
- When you need to mutate your collection.
- When you need to remove or add new items to your collection of items.
Use Tuple
- If your data should not or does not need to be changed.
- Tuples tend to be slightly cheaper than Lists: they use less memory and are typically faster to create. We should use Tuple instead of a List if we are defining a constant set of values and all we are ever going to do with it is iterate through it.
- If we need a sequence of elements to be used as a dictionary key, we can use a Tuple (as long as everything inside it is hashable). Lists are mutable and therefore unhashable, so they can never be used as dictionary keys.
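The dictionary-key point can be illustrated with a small sketch. The `grid` lookup table and its coordinates are hypothetical.

```python
# Hypothetical lookup table keyed by (x, y) coordinates.
grid = {(0, 0): "origin", (1, 2): "treasure"}
print(grid[(1, 2)])  # treasure

try:
    grid[[3, 4]] = "nope"  # a list is unhashable, so it cannot be a key
except TypeError as err:
    print(f"TypeError: {err}")

# A tuple is only hashable if everything inside it is hashable too.
try:
    hash((1, [2, 3]))
except TypeError as err:
    print(f"TypeError: {err}")
```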
When to use set vs. List/Tuple?
As Set uses a hash table as its underlying data structure, Set is blazing fast when it comes to checking if an element is inside it (e.g. x in a_set).
The idea behind it is that looking up an item in a hash table is an O(1) (constant time) operation on average, whereas a List or Tuple has to be scanned element by element, which is O(n).
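A rough way to observe this on your own machine is sketched below, using the standard `timeit` module. The haystack size and repeat count are arbitrary choices, and the actual timings depend on your hardware and Python version, so no figures are shown.

```python
import timeit

n = 100_000
haystack_list = list(range(n))
haystack_set = set(haystack_list)
needle = n - 1  # worst case for the list: the last element

# Each call repeats the membership test 100 times and returns total seconds.
list_time = timeit.timeit(lambda: needle in haystack_list, number=100)
set_time = timeit.timeit(lambda: needle in haystack_set, number=100)

print(f"list: {list_time:.6f}s  set: {set_time:.6f}s")
```

The needle is deliberately placed at the end of the list so the linear scan has to walk through every element before finding it.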
So, should I always use Set or Dictionary?
Essentially, if you do not need to store duplicates, your elements are hashable, and you do not rely on order or indexing, Set is going to be better than List for membership checks.
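One caveat worth keeping in mind is that every element of a Set must be hashable. A minimal sketch, with `rows` as made-up data:

```python
rows = [[1, 2], [3, 4], [1, 2]]  # hypothetical data: a list of lists

try:
    unique_rows = set(rows)  # fails: lists are unhashable
except TypeError as err:
    print(f"TypeError: {err}")

# One workaround is to convert each inner list to a tuple first.
unique_rows = {tuple(row) for row in rows}
print(sorted(unique_rows))  # [(1, 2), (3, 4)]
```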
Summary
What are the main takeaways?
- If you need to store duplicates, go for List or Tuple.
- For List vs. Tuple, if you do not intend to mutate, go for Tuple.
- If you do not need to store duplicates, always go for Set or Dictionary. Hash maps are significantly faster when it comes to determining if an object is present in the Set (e.g. x in set_or_dict).
If you’re a numbers geek like me, check out this speed comparison between Tuple, List, and Set when you’re iterating or checking if an object is present in a collection.
Ultimately, for the most part, I think we should not overthink which data structure to use.
“Premature optimization is the root of all evil.”
References
https://wiki.python.org/moin/TimeComplexity
This article was originally published at jerrynsh.com