Welcome to the next pikoTutorial !
Below you can find the code that we will be reviewing today. Before you scroll down to section where I fix it, feel free to check how many mistakes you can find.
<span>def</span> <span>CountOccurrences</span><span>(</span><span>input</span><span>):</span><span>result</span> <span>=</span> <span>{}</span><span>for</span> <span>i</span> <span>in</span> <span>range</span><span>(</span><span>len</span><span>(</span><span>input</span><span>)):</span><span>item</span> <span>=</span> <span>input</span><span>[</span><span>i</span><span>]</span><span>if</span> <span>item</span> <span>in</span> <span>result</span><span>.</span><span>keys</span><span>():</span><span>result</span><span>[</span><span>item</span><span>]</span> <span>=</span> <span>result</span><span>.</span><span>get</span><span>(</span><span>item</span><span>,</span> <span>0</span><span>)</span> <span>+</span> <span>1</span><span>else</span><span>:</span><span>result</span><span>[</span><span>item</span><span>]</span> <span>=</span> <span>1</span><span>keys</span> <span>=</span> <span>[]</span><span>for</span> <span>key</span> <span>in</span> <span>result</span><span>.</span><span>keys</span><span>():</span><span>keys</span><span>.</span><span>append</span><span>(</span><span>key</span><span>)</span><span>keys</span><span>.</span><span>sort</span><span>()</span><span>output</span> <span>=</span> <span>''</span><span>for</span> <span>key</span> <span>in</span> <span>keys</span><span>:</span> <span># Inefficient string concatenation in a loop </span> <span>output</span> <span>+=</span> <span>str</span><span>(</span><span>key</span><span>)</span> <span>+</span> <span>'</span><span>: </span><span>'</span> <span>+</span> <span>str</span><span>(</span><span>result</span><span>[</span><span>key</span><span>])</span> <span>+</span> <span>'</span><span>, </span><span>'</span><span># Step 7: Return the output string (remove last comma, lazy way) </span> <span>return</span> <span>output</span><span>[:</span><span>-</span><span>2</span><span>]</span> <span># Slicing to remove last comma </span><span>def</span> <span>CountOccurrences</span><span>(</span><span>input</span><span>):</span> <span>result</span> <span>=</span> <span>{}</span> <span>for</span> <span>i</span> <span>in</span> <span>range</span><span>(</span><span>len</span><span>(</span><span>input</span><span>)):</span> <span>item</span> <span>=</span> <span>input</span><span>[</span><span>i</span><span>]</span> <span>if</span> <span>item</span> <span>in</span> <span>result</span><span>.</span><span>keys</span><span>():</span> <span>result</span><span>[</span><span>item</span><span>]</span> <span>=</span> <span>result</span><span>.</span><span>get</span><span>(</span><span>item</span><span>,</span> <span>0</span><span>)</span> <span>+</span> <span>1</span> <span>else</span><span>:</span> <span>result</span><span>[</span><span>item</span><span>]</span> <span>=</span> <span>1</span> <span>keys</span> <span>=</span> <span>[]</span> <span>for</span> <span>key</span> <span>in</span> <span>result</span><span>.</span><span>keys</span><span>():</span> <span>keys</span><span>.</span><span>append</span><span>(</span><span>key</span><span>)</span> <span>keys</span><span>.</span><span>sort</span><span>()</span> <span>output</span> <span>=</span> <span>''</span> <span>for</span> <span>key</span> <span>in</span> <span>keys</span><span>:</span> <span># Inefficient string concatenation in a loop </span> <span>output</span> <span>+=</span> <span>str</span><span>(</span><span>key</span><span>)</span> <span>+</span> <span>'</span><span>: </span><span>'</span> <span>+</span> <span>str</span><span>(</span><span>result</span><span>[</span><span>key</span><span>])</span> <span>+</span> <span>'</span><span>, </span><span>'</span> <span># Step 7: Return the output string (remove last comma, lazy way) </span> <span>return</span> <span>output</span><span>[:</span><span>-</span><span>2</span><span>]</span> <span># Slicing to remove last comma </span>def CountOccurrences(input): result = {} for i in range(len(input)): item = input[i] if item in result.keys(): result[item] = result.get(item, 0) + 1 else: result[item] = 1 keys = [] for key in result.keys(): keys.append(key) keys.sort() output = '' for key in keys: # Inefficient string concatenation in a loop output += str(key) + ': ' + str(result[key]) + ', ' # Step 7: Return the output string (remove last comma, lazy way) return output[:-2] # Slicing to remove last comma
Enter fullscreen mode Exit fullscreen mode
This function is meant to take a list of strings, count the occurrences of each individual string and return these counters in a form of a dictionary serialized to a string. Let’s begin with the function signature:
<span>def</span> <span>CountOccurrences</span><span>(</span><span>input</span><span>):</span><span>def</span> <span>CountOccurrences</span><span>(</span><span>input</span><span>):</span>def CountOccurrences(input):
Enter fullscreen mode Exit fullscreen mode
First line and already 2 bad practices:
- name of the function
- missing type annotations
Name of the functions in Python as suggested by PEP-8 should be snake case, not camel case, so here it should be:
<span>def</span> <span>count_occurrences</span><span>(</span><span>input</span><span>):</span><span>def</span> <span>count_occurrences</span><span>(</span><span>input</span><span>):</span>def count_occurrences(input):
Enter fullscreen mode Exit fullscreen mode
The type annotations are not required in Python (after all it’s a dynamically typed language), but providing them is a good practice because they directly inform the user what types are expected on the input and on the output:
<span>from</span> <span>typing</span> <span>import</span> <span>List</span><span>def</span> <span>count_occurrences</span><span>(</span><span>input</span><span>:</span> <span>List</span><span>[</span><span>str</span><span>])</span> <span>-></span> <span>str</span><span>:</span><span>from</span> <span>typing</span> <span>import</span> <span>List</span> <span>def</span> <span>count_occurrences</span><span>(</span><span>input</span><span>:</span> <span>List</span><span>[</span><span>str</span><span>])</span> <span>-></span> <span>str</span><span>:</span>from typing import List def count_occurrences(input: List[str]) -> str:
Enter fullscreen mode Exit fullscreen mode
If you want be even more expressive, you can create aliases for the actual types:
<span>from</span> <span>typing</span> <span>import</span> <span>List</span><span>StringContainer</span> <span>=</span> <span>List</span><span>[</span><span>str</span><span>]</span><span>SerializedDictionary</span> <span>=</span> <span>str</span><span>def</span> <span>count_occurrences</span><span>(</span><span>input</span><span>:</span> <span>StringContainer</span><span>)</span> <span>-></span> <span>SerializedDictionary</span><span>:</span><span>from</span> <span>typing</span> <span>import</span> <span>List</span> <span>StringContainer</span> <span>=</span> <span>List</span><span>[</span><span>str</span><span>]</span> <span>SerializedDictionary</span> <span>=</span> <span>str</span> <span>def</span> <span>count_occurrences</span><span>(</span><span>input</span><span>:</span> <span>StringContainer</span><span>)</span> <span>-></span> <span>SerializedDictionary</span><span>:</span>from typing import List StringContainer = List[str] SerializedDictionary = str def count_occurrences(input: StringContainer) -> SerializedDictionary:
Enter fullscreen mode Exit fullscreen mode
This may an overkill for the simple types like in this case, but it’s very useful for more complex types. You gain not only more information about what such complex type represents, but what’s most important, if you ever change the underlying types, but preserve their API, you must introduce a change only in one place (in the type alias) instead of everywhere, where the type has been used so far.
Note for beginners: remember, that type annotations don’t enforce the specified types, so you can’t prevent this way calling your function with an argument of invalid type. These are information for the developers (so that they know what types to use) and for IDEs (so that they know what lines to underline). If you want to enforce type correctness, your function must perform an explicit type validation using e.g.
isinstance
before using the received argument.
Let’s move to the the next part of the function:
<span>for</span> <span>i</span> <span>in</span> <span>range</span><span>(</span><span>len</span><span>(</span><span>input</span><span>)):</span><span>item</span> <span>=</span> <span>input</span><span>[</span><span>i</span><span>]</span><span>if</span> <span>item</span> <span>in</span> <span>result</span><span>.</span><span>keys</span><span>():</span><span>result</span><span>[</span><span>item</span><span>]</span> <span>=</span> <span>result</span><span>.</span><span>get</span><span>(</span><span>item</span><span>,</span> <span>0</span><span>)</span> <span>+</span> <span>1</span><span>else</span><span>:</span><span>result</span><span>[</span><span>item</span><span>]</span> <span>=</span> <span>1</span><span>for</span> <span>i</span> <span>in</span> <span>range</span><span>(</span><span>len</span><span>(</span><span>input</span><span>)):</span> <span>item</span> <span>=</span> <span>input</span><span>[</span><span>i</span><span>]</span> <span>if</span> <span>item</span> <span>in</span> <span>result</span><span>.</span><span>keys</span><span>():</span> <span>result</span><span>[</span><span>item</span><span>]</span> <span>=</span> <span>result</span><span>.</span><span>get</span><span>(</span><span>item</span><span>,</span> <span>0</span><span>)</span> <span>+</span> <span>1</span> <span>else</span><span>:</span> <span>result</span><span>[</span><span>item</span><span>]</span> <span>=</span> <span>1</span>for i in range(len(input)): item = input[i] if item in result.keys(): result[item] = result.get(item, 0) + 1 else: result[item] = 1
Enter fullscreen mode Exit fullscreen mode
First of all, there is no need here to iterate the list with an index, so to avoid overcomplicating loop, we should just iterate directly over its elements. This way we also avoid the redundant assignment item = input[i]
:
<span>for</span> <span>item</span> <span>in</span> <span>input</span><span>:</span><span>for</span> <span>item</span> <span>in</span> <span>input</span><span>:</span>for item in input:
Enter fullscreen mode Exit fullscreen mode
Secondly, there’s no need to check for the item
explicitly in result.keys()
– the following line does the same work in less code:
<span>if</span> <span>item</span> <span>in</span> <span>result</span><span>:</span><span>if</span> <span>item</span> <span>in</span> <span>result</span><span>:</span>if item in result:
Enter fullscreen mode Exit fullscreen mode
Finally, an if
statement in this case is redundant anyway because using get(item, 0)
already does the checking work for us, so the entire if
statement and incrementation may be replaced with just one line:
<span>result</span><span>[</span><span>item</span><span>]</span> <span>=</span> <span>result</span><span>.</span><span>get</span><span>(</span><span>item</span><span>,</span> <span>0</span><span>)</span> <span>+</span> <span>1</span><span>result</span><span>[</span><span>item</span><span>]</span> <span>=</span> <span>result</span><span>.</span><span>get</span><span>(</span><span>item</span><span>,</span> <span>0</span><span>)</span> <span>+</span> <span>1</span>result[item] = result.get(item, 0) + 1
Enter fullscreen mode Exit fullscreen mode
Next in line is part responsible for sorting the items, so that in the output we have the in an alphabetical order.
<span>keys</span> <span>=</span> <span>[]</span><span>for</span> <span>key</span> <span>in</span> <span>result</span><span>.</span><span>keys</span><span>():</span><span>keys</span><span>.</span><span>append</span><span>(</span><span>key</span><span>)</span><span>keys</span><span>.</span><span>sort</span><span>()</span><span>keys</span> <span>=</span> <span>[]</span> <span>for</span> <span>key</span> <span>in</span> <span>result</span><span>.</span><span>keys</span><span>():</span> <span>keys</span><span>.</span><span>append</span><span>(</span><span>key</span><span>)</span> <span>keys</span><span>.</span><span>sort</span><span>()</span>keys = [] for key in result.keys(): keys.append(key) keys.sort()
Enter fullscreen mode Exit fullscreen mode
All of these 4 lines can be replaced just by one:
<span>sorted_keys</span><span>:</span> <span>List</span><span>[</span><span>str</span><span>]</span> <span>=</span> <span>sorted</span><span>(</span><span>result</span><span>)</span><span>sorted_keys</span><span>:</span> <span>List</span><span>[</span><span>str</span><span>]</span> <span>=</span> <span>sorted</span><span>(</span><span>result</span><span>)</span>sorted_keys: List[str] = sorted(result)
Enter fullscreen mode Exit fullscreen mode
The last challenge is finding a way how to get rid of this ugly dictionary serialization:
<span>output</span> <span>=</span> <span>''</span><span>for</span> <span>key</span> <span>in</span> <span>keys</span><span>:</span><span>output</span> <span>+=</span> <span>str</span><span>(</span><span>key</span><span>)</span> <span>+</span> <span>'</span><span>: </span><span>'</span> <span>+</span> <span>str</span><span>(</span><span>result</span><span>[</span><span>key</span><span>])</span> <span>+</span> <span>'</span><span>, </span><span>'</span><span>return</span> <span>output</span><span>[:</span><span>-</span><span>2</span><span>]</span><span>output</span> <span>=</span> <span>''</span> <span>for</span> <span>key</span> <span>in</span> <span>keys</span><span>:</span> <span>output</span> <span>+=</span> <span>str</span><span>(</span><span>key</span><span>)</span> <span>+</span> <span>'</span><span>: </span><span>'</span> <span>+</span> <span>str</span><span>(</span><span>result</span><span>[</span><span>key</span><span>])</span> <span>+</span> <span>'</span><span>, </span><span>'</span> <span>return</span> <span>output</span><span>[:</span><span>-</span><span>2</span><span>]</span>output = '' for key in keys: output += str(key) + ': ' + str(result[key]) + ', ' return output[:-2]
Enter fullscreen mode Exit fullscreen mode
It’s ugly because string concatenation like above is inefficient and imposes on us some additional things to do, like removal of the trailing coma. This can be improved by using join
function which (as the name suggests) will join our sorted_keys
list, separate its elements with coma and return it in a form of a string:
<span>output</span> <span>=</span> <span>'</span><span>, </span><span>'</span><span>.</span><span>join</span><span>(</span><span>f</span><span>'</span><span>{</span><span>key</span><span>}</span><span>: </span><span>{</span><span>result</span><span>[</span><span>key</span><span>]</span><span>}</span><span>'</span> <span>for</span> <span>key</span> <span>in</span> <span>sorted_keys</span><span>)</span><span>return</span> <span>output</span><span>output</span> <span>=</span> <span>'</span><span>, </span><span>'</span><span>.</span><span>join</span><span>(</span><span>f</span><span>'</span><span>{</span><span>key</span><span>}</span><span>: </span><span>{</span><span>result</span><span>[</span><span>key</span><span>]</span><span>}</span><span>'</span> <span>for</span> <span>key</span> <span>in</span> <span>sorted_keys</span><span>)</span> <span>return</span> <span>output</span>output = ', '.join(f'{key}: {result[key]}' for key in sorted_keys) return output
Enter fullscreen mode Exit fullscreen mode
Eventually, our refactored function looks like this:
<span>from</span> <span>typing</span> <span>import</span> <span>List</span><span>,</span> <span>Dict</span><span>StringContainer</span> <span>=</span> <span>List</span><span>[</span><span>str</span><span>]</span><span>SerializedDictionary</span> <span>=</span> <span>str</span><span>def</span> <span>count_occurrences</span><span>(</span><span>input</span><span>:</span> <span>StringContainer</span><span>)</span> <span>-></span> <span>SerializedDictionary</span><span>:</span><span>result</span><span>:</span> <span>Dict</span><span>[</span><span>str</span><span>,</span> <span>int</span><span>]</span> <span>=</span> <span>{}</span><span>for</span> <span>item</span> <span>in</span> <span>input</span><span>:</span><span>result</span><span>[</span><span>item</span><span>]</span> <span>=</span> <span>result</span><span>.</span><span>get</span><span>(</span><span>item</span><span>,</span> <span>0</span><span>)</span> <span>+</span> <span>1</span><span>sorted_keys</span><span>:</span> <span>List</span><span>[</span><span>str</span><span>]</span> <span>=</span> <span>sorted</span><span>(</span><span>result</span><span>)</span><span>output</span> <span>=</span> <span>'</span><span>, </span><span>'</span><span>.</span><span>join</span><span>(</span><span>f</span><span>'</span><span>{</span><span>key</span><span>}</span><span>: </span><span>{</span><span>result</span><span>[</span><span>key</span><span>]</span><span>}</span><span>'</span> <span>for</span> <span>key</span> <span>in</span> <span>sorted_keys</span><span>)</span><span>return</span> <span>output</span><span>from</span> <span>typing</span> <span>import</span> <span>List</span><span>,</span> <span>Dict</span> <span>StringContainer</span> <span>=</span> <span>List</span><span>[</span><span>str</span><span>]</span> <span>SerializedDictionary</span> <span>=</span> <span>str</span> <span>def</span> <span>count_occurrences</span><span>(</span><span>input</span><span>:</span> <span>StringContainer</span><span>)</span> <span>-></span> <span>SerializedDictionary</span><span>:</span> <span>result</span><span>:</span> <span>Dict</span><span>[</span><span>str</span><span>,</span> <span>int</span><span>]</span> <span>=</span> <span>{}</span> <span>for</span> <span>item</span> <span>in</span> <span>input</span><span>:</span> <span>result</span><span>[</span><span>item</span><span>]</span> <span>=</span> <span>result</span><span>.</span><span>get</span><span>(</span><span>item</span><span>,</span> <span>0</span><span>)</span> <span>+</span> <span>1</span> <span>sorted_keys</span><span>:</span> <span>List</span><span>[</span><span>str</span><span>]</span> <span>=</span> <span>sorted</span><span>(</span><span>result</span><span>)</span> <span>output</span> <span>=</span> <span>'</span><span>, </span><span>'</span><span>.</span><span>join</span><span>(</span><span>f</span><span>'</span><span>{</span><span>key</span><span>}</span><span>: </span><span>{</span><span>result</span><span>[</span><span>key</span><span>]</span><span>}</span><span>'</span> <span>for</span> <span>key</span> <span>in</span> <span>sorted_keys</span><span>)</span> <span>return</span> <span>output</span>from typing import List, Dict StringContainer = List[str] SerializedDictionary = str def count_occurrences(input: StringContainer) -> SerializedDictionary: result: Dict[str, int] = {} for item in input: result[item] = result.get(item, 0) + 1 sorted_keys: List[str] = sorted(result) output = ', '.join(f'{key}: {result[key]}' for key in sorted_keys) return output
Enter fullscreen mode Exit fullscreen mode
Shorter, simpler and easier to maintain. If we now call it like this:
<span>print</span><span>(</span><span>count_occurrences</span><span>([</span><span>'</span><span>apple</span><span>'</span><span>,</span> <span>'</span><span>banana</span><span>'</span><span>,</span> <span>'</span><span>apple</span><span>'</span><span>,</span> <span>'</span><span>orange</span><span>'</span><span>,</span> <span>'</span><span>banana</span><span>'</span><span>,</span> <span>'</span><span>apple</span><span>'</span><span>]))</span><span>print</span><span>(</span><span>count_occurrences</span><span>([</span><span>'</span><span>apple</span><span>'</span><span>,</span> <span>'</span><span>banana</span><span>'</span><span>,</span> <span>'</span><span>apple</span><span>'</span><span>,</span> <span>'</span><span>orange</span><span>'</span><span>,</span> <span>'</span><span>banana</span><span>'</span><span>,</span> <span>'</span><span>apple</span><span>'</span><span>]))</span>print(count_occurrences(['apple', 'banana', 'apple', 'orange', 'banana', 'apple']))
Enter fullscreen mode Exit fullscreen mode
The output will be as follows:
apple: 3, banana: 2, orange: 1apple: 3, banana: 2, orange: 1apple: 3, banana: 2, orange: 1
Enter fullscreen mode Exit fullscreen mode
Note for advanced: if you have already some experience with Python, you can most probably see that almost entire function above can be replaced using
Counter
object fromcollections
. I didn’t want to use it because it would hide many important points in the “bad” function and that’s just not the idea of this article, but to make the article complete, below you can find the function’s implementation usingCounter
:
<span>from</span> <span>typing</span> <span>import</span> <span>List</span><span>from</span> <span>collections</span> <span>import</span> <span>Counter</span><span>def</span> <span>count_occurrences</span><span>(</span><span>input_list</span><span>:</span> <span>List</span><span>[</span><span>str</span><span>])</span> <span>-></span> <span>str</span><span>:</span><span>counter</span> <span>=</span> <span>Counter</span><span>(</span><span>input_list</span><span>)</span><span>return</span> <span>'</span><span>, </span><span>'</span><span>.</span><span>join</span><span>(</span><span>f</span><span>'</span><span>{</span><span>key</span><span>}</span><span>: </span><span>{</span><span>value</span><span>}</span><span>'</span> <span>for</span> <span>key</span><span>,</span> <span>value</span> <span>in</span> <span>sorted</span><span>(</span><span>counter</span><span>.</span><span>items</span><span>()))</span><span>from</span> <span>typing</span> <span>import</span> <span>List</span> <span>from</span> <span>collections</span> <span>import</span> <span>Counter</span> <span>def</span> <span>count_occurrences</span><span>(</span><span>input_list</span><span>:</span> <span>List</span><span>[</span><span>str</span><span>])</span> <span>-></span> <span>str</span><span>:</span> <span>counter</span> <span>=</span> <span>Counter</span><span>(</span><span>input_list</span><span>)</span> <span>return</span> <span>'</span><span>, </span><span>'</span><span>.</span><span>join</span><span>(</span><span>f</span><span>'</span><span>{</span><span>key</span><span>}</span><span>: </span><span>{</span><span>value</span><span>}</span><span>'</span> <span>for</span> <span>key</span><span>,</span> <span>value</span> <span>in</span> <span>sorted</span><span>(</span><span>counter</span><span>.</span><span>items</span><span>()))</span>from typing import List from collections import Counter def count_occurrences(input_list: List[str]) -> str: counter = Counter(input_list) return ', '.join(f'{key}: {value}' for key, value in sorted(counter.items()))
Enter fullscreen mode Exit fullscreen mode
暂无评论内容