Introduction
In this post I am gonna show how you can implement the Trie data structure using different design principles and patterns. This idea came to me after implementing the Trie data structure and trying different algorithms.
The final result shows as a very good use case of the application of different design patterns and principles, and the benefits of using them.
The code
Full code is available on github: Trie data structure
The data structure
The Trie data structure is also called Prefix tree, as the main goal of this data structure is to provide an easy way to search for strings based on the characters of the strings, going one by one, basically, searching by the prefix.
This data structure has many application such as autocomplete and spell checker. I find the a very good explanation of the possible usages on the solution of leetcode: Trie leetcode solution.
I am not going to focus on the implementation of the algorithm itself. You can find the implementation basically anywhere. In case you want to do it yourself I recommend Tushar Roy on youtube: Tushar Roy – Trie. I based my solution on his explanation (not on his solution). So you can even see his implementation if you want to compare.
Classes and interfaces
The code is organized in 3 different packages:
- trie
- node
- algorithm
The trie package
The ITrie interface
The trie data structure is represented using an interface, the ITrie interface.
This interface defines the basic behavior of the trie data structure:
- void setTrieAlgorithm(ITrieAlgorithm trieAlgorithm);
- ITrieAlgorithm getTrieAlgorithm();
- ITrieNode getRoot();
- void insertWord(String word);
- boolean deleteWord(String word);
- boolean containsWord(String word);
- boolean containsPrefix(String prefix);
These abstract methods should be implemented by any class that wants to act as a Trie.
The first 3 methods provide means for composition and decoupling the Trie from the trie algorithms.
Notice the “setTrieAlgorithm” method, which allows to change at runtime the algorithm of the Trie.
The ITrie implementations
There are 2 different implementations for the ITrie interface:
- TrieArray : Uses an Array-based node
- TrieMap : Uses a Map-based node
The name of the implementations derive from the fact that they are using a specific type of TrieNode. For instance, the TrieArray implementaion uses a TrieNodeArray as they type of node for the trie data structure.
These nodes use a specific data structure, such as Array and Map, to store the different characters in each node.
The ITrie interface defines a method to change the trie algorithm at runtime. This is accomplished using composition of classes. The trie implementation is composed with a trie algorithm that is the one in charge of actually doing the different operations.
With this, the implementations of the methods that provide the behavior for the interface are quite simple:
@Override
public void insertWord(String word) {
trieAlgorithm.insertWord(this, word);
}
@Override
public boolean deleteWord(String word) {
return trieAlgorithm.deleteWord(this, word);
}
@Override
public boolean containsWord(String word) {
return trieAlgorithm.containsWord(this, word);
}
@Override
public boolean containsPrefix(String prefix) {
return trieAlgorithm.containsPrefix(this, prefix);
}
Enter fullscreen mode Exit fullscreen mode
The implementations use delegation, to pass the responsability of the action to the composed trie algorithm. This makes sense, as the actual operation depends on the algorithm.
The trie algorithm is injected in the constructor:
public TrieArray(ITrieAlgorithm trieAlgorithm) {
setTrieAlgorithm(trieAlgorithm);
this.root = new TrieNodeArray();
}
@Override
public void setTrieAlgorithm(ITrieAlgorithm trieAlgorithm) {
this.trieAlgorithm = trieAlgorithm;
}
Enter fullscreen mode Exit fullscreen mode
This is the constructor of the TrieArray, and the one in the TrieMap is similar.
The constructor receives an algorithm with the type of the interface ITrieAlgorithm, and assigns to the field “trieAlgorithm” using the corresponding method.
Using an interface as the type of the algorithm and assignin it in the constructor allows to use dependency injection, which is a mechanism to implement the Dependency Inversion Principle. Notice that we are also applying the design principle : “program to interfaces, not to implementations”.
This provides means for decoupling the Trie data structure from the inner details of the algorithm. The implementations of the ITrie interface, do not really know or care about the implementations of the ITrieAlgorithm interface. All the Trie cares about is the behavior of those implementations.
You should try to encapsulate and group things according to the axis of change. You group together things that change for the same reasons and at the same rate. In this case, the Trie data structure is a very stable component, and its interface along with the implementations will not likely change that often. On the other side, we have the different algorithm implementations, which can change quite often. For instance, you decide to change some inner details of the algorithm, such as the recursive calls, or the use of some inner data structure within the algorithm. These changes should not affect the Trie data structure at all.
The strategy pattern
This definition is taken from the Head First Design Patterns book (excellent book!):
“The Strategy Pattern defines a family of algorithms,
encapsulates each one, and makes them interchangeable.
Strategy lets the algorithm vary independently from
clients that use it.”
In this case, we define a family of algorithms with the ITrieAlgorithm interface.
The different implementations: Iterative, Recursive, Recursive2, can be used interchangeably. This is great! We encapsulated the parts that changed into an specific interface. Now the client or whoever uses the Trie data structure, doesn’t depend on any specific implementation. If you come up with a new better way to implement the algorithm you just have to implement the interface and that’s it. You can replace, with the setter method, and your new algorithm will be used by the Trie. No need to change absolutely anything on the Trie class!
The ITrieAlgorithm interface
This interface defines the common behavior that any trie algorithm should have:
public interface ITrieAlgorithm {
void insertWord(ITrie trie, String word);
boolean deleteWord(ITrie trie, String word);
boolean containsWord(ITrie trie, String word);
boolean containsPrefix(ITrie trie, String prefix);
}
Enter fullscreen mode Exit fullscreen mode
The important thing to notice in the method’s signature is the ITrie argument. The algorithms don’t need to know or care about the actual details of the implementation of the Trie data structure, all they care is that they implement the ITrie interface, and more specifically that they provide implementation for the getRoot method, as we will see in the next section.
The ITrieAlgorithm implementations
The are 3 different implementations of the ITrie interface:
- TrieIterativeAlgorithm
- TrieRecursiveAlgorithm
- TrieRecursiveAlgorithm2
The names are self-descriptive. We have 1 algorithm that uses an iterative approach using loops. And the other ones use a recursive approach.
Let’s take a look into the implementaion for the “insertWord” method:
/** * Insert a word in the trie * * @param trie The Trie where the word will be inserted * @param word The word to insert */
@Override
public void insertWord(ITrie trie, String word) {
insertWord(trie.getRoot(), word);
}
/** * Helper method that inserts the word in the Trie. * This algorithm sets the "isEndOfWord" flag to the trieNode that the last character points. * * @param trieNode The trieNode to insert the character * @param word The word to insert */
private void insertWord(ITrieNode trieNode, String word) {
char currentChar;
for (int i = 0; i < word.length(); i++) {
currentChar = word.charAt(i);
if (!trieNode.containsCharacter(currentChar)) {
trieNode.addCharacter(currentChar);
}
trieNode = trieNode.getTrieNodeForChar(currentChar);
}
trieNode.setEndOfWord(true);
}
Enter fullscreen mode Exit fullscreen mode
The insertWord method receives the trie as an argument with the ITrie type. The algorithm doesn’t care about the inner implementaion, it just gets the root node through the getRoot method, and proceeds with the operation.
Now take a look into the helper method. It receives the node with the interface type of ITrieNode. Again, the algorithm doesn’t really care how the nodes are actually implemented. Whether it uses an array or a map, the algorithm is the same. This a very important detail, because the algorithm is completely decoupled from the implementations of the Trie and Nodes. It can operate on any implementation. These means that we can reuse the same algorithm for different instances of Trie implementation.
Summary of Design patterns and principles applied
-
Encapsulate what varies: The actual implementation of the algorithm.
-
Favor composition over inheritance: The Trie data structure uses a “composed” algorithm, which is passed in the constructor.
-
Program to interfaces, not to implementations: The ITrie, ITrieNode and ITrieAlgorithm are interfaces that are used in the different implementations. No reference or dependency on any specific implementations. This means that the different components are decoupled and be modified independently.
-
Open/Closed principle: Whenever there is a change in your algorithm, the Trie data structure doesn’t really care, as it doesn’t depend on the implementation, you can modify or create a whole new algorithm, and you won’t have to change a thing in the Trie data structure class! The same goes for the other way around. You can change your data structure in the Trie, and the algorithm will still work on your new Trie. Obviously, all of this is possible through the interfaces defined for each class. As long as implementations adhere to those contracts, everything should be fine.
-
Depend upon abstractions. Do not depend upon concrete classes: We applied the dependency inversion principle as we inverted the source code dependency from the Trie to the concrete algorithm implementations. Now both the Trie and the Algorithms depend on abstractions. The Trie doesn’t know about any concrete implementation of the Algorithm, it just uses the interface. The same for the Algorithms. Any implementation of the algorithm does not depend on any concrete implementation of the Trie/TrieNode, it only depends on abstractions, the ITrie and the ITrieNode.
Final words
This was as simple example of how you can apply different design principles and patters to common tasks, such as creating your own algorithm implementations based on defined data structures. Applying the different design principles provides lots of benefits as the ones presented in this article.
暂无评论内容