DS and Algo in Python
List of Algorithms To Review by Bing Chat
Binary Search Tree (BST):
Understand the properties of BSTs, including the left-child-right-parent-right-child ordering.
Know how to perform insertions, deletions, and searches efficiently.
Be aware that there are variations of BSTs (e.g., AVL trees, Red-Black trees).
Tree Traversals:
In-order traversal: Visit left subtree, root, right subtree.
Pre-order traversal: Visit root, left subtree, right subtree.
Post-order traversal: Visit left subtree, right subtree, root.
Graph Algorithms:
Breadth-First Search (BFS): Explore nodes level by level.
Depth-First Search (DFS): Explore as deeply as possible before backtracking.
Dijkstra’s algorithm: Find the shortest path in weighted graphs.
Bellman-Ford algorithm: Handle negative-weight edges in graphs.
Sorting Algorithms:
QuickSort, MergeSort, and HeapSort.
Understand their time complexities and when to use each.
Dynamic Programming:
Practice solving problems using dynamic programming techniques.
Understand memoization and tabulation.
FFT
Simple python implementation of the Cooley–Tukey algorithm for arrays whose size is a power of 2.
[3]:
import numpy as np
import scipy.fft
def fft(x):
'''
assuming x is a list whose length is a power of 2
'''
n = len(x)
if n==1:
return np.array(x)
e = fft(x[::2])
o = fft(x[1::2])
w = np.exp(-2*np.pi*np.arange(n/2)*1.j/n)
return np.concatenate([e+w*o, e-w*o])
x = np.arange(4)
print(scipy.fft.fft(x))
print(fft(x))
[ 6.-0.j -2.+2.j -2.-0.j -2.-2.j]
[ 6.+0.j -2.+2.j -2.+0.j -2.-2.j]
Master Theorem
If \begin{align*} T(n) &= a T\left(\frac{n}{b}\right) + f(n), \\ c_{\text{crit}} &= \log_b a, \end{align*} then \begin{align*} T(n) &= \begin{cases} \Theta(n^{c_{\text{crit}}}) &\mbox{ if }f(n) = O(n^c) \mbox{ for } c < c_{\text{crit}}\\ \Theta(n^{c_{\text{crit}}}\log^{k+1} n) &\mbox{ if }f(n) = \Theta(n^{c_{\text{crit}}}\log^k n) \mbox{ for } k > -1\\ \Theta(f(n)) &\mbox{ if }f(n) \mbox{ satisfies some regularity condition} \end{cases}. \end{align*}
Case (a): \(f\) is small and the \(T(n) = a T\left(\frac{n}{b}\right)\) part of the equation dominates
Case (b): \(f\) and \(T(n) = a T\left(\frac{n}{b}\right)\) are equally important
Case (c): \(f\) dominates
An example of case (b) is merge sort.
Case (b) has extensions to all values of \(k\). The complete version is
\begin{align*} T(n) &= \begin{cases} \Theta(n^{c_{\text{crit}}}) &\mbox{ if }f(n) = O(n^c) \mbox{ for } c < c_{\text{crit}}\\ \Theta(n^{c_{\text{crit}}}) &\mbox{ if }f(n) = \Theta(n^{c_{\text{crit}}}\log^k n) \mbox{ for } k < -1\\ \Theta(n^{c_{\text{crit}}}\log\log n) &\mbox{ if }f(n) = \Theta(n^{c_{\text{crit}}}\log^k n) \mbox{ for } k = -1\\ \Theta(n^{c_{\text{crit}}}\log^{k+1} n) &\mbox{ if }f(n) = \Theta(n^{c_{\text{crit}}}\log^k n) \mbox{ for } k > -1\\ \Theta(f(n)) &\mbox{ if }f(n) \mbox{ satisfies some regularity condition} \end{cases}. \end{align*}
Why \(c_{\text{crit}} = \log_b a\)? Solve \(T(n) = a T\left(\frac{n}{b}\right)\) with \(T(n) = n^c\) for \(c\):
\begin{align*} &n^c = a\left(\frac{n}{b}\right)^c\\ \iff &c\log n = \log a + c(\log n - \log b)\\ \iff & c = \log_b a. \end{align*}
Heap
[1]:
def push(heap, num):
'''
Push one number to a max heap and return the final location (index) of that number.
Since list is mutable the heap will be updated after each function call.
'''
idx = len(heap) # index of num
heap.append(num)
isEven = lambda x: x%2==0
parentIdx = max(idx//2 -1 if isEven(idx) else (idx+1)//2 -1, 0)
# heapify up
while heap[parentIdx] < heap[idx]:
heap[parentIdx], heap[idx] = heap[idx], heap[parentIdx]
idx = parentIdx
parentIdx = max(idx//2 -1 if isEven(idx) else (idx+1)//2 -1, 0)
return idx
def remove(heap, idx):
'''
Remove an element from a max heap by index
Can not input -1 as index or otherwise children index computation will go wrong.
'''
heap[idx], heap[-1] = heap[-1], heap[idx]
del heap[-1]
childIdxL = 2*(idx+1)-1
childIdxR = 2*(idx+1)
# heapify down
while (childIdxL < len(heap) and childIdxR < len(heap)) and (heap[idx] < heap[childIdxL] or heap[idx] < heap[childIdxR]):
# still has two children
if heap[childIdxL] < heap[childIdxR]:
heap[idx], heap[childIdxR] = heap[childIdxR], heap[idx]
idx = childIdxR
else:
heap[idx], heap[childIdxL] = heap[childIdxL], heap[idx]
idx = childIdxL
childIdxL = 2*(idx+1)-1
childIdxR = 2*(idx+1)
if (childIdxL < len(heap)) and (heap[idx] < heap[childIdxL]): # only has left child
heap[idx], heap[childIdxL] = heap[childIdxL], heap[idx]
heap = []
for n in [8, 6, 4, 3, 3, 2, 3, 5, 8, 3, 8, 2, 9]:
push(heap, n)
print(heap)
remove(heap, len(heap)-1)
print(heap)
remove(heap, 2)
print(heap)
[9, 8, 8, 6, 8, 4, 3, 3, 5, 3, 3, 2, 2]
[9, 8, 8, 6, 8, 4, 3, 3, 5, 3, 3, 2]
[9, 8, 4, 6, 8, 2, 3, 3, 5, 3, 3]
Heapify only takes \(O(n)\) time if the heap is built bottom-up
實際上 top-down heapify 只是 worst case 是 \(O(n\log n)\),on average 還是 \(O(n)\),因為大部份新加進來的 node 不會真的一路沉到最下面
index |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
10 |
11 |
12 |
13 |
14 |
15 |
16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
# operations (top-down) |
0 |
1 |
1 |
2 |
2 |
2 |
2 |
3 |
3 |
3 |
3 |
3 |
3 |
3 |
3 |
4 |
# operations (bottom-up) |
4 |
3 |
2 |
2 |
1 |
1 |
1 |
1 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
[4]:
import heapq
# lst = [0, 5, 2, 6, 3, 1, 5, 7, 8]
heapq.heapify?
Docstring: Transform list into a heap, in-place, in O(len(heap)) time.
Type: builtin_function_or_method
Huffman Coding
可以把建 Huffman Tree 移到 ctor 裡然後寫 static factory methods 回收物件再利用
[1]:
'''
直接用 HeapNode 當 Huffman Tree 的 Node。HeapNode 裡的 left 和 right 和 heap 無關
'''
from pandas import DataFrame
import heapq
class HeapNode:
def __init__(self, freq, char=None, left=None, right=None):
self.char = char
self.freq = freq
# Children of the Huffman Tree, not the heap
self.left = left
self.right = right
def __lt__(self, other):
return self.freq < other.freq
def __repr__(self):
return f'({self.char}, {self.freq})'
class HuffmanCoding:
def __init__(self, message):
self.message = message
self.huffmanTree = None
def _buildHuffmanTree(self):
freqTable = DataFrame(list(self.message), columns=['char']).groupby('char').size().to_frame(name='freq').reset_index()
nodes = [HeapNode(char=char, freq=freq) for char, freq in freqTable.values]
h = []
for node in nodes:
heapq.heappush(h, node)
while len(h) > 1:
left = heapq.heappop(h)
right = heapq.heappop(h)
heapq.heappush(h, HeapNode(freq=left.freq+right.freq, left=left, right=right))
return h[0]
def codeCharPairs(self, root):
'''
return a list of tuples (code, char)
'''
if root.char:
return [('', root.char)]
else:
L = [('0'+code, char) for code, char in self.codeCharPairs(root.left)]
R = [('1'+code, char) for code, char in self.codeCharPairs(root.right)]
return L+R
def compress(self, message):
self.huffmanTree = self.huffmanTree or self._buildHuffmanTree()
codeTable = {char: code for code, char in self.codeCharPairs(self.huffmanTree)}
return ''.join([codeTable[char] for char in message])
def decompress(self, compressed):
self.huffmanTree = self.huffmanTree or self._buildHuffmanTree()
charTable = {code: char for code, char in self.codeCharPairs(self.huffmanTree)}
message = ''
code = ''
for digit in compressed:
code += digit
if code in charTable:
message += charTable[code]
code = ''
return message
# original message
message = r'In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. The process of finding or using such a code proceeds by means of Huffman coding, an algorithm developed by David A. Huffman while he was a Sc.D. student at MIT, and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes".'
message
[1]:
'In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. The process of finding or using such a code proceeds by means of Huffman coding, an algorithm developed by David A. Huffman while he was a Sc.D. student at MIT, and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes".'
[2]:
# compression and decompression get back the original message
hc = HuffmanCoding(message)
compressed = hc.compress(message)
message == hc.decompress(compressed)
[2]:
True
[3]:
# compressed message
compressed
[3]:
'110110000110111110101010100011000010010000010111100011101001101000111011011011010101111101110110001011100110110110011010110001000101110000001110100110111000001010101110101100010011001011101110111111010110110010110011100110001011101101111101010100010101111100110100111011111110000011111000000000111101010010000110111110001110000100110100001011111101011001111101010000000000111000101110001111110000110001011110010011110110101111110101010001010111110000010100111000011100110100111110101010100011000110100110000111001101111001001001011001011111001101011000111000111010010001000001110110100010011100100111000001111111101010101000110000110001011010001000011101001100001011111001111001010101111110000110001010110101011010001001111010110011111100100110110001000110110110111011110101100011110010010000110110110111011101001001011010010101110111111110101010001010111111000011000101011010101110110010010011101011111001101111000110110111011001001111010110011110101101100101100111001100010111011011111010101000100011011011011100101110111011101101110111000111101110101011000001100000101010001111001010111001110110110001110101000010110010111010111110011011111011110011110011101001100101110001000000101111010110110010110011100110001011101101111101100101010001100011101111101010101111111011001011101001110111111110111111110100001011101111000010111101000000100100010101101100000111011100001110101100110110001001111001011101110111011000101111000010010010111100011001101000101010110010111001101101110000010101011111100111001110110100100111111100111110111100000111100001011110001111101101100010001110101100101100000101010100010111110011010110001110000010101011111000100110100110010000001100010010110100000001110100110111101011001111010110000110110001110001100101000110011100011011111010110010100100110001001110110110101001101110001001101000101011010011011011000101'
[4]:
# compression ratio
1 - len(compressed)/(len(message)*8)
[4]:
0.43704156479217604
[5]:
# Huffman Tree 的性質:不會有任何一個 code 是另一個 code 的 prefix,所以 compressed 的結果中不需要放 separator 也可以 decompress
hc.codeCharPairs(hc.huffmanTree)
[5]:
[('0000', 't'),
('0001000', 'A'),
('0001001', 'C'),
('000101', '.'),
('00011', 'l'),
('0010', 'd'),
('0011', 'i'),
('0100', 's'),
('01010', 'h'),
('0101100', 'M'),
('0101101', 'H'),
('0101110', ','),
('0101111', 'b'),
('0110', 'n'),
('0111', 'a'),
('10000', 'p'),
('10001', 'm'),
('10010', 'u'),
('100110', 'y'),
('100111000', '-'),
('100111001', '1'),
('10011101', 'v'),
('10011110', 'T'),
('100111110', '2'),
('100111111', '5'),
('1010', 'o'),
('1011', 'e'),
('11000', 'r'),
('11001', 'f'),
('11010', 'c'),
('11011000', 'I'),
('11011001', 'w'),
('110110100', '9'),
('110110101', 'x'),
('11011011', '"'),
('1101110', 'g'),
('11011110', 'D'),
('110111110', 'R'),
('110111111', 'S'),
('111', ' ')]