{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# DS and Algo in Python"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## List of Algorithms To Review by Bing Chat\n",
    "\n",
    "* **Binary Search Tree (BST)**: \n",
    "   * Understand the properties of BSTs, including the left-child-right-parent-right-child ordering.\n",
    "   * Know how to perform insertions, deletions, and searches efficiently.\n",
    "   * Be aware that there are variations of BSTs (e.g., AVL trees, Red-Black trees).  \n",
    "* **Tree Traversals**:\n",
    "   * In-order traversal: Visit left subtree, root, right subtree.\n",
    "   * Pre-order traversal: Visit root, left subtree, right subtree.\n",
    "   * Post-order traversal: Visit left subtree, right subtree, root.\n",
    "* **Graph Algorithms**:\n",
    "   * Breadth-First Search (BFS): Explore nodes level by level.\n",
    "   * Depth-First Search (DFS): Explore as deeply as possible before backtracking.\n",
    "   * Dijkstra's algorithm: Find the shortest path in weighted graphs.\n",
    "   * Bellman-Ford algorithm: Handle negative-weight edges in graphs.\n",
    "* **Sorting Algorithms**:\n",
    "   * QuickSort, MergeSort, and HeapSort.\n",
    "   * Understand their time complexities and when to use each.\n",
    "* **Dynamic Programming**:\n",
    "   * Practice solving problems using dynamic programming techniques.\n",
    "   * Understand memoization and tabulation."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## FFT\n",
    "\n",
    "Simple python implementation of the [Cooley–Tukey algorithm](https://en.wikipedia.org/wiki/Cooley%E2%80%93Tukey_FFT_algorithm#The_radix-2_DIT_case) for arrays whose size is a power of 2. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ 6.-0.j -2.+2.j -2.-0.j -2.-2.j]\n",
      "[ 6.+0.j -2.+2.j -2.+0.j -2.-2.j]\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "import scipy.fft\n",
    "\n",
    "def fft(x):\n",
    "    '''\n",
    "    assuming x is a list whose length is a power of 2\n",
    "    '''\n",
    "    n = len(x)\n",
    "    if n==1:\n",
    "        return np.array(x)\n",
    "    \n",
    "    e = fft(x[::2])\n",
    "    o = fft(x[1::2])\n",
    "    w = np.exp(-2*np.pi*np.arange(n/2)*1.j/n)\n",
    "    \n",
    "    return np.concatenate([e+w*o, e-w*o])\n",
    "            \n",
    "x = np.arange(4)\n",
    "print(scipy.fft.fft(x))\n",
    "print(fft(x))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## [Master Theorem](https://en.wikipedia.org/wiki/Master_theorem_(analysis_of_algorithms))\n",
    "\n",
    "If\n",
    "\\begin{align*}\n",
    "T(n) &= a T\\left(\\frac{n}{b}\\right) + f(n), \\\\\n",
    "c_{\\text{crit}} &= \\log_b a, \n",
    "\\end{align*}\n",
    "then\n",
    "\\begin{align*}\n",
    "T(n) &= \\begin{cases}\n",
    "\\Theta(n^{c_{\\text{crit}}}) &\\mbox{ if }f(n) = O(n^c) \\mbox{ for } c < c_{\\text{crit}}\\\\\n",
    "\\Theta(n^{c_{\\text{crit}}}\\log^{k+1} n) &\\mbox{ if }f(n) = \\Theta(n^{c_{\\text{crit}}}\\log^k n) \\mbox{ for } k > -1\\\\\n",
    "\\Theta(f(n)) &\\mbox{ if }f(n) \\mbox{ satisfies some regularity condition}\n",
    "\\end{cases}.\n",
    "\\end{align*}\n",
    "\n",
    "* Case (a): $f$ is small and the $T(n) = a T\\left(\\frac{n}{b}\\right)$ part of the equation dominates\n",
    "* Case (b): $f$ and $T(n) = a T\\left(\\frac{n}{b}\\right)$ are equally important\n",
    "* Case (c): $f$ dominates\n",
    "\n",
    "An example of case (b) is merge sort. \n",
    "\n",
    "Case (b) has extensions to all values of $k$. The complete version is\n",
    "\n",
    "\\begin{align*}\n",
    "T(n) &= \\begin{cases}\n",
    "\\Theta(n^{c_{\\text{crit}}}) &\\mbox{ if }f(n) = O(n^c) \\mbox{ for } c < c_{\\text{crit}}\\\\\n",
    "\\Theta(n^{c_{\\text{crit}}}) &\\mbox{ if }f(n) = \\Theta(n^{c_{\\text{crit}}}\\log^k n) \\mbox{ for } k < -1\\\\\n",
    "\\Theta(n^{c_{\\text{crit}}}\\log\\log n) &\\mbox{ if }f(n) = \\Theta(n^{c_{\\text{crit}}}\\log^k n) \\mbox{ for } k = -1\\\\\n",
    "\\Theta(n^{c_{\\text{crit}}}\\log^{k+1} n) &\\mbox{ if }f(n) = \\Theta(n^{c_{\\text{crit}}}\\log^k n) \\mbox{ for } k > -1\\\\\n",
    "\\Theta(f(n)) &\\mbox{ if }f(n) \\mbox{ satisfies some regularity condition}\n",
    "\\end{cases}.\n",
    "\\end{align*}\n",
    "\n",
    "Why $c_{\\text{crit}} = \\log_b a$? Solve $T(n) = a T\\left(\\frac{n}{b}\\right)$ with $T(n) = n^c$ for $c$:\n",
    "\n",
    "\\begin{align*}\n",
    "&n^c = a\\left(\\frac{n}{b}\\right)^c\\\\\n",
    "\\iff &c\\log n = \\log a + c(\\log n - \\log b)\\\\\n",
    "\\iff & c = \\log_b a.\n",
    "\\end{align*}\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Heap"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[9, 8, 8, 6, 8, 4, 3, 3, 5, 3, 3, 2, 2]\n",
      "[9, 8, 8, 6, 8, 4, 3, 3, 5, 3, 3, 2]\n",
      "[9, 8, 4, 6, 8, 2, 3, 3, 5, 3, 3]\n"
     ]
    }
   ],
   "source": [
    "def push(heap, num):\n",
    "    '''\n",
    "    Push one number to a max heap and return the final location (index) of that number. \n",
    "    Since list is mutable the heap will be updated after each function call. \n",
    "    '''\n",
    "    idx = len(heap)  # index of num\n",
    "    heap.append(num)\n",
    "    isEven = lambda x: x%2==0\n",
    "    parentIdx = max(idx//2 -1 if isEven(idx) else (idx+1)//2 -1, 0)\n",
    "    \n",
    "    # heapify up\n",
    "    while heap[parentIdx] < heap[idx]:\n",
    "        heap[parentIdx], heap[idx] = heap[idx], heap[parentIdx]\n",
    "        idx = parentIdx\n",
    "        parentIdx = max(idx//2 -1 if isEven(idx) else (idx+1)//2 -1, 0)\n",
    "        \n",
    "    return idx\n",
    "\n",
    "def remove(heap, idx):\n",
    "    '''\n",
    "    Remove an element from a max heap by index\n",
    "    Can not input -1 as index or otherwise children index computation will go wrong. \n",
    "    '''\n",
    "    heap[idx], heap[-1] = heap[-1], heap[idx]\n",
    "    del heap[-1]\n",
    "    childIdxL = 2*(idx+1)-1\n",
    "    childIdxR = 2*(idx+1)\n",
    "    \n",
    "    # heapify down\n",
    "    while (childIdxL < len(heap) and childIdxR < len(heap)) and (heap[idx] < heap[childIdxL] or heap[idx] < heap[childIdxR]): \n",
    "        # still has two children\n",
    "        if heap[childIdxL] < heap[childIdxR]:\n",
    "            heap[idx], heap[childIdxR] = heap[childIdxR], heap[idx]\n",
    "            idx = childIdxR\n",
    "        else:\n",
    "            heap[idx], heap[childIdxL] = heap[childIdxL], heap[idx]\n",
    "            idx = childIdxL\n",
    "        \n",
    "        childIdxL = 2*(idx+1)-1\n",
    "        childIdxR = 2*(idx+1)\n",
    "\n",
    "    if (childIdxL < len(heap)) and (heap[idx] < heap[childIdxL]):   # only has left child\n",
    "        heap[idx], heap[childIdxL] = heap[childIdxL], heap[idx]\n",
    "    \n",
    "heap = []\n",
    "for n in [8, 6, 4, 3, 3, 2, 3, 5, 8, 3, 8, 2, 9]:\n",
    "    push(heap, n)\n",
    "\n",
    "print(heap)\n",
    "\n",
    "remove(heap, len(heap)-1)\n",
    "print(heap)\n",
    "\n",
    "remove(heap, 2)\n",
    "print(heap)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Heapify only takes $O(n)$ time [if the heap is built bottom-up](https://www.youtube.com/watch?v=MiyLo8adrWw)\n",
    "* 實際上 top-down heapify 只是 worst case 是 $O(n\\log n)$，on average 還是 $O(n)$，因為大部份新加進來的 node 不會真的一路沉到最下面\n",
    "\n",
    "|index |1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |\n",
    "|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|\n",
    "|# operations (top-down) | 0 | 1 | 1 | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 4 |\n",
    "|# operations (bottom-up) | 4 | 3 | 2 | 2 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\u001b[0;31mDocstring:\u001b[0m Transform list into a heap, in-place, in O(len(heap)) time.\n",
       "\u001b[0;31mType:\u001b[0m      builtin_function_or_method\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "import heapq\n",
    "\n",
    "# lst = [0, 5, 2, 6, 3, 1, 5, 7, 8]\n",
    "\n",
    "heapq.heapify?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Huffman Coding\n",
    "\n",
    "* 可以把建 Huffman Tree 移到 ctor 裡然後寫 static factory methods [回收物件再利用](https://stackoverflow.com/questions/929021/what-are-static-factory-methods/929273#929273)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. The process of finding or using such a code proceeds by means of Huffman coding, an algorithm developed by David A. Huffman while he was a Sc.D. student at MIT, and published in the 1952 paper \"A Method for the Construction of Minimum-Redundancy Codes\".'"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "'''\n",
    "直接用 HeapNode 當 Huffman Tree 的 Node。HeapNode 裡的 left 和 right 和 heap 無關\n",
    "'''\n",
    "\n",
    "from pandas import DataFrame\n",
    "import heapq\n",
    "\n",
    "class HeapNode:\n",
    "    def __init__(self, freq, char=None, left=None, right=None):\n",
    "        self.char = char\n",
    "        self.freq = freq\n",
    "        \n",
    "        # Children of the Huffman Tree, not the heap\n",
    "        self.left = left\n",
    "        self.right = right\n",
    "        \n",
    "    def __lt__(self, other):\n",
    "        return self.freq < other.freq\n",
    "    \n",
    "    def __repr__(self):\n",
    "        return f'({self.char}, {self.freq})'\n",
    "\n",
    "    \n",
    "class HuffmanCoding:\n",
    "    \n",
    "    def __init__(self, message):\n",
    "        self.message = message\n",
    "        self.huffmanTree = None\n",
    "        \n",
    "    def _buildHuffmanTree(self):\n",
    "        freqTable = DataFrame(list(self.message), columns=['char']).groupby('char').size().to_frame(name='freq').reset_index()\n",
    "        nodes = [HeapNode(char=char, freq=freq) for char, freq in freqTable.values]\n",
    "        h = []\n",
    "        for node in nodes:\n",
    "            heapq.heappush(h, node)\n",
    "\n",
    "        while len(h) > 1:\n",
    "            left = heapq.heappop(h)\n",
    "            right = heapq.heappop(h)\n",
    "            heapq.heappush(h, HeapNode(freq=left.freq+right.freq, left=left, right=right))\n",
    "            \n",
    "        return h[0]\n",
    "\n",
    "    def codeCharPairs(self, root):\n",
    "        '''\n",
    "        return a list of tuples (code, char)\n",
    "        '''\n",
    "        if root.char:\n",
    "            return [('', root.char)]\n",
    "        else:\n",
    "            L = [('0'+code, char) for code, char in self.codeCharPairs(root.left)]\n",
    "            R = [('1'+code, char) for code, char in self.codeCharPairs(root.right)]\n",
    "            return L+R\n",
    "\n",
    "    def compress(self, message):\n",
    "        self.huffmanTree = self.huffmanTree or self._buildHuffmanTree()\n",
    "        codeTable = {char: code for code, char in self.codeCharPairs(self.huffmanTree)}\n",
    "        return ''.join([codeTable[char] for char in message])\n",
    "\n",
    "    def decompress(self, compressed):\n",
    "        self.huffmanTree = self.huffmanTree or self._buildHuffmanTree()\n",
    "        charTable = {code: char for code, char in self.codeCharPairs(self.huffmanTree)}\n",
    "        message = ''\n",
    "        code = ''\n",
    "        for digit in compressed:\n",
    "            code += digit\n",
    "            if code in charTable:\n",
    "                message += charTable[code]\n",
    "                code = ''\n",
    "\n",
    "        return message\n",
    "    \n",
    "# original message    \n",
    "\n",
    "message = r'In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. The process of finding or using such a code proceeds by means of Huffman coding, an algorithm developed by David A. Huffman while he was a Sc.D. student at MIT, and published in the 1952 paper \"A Method for the Construction of Minimum-Redundancy Codes\".'\n",
    "message"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# compression and decompression get back the original message\n",
    "\n",
    "hc = HuffmanCoding(message)\n",
    "compressed = hc.compress(message)\n",
    "message == hc.decompress(compressed)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'110110000110111110101010100011000010010000010111100011101001101000111011011011010101111101110110001011100110110110011010110001000101110000001110100110111000001010101110101100010011001011101110111111010110110010110011100110001011101101111101010100010101111100110100111011111110000011111000000000111101010010000110111110001110000100110100001011111101011001111101010000000000111000101110001111110000110001011110010011110110101111110101010001010111110000010100111000011100110100111110101010100011000110100110000111001101111001001001011001011111001101011000111000111010010001000001110110100010011100100111000001111111101010101000110000110001011010001000011101001100001011111001111001010101111110000110001010110101011010001001111010110011111100100110110001000110110110111011110101100011110010010000110110110111011101001001011010010101110111111110101010001010111111000011000101011010101110110010010011101011111001101111000110110111011001001111010110011110101101100101100111001100010111011011111010101000100011011011011100101110111011101101110111000111101110101011000001100000101010001111001010111001110110110001110101000010110010111010111110011011111011110011110011101001100101110001000000101111010110110010110011100110001011101101111101100101010001100011101111101010101111111011001011101001110111111110111111110100001011101111000010111101000000100100010101101100000111011100001110101100110110001001111001011101110111011000101111000010010010111100011001101000101010110010111001101101110000010101011111100111001110110100100111111100111110111100000111100001011110001111101101100010001110101100101100000101010100010111110011010110001110000010101011111000100110100110010000001100010010110100000001110100110111101011001111010110000110110001110001100101000110011100011011111010110010100100110001001110110110101001101110001001101000101011010011011011000101'"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# compressed message\n",
    "\n",
    "compressed"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.43704156479217604"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# compression ratio\n",
    "\n",
    "1 - len(compressed)/(len(message)*8)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[('0000', 't'),\n",
       " ('0001000', 'A'),\n",
       " ('0001001', 'C'),\n",
       " ('000101', '.'),\n",
       " ('00011', 'l'),\n",
       " ('0010', 'd'),\n",
       " ('0011', 'i'),\n",
       " ('0100', 's'),\n",
       " ('01010', 'h'),\n",
       " ('0101100', 'M'),\n",
       " ('0101101', 'H'),\n",
       " ('0101110', ','),\n",
       " ('0101111', 'b'),\n",
       " ('0110', 'n'),\n",
       " ('0111', 'a'),\n",
       " ('10000', 'p'),\n",
       " ('10001', 'm'),\n",
       " ('10010', 'u'),\n",
       " ('100110', 'y'),\n",
       " ('100111000', '-'),\n",
       " ('100111001', '1'),\n",
       " ('10011101', 'v'),\n",
       " ('10011110', 'T'),\n",
       " ('100111110', '2'),\n",
       " ('100111111', '5'),\n",
       " ('1010', 'o'),\n",
       " ('1011', 'e'),\n",
       " ('11000', 'r'),\n",
       " ('11001', 'f'),\n",
       " ('11010', 'c'),\n",
       " ('11011000', 'I'),\n",
       " ('11011001', 'w'),\n",
       " ('110110100', '9'),\n",
       " ('110110101', 'x'),\n",
       " ('11011011', '\"'),\n",
       " ('1101110', 'g'),\n",
       " ('11011110', 'D'),\n",
       " ('110111110', 'R'),\n",
       " ('110111111', 'S'),\n",
       " ('111', ' ')]"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Huffman Tree 的性質：不會有任何一個 code 是另一個 code 的 prefix，所以 compressed 的結果中不需要放 separator 也可以 decompress\n",
    "\n",
    "hc.codeCharPairs(hc.huffmanTree)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}