Python Advanced
[1]:
import pandas as pd
def table(table_name):
    return pd.read_csv(f'./tables/{table_name}.csv').fillna('')
Slicing Time Complexity
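NumPy slicing returns a view, while slicing a native Python list or str returns a copy
When you run arr[j:i], NumPy does not allocate new data; it creates a view that points at the original array. The view only changes shape and strides and does not copy the underlying data: O(1)
Advanced indexing such as arr[[1, 3, 5]] creates a new array: O(k)
| Type | Slice result | Time | Note |
|---|---|---|---|
| list, str | copy | O(k) | copies the data |
| NumPy array | view | O(1) | no data copied, only metadata changes |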
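A minimal sketch of the view-versus-copy difference, assuming NumPy is installed. The variable names are illustrative only; `np.shares_memory` reports whether two arrays overlap in memory.

```python
import numpy as np

arr = np.arange(10)

view = arr[2:6]           # basic slice: a view, no data copied
fancy = arr[[1, 3, 5]]    # advanced indexing: a new array is allocated

shares_view = np.shares_memory(arr, view)
shares_fancy = np.shares_memory(arr, fancy)

view[0] = 99              # writes through to arr[2]

lst = list(range(10))
lst_slice = lst[2:6]      # list slicing copies the selected elements
lst_slice[0] = 99         # the original list is unaffected

print(shares_view, shares_fancy, arr[2], lst[2])
```

Writing through the view changes the original array, while the list slice is independent of the list it came from.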
Multithreading and Multiprocessing
See this SO post. Threads run in the same memory space, while processes have separate memory. Each process has its own independent memory, and even its own I/O
On Windows, multiprocessing is slow because every process starts a fresh Python interpreter session; Unix-like systems that fork do not need this. See this SO post
[2]:
table('multi_threading_processing')
[2]:
| | Multithreading | Multiprocessing |
|---|---|---|
| 0 | light weight | heavy, more memory overhead |
| 1 | share memory | isolated |
| 2 | easy to communicate | hard |
| 3 | safety concern (race condition, deadlocks) | safe |
| 4 | good for I/O bound tasks | good for CPU bound tasks |
Global Interpreter Lock (GIL) and Multithreading
The GIL is a mutex: only one thread can execute Python bytecode at a time, so Python has no truly parallel multithreading
Python multithreading suits I/O-bound tasks, because they spend most of their time waiting on external resources
Race Condition in Multithreading
Two threads run ++counter at the same time, and after both finish the value of counter is still 1 (example from Wikipedia)
[3]:
table('race_condition')
[3]:
| | Thread 1 | Thread 2 | | Integer value |
|---|---|---|---|---|
| 0 | | | | 0 |
| 1 | read value | | ← | 0 |
| 2 | | read value | ← | 0 |
| 3 | increase value | | | 0 |
| 4 | | increase value | | 0 |
| 5 | write back | | → | 1 |
| 6 | | write back | → | 1 |
How to avoid:
Locks: Use mutexes
Atomic Operations: Use atomic indivisible operations
Thread-Safe Data Structures: Use data structures designed to handle concurrent access
Immutable Data Structures: Once created, they cannot be changed
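The first remedy, a lock, can be sketched as follows. This is an illustrative example, not taken from the Wikipedia article; `increment` and the thread count are made-up names and numbers. Holding the lock makes the read, increase and write-back steps one indivisible unit.

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    """Add 1 to the shared counter n times, holding the lock for each update."""
    global counter
    for _ in range(n):
        with lock:          # only one thread at a time may read-modify-write
            counter += 1

threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
```

With the lock in place, no update can be lost, so the final count equals the total number of increments.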
Regex
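re.match only matches at the beginning of the string; re.search scans the whole string for the first match
\number (for example \1) is a backreference that matches the same text as the group with that number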
[6]:
import re
s = "the world is a museum museum of passion projects"
print(re.findall(r"(.+) \1", s))
print( re.search(r"(.+) \1", s))
['museum']
<re.Match object; span=(15, 28), match='museum museum'>
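A small sketch of the difference between `re.match` and `re.search`, using a made-up sample string:

```python
import re

s = "the world is a museum"

m_start = re.match(r"the", s)       # matches: "the" is at the beginning
m_match = re.match(r"museum", s)    # None: "museum" is not at the beginning
m_search = re.search(r"museum", s)  # found: search scans the whole string

print(m_start, m_match, m_search)
```

`re.match` returns `None` whenever the pattern does not start at index 0, even though the same pattern is found by `re.search`.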
In ASCII (the rules differ slightly for Unicode text, such as Chinese characters):
[3]:
table('regex_special_char')
[3]:
| | Symbol | Meaning | Equivalent to | Example |
|---|---|---|---|---|
| 0 | \d | digit | [0-9] | re.findall(r"\d", "A1B2") → ['1', '2'] |
| 1 | \s | whitespace | [ \t\n\r\f\v] | re.findall(r"\s", "a b\tc\n") → [' ', '\t', '\n'] |
| 2 | \w | word character (letter, digit or underscore) | [a-zA-Z0-9_] | re.findall(r"\w", "_Hi123") → ['_', 'H', 'i', ... |
| 3 | | | | |
| 4 | \D | non-digit | [^0-9] | re.findall(r"\D", "A1!") → ['A', '!'] |
| 5 | \S | non-whitespace | [^ \t\n\r\f\v] | re.findall(r"\S", "a b") → ['a', 'b'] |
| 6 | \W | non-word character | [^a-zA-Z0-9_] | re.findall(r"\W", "!@#^") → ['!', '@', '#', '^'] |
\t\n\r\f\v are tab, newline, carriage return, form feed and vertical tab, respectively
Set Instance Attributes on the Fly
__getattribute__ is called for all attribute access, regardless of whether the attribute exists
__getattr__ is called when an attribute is not found in __getattribute__
Example from here:
[8]:
class Yeah(object):
    def __init__(self, name):
        self.name = name

    # Gets called when an attribute is accessed
    def __getattribute__(self, item):
        print('__getattribute__ '+ item)
        # Calling the super class to avoid recursion
        return super(Yeah, self).__getattribute__(item)

    # Gets called when the item is not found via __getattribute__
    def __getattr__(self, item):
        print('__getattr__ '+ item)
        # __setattr__ returns None, so the first access to a missing attribute
        # yields None; later accesses find the stored 'orphan' value
        return super(Yeah, self).__setattr__(item, 'orphan')
[3]:
y1 = Yeah('yes')
y1.name
__getattribute__ name
[3]:
'yes'
[4]:
y1.foo
__getattribute__ foo
__getattr__ foo
[5]:
y1.foo
__getattribute__ foo
[5]:
'orphan'
[6]:
y1.goo
__getattribute__ goo
__getattr__ goo
[7]:
y1.__dict__
__getattribute__ __dict__
[7]:
{'name': 'yes', 'foo': 'orphan', 'goo': 'orphan'}
Singleton
[1]:
class Singleton:
    _instance = None

    def __new__(cls):
        if not cls._instance:
            cls._instance = super(Singleton, cls).__new__(cls)
        return cls._instance

o1 = Singleton()
o2 = Singleton()
o1 is o2
[1]:
True
super(Singleton, cls) returns a temporary object of the superclass, which in this case is object as every class in Python inherits from object by default
__new__(cls) is a special method in Python classes that is responsible for instance creation. It takes the class (not the instance) as the first argument, followed by any additional arguments if needed
AsyncIO
The async keyword makes a function (subroutine) a coroutine
  Subroutines block the process, coroutines don’t
An async coroutine can have awaitable statements (starting with the await keyword) which specify where in the coroutine it is safe to pause and yield control to other coroutines
  await can only be put in front of a statement that is awaitable
  time.sleep(3) is not awaitable. Its awaitable version is asyncio.sleep(3)
brew_coffee() is not a regular function call. It returns a coroutine object which can be gathered with other coroutines
  Can either create a batch with asyncio.gather or a single task by asyncio.create_task
To run the coroutines: await the created task or batch, or asyncio.run it (doesn’t work in Jupyter)
The main function has an await statement now so it must become an async coroutine
[1]:
import asyncio
import time

async def brew_coffee():
    print('Start brew_coffee()')
    await asyncio.sleep(3)
    print('End brew_coffee()')
    return 'Coffee ready'

async def toast_bagel():
    print('Start toast_bagel()')
    await asyncio.sleep(2)
    print('End toast_bagel()')
    return 'Bagel ready'

async def main1():
    start_time = time.time()
    #########################################################
    batch = asyncio.gather(brew_coffee(), toast_bagel())
    result_coffee, result_bagel = await batch
    #########################################################
    end_time = time.time()
    elapsed_time = end_time - start_time
    print(f'Result of brew_coffee: {result_coffee}')
    print(f'Result of toast_bagel: {result_bagel}')
    print(f'Total execution time: {elapsed_time:.2f} seconds')

async def main2():
    start_time = time.time()
    #########################################################
    coffee_task = asyncio.create_task(brew_coffee())
    bagel_task = asyncio.create_task(toast_bagel())
    result_coffee = await coffee_task
    result_bagel = await bagel_task
    #########################################################
    end_time = time.time()
    elapsed_time = end_time - start_time
    print(f'Result of brew_coffee: {result_coffee}')
    print(f'Result of toast_bagel: {result_bagel}')
    print(f'Total execution time: {elapsed_time:.2f} seconds')

# asyncio.run(main1()) # RuntimeError: asyncio.run() cannot be called from a running event loop
main_task = asyncio.create_task(main2())
res = await main_task
Start brew_coffee()
Start toast_bagel()
End toast_bagel()
End brew_coffee()
Result of brew_coffee: Coffee ready
Result of toast_bagel: Bagel ready
Total execution time: 3.00 seconds
[2]:
# simple version
import asyncio

async def brew_coffee():
    await asyncio.sleep(3)

async def main():
    coffee_task = asyncio.create_task(brew_coffee())
    result_coffee = await coffee_task

# same way to call main(): await an asyncio created task
# or asyncio.run(main()) which doesn't work in Jupyter
AsyncIO and Multiprocessing
Asyncio enables concurrency, but not parallelism by default
You can achieve parallelism by integrating thread pools and process pools
Asyncio shines for I/O-bound workloads, like network calls and file operations
For CPU-bound tasks, multiprocessing may provide better utilization
[3]:
# Example by ChatGPT, working when run by python but not in Jupyter
import asyncio
from concurrent.futures import ProcessPoolExecutor

def cpu_bound_task(n):
    import time
    time.sleep(2)
    return f'Task {n} result'

async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as executor:
        tasks = [loop.run_in_executor(executor, cpu_bound_task, i) for i in range(5)]
        res = await asyncio.gather(*tasks)
    print(res)

# if __name__ == '__main__':
#     asyncio.run(main())
AsyncIO and Decorator
A decorator that can wrap both functions and coroutines – using
inspect.iscoroutinefunction
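One way such a decorator could look is sketched below. The decorator name `logged` and the `calls` list are hypothetical, chosen only for illustration. The decorator checks `inspect.iscoroutinefunction` once at decoration time and returns either an `async def` wrapper or a plain wrapper.

```python
import asyncio
import functools
import inspect

calls = []  # records which wrapped callables were invoked (illustration only)

def logged(func):
    """Hypothetical decorator that works on both functions and coroutines."""
    if inspect.iscoroutinefunction(func):
        @functools.wraps(func)
        async def async_wrapper(*args, **kwargs):
            calls.append(func.__name__)
            return await func(*args, **kwargs)   # must await the coroutine
        return async_wrapper

    @functools.wraps(func)
    def sync_wrapper(*args, **kwargs):
        calls.append(func.__name__)
        return func(*args, **kwargs)
    return sync_wrapper

@logged
def add(a, b):
    return a + b

@logged
async def async_add(a, b):
    await asyncio.sleep(0)
    return a + b

sync_result = add(1, 2)
# asyncio.run works in a plain script; in Jupyter, await async_add(3, 4) instead
async_result = asyncio.run(async_add(3, 4))
print(sync_result, async_result, calls)
```

Because the async wrapper is itself declared with `async def`, the decorated coroutine function still passes `inspect.iscoroutinefunction`.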
AsyncIO for Fixings Registration
[2]:
# __init__.py
import asyncio

data = None
data_ready = asyncio.Event()

async def get_data():
    global data
    # simulate a time-consuming data fetch
    await asyncio.sleep(3)
    data = {"key": "value"}
    data_ready.set()

def init():
    # create_task needs a running event loop (as in Jupyter)
    asyncio.create_task(get_data())

# initialize
init()
[5]:
# client code, bar is the pricing function
import asyncio
from concurrent.futures import ThreadPoolExecutor

data = None
data_ready = asyncio.Event()
executor = ThreadPoolExecutor(max_workers=1)

def prepare_data():
    # simulate a time-consuming computation
    import time
    time.sleep(5)
    return {"key": "value"}

async def get_data():
    global data
    loop = asyncio.get_running_loop()
    data = await loop.run_in_executor(executor, prepare_data)
    data_ready.set()

async def bar():
    # await the event rather than calling loop.run_until_complete(), which
    # raises RuntimeError when the loop is already running (as it is inside main)
    if not data_ready.is_set():
        await data_ready.wait()
    print(f"Data is ready: {data}")

async def init():
    asyncio.create_task(get_data())

async def main():
    await init()
    print("Doing other tasks while waiting for data...")
    await asyncio.sleep(1)
    print("Still doing other tasks...")
    await bar()

# run the example
# asyncio.run(main())
Computing Grid Summary
Python features required:
Packaging
Consistent venv in all computers
Dashboard
Entry point (CLI apps)
Config (ini) file
3 packages + workers env:
qmagrid_server
  multiprocessing.managers.BaseManager.register shared data structures in the network
  Shared multiprocessing.Manager().Queue() and multiprocessing.Manager().dict():
    waiting_q (Queue)
    working_q (dict)
    result_q (dict)
    status_q (dict)
    machine_q (dict)
  multiprocessing.Manager() data structures have locks, so they are safe
  monitor by Plotly Dash displaying status queue contents
qmagrid_client
  Depends on the Job class in the worker package
  Implements context manager QMAGridExecutor to send cloudpickled jobs to waiting_q and wait to collect results from result_q
  with QMAGridExecutor() as executor: executor.map(f, args_list)
qmagrid_worker
  cmd commands to run start_one_worker, start_pct_workers, start_n_workers and stop_all_workers
  start_n_workers (start_pct_workers) simply subprocess.Popens start_one_worker n times and starts sending status reports
  start_one_worker checks waiting_q constantly and if there is a job, does the following
    pop from waiting_q and push to working_q
    run the job
    pop from working_q and push the result to result_q
  It does so as long as the corresponding status report remains in the status_q
  Parses a config file to determine (WIP)
    Server IP (which grid?)
    Percentage of all logical cores to contribute
The workers env
  Turn on workers in this env to ensure consistent package versions
  A watcher process watching a commands.txt on a shared drive. Once the file is modified, execute the commands in it
The QMA Python Package Summary
Conveniently call requests on trades:
  Swaption().NPV()
  Default instruments swaption is 1y10y
  Can call Delta, Gamma, Vega, etc.
Live trade support
  Trade(12345678).NPV()
Flexible requests
  BermudanSwaption().CalibrationInfo()
Trade spec attributes
  Trade(12345678).notional() or currency, etc.
  From trade JSON, not from the core library, but users don’t need to know
Singleton MarketEnv context manager class
Config file:
  Quants default MarketEnv to previous day EOD, while traders default to today LIVE
  Default books and products for trade population, extendible to other businesses
  Default env: prod, dev or pat
Job scheduler:
  Which computers run which functions at what times with what arguments, specified in a scheduled_jobs.csv
  Examples:
    Copytree
    Check if a file exists at certain time and send email notifications
Debug sheet generation
  Trade(12345678).excel_render()
  auto_spreadsheet()
AsyncIO for fixings registration
List Comprehension With Multiple For Loops
The following are equivalent:
[1]:
[(x, y) for x in [1,2,3] for y in [3,1,4] if x != y]
[1]:
[(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]
[2]:
combs = []
for x in [1,2,3]:
    for y in [3,1,4]:
        if x != y:
            combs.append((x, y))
combs
[2]:
[(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]
Virtual Environments
python -m venv ".myenv" to create
source .myenv/bin/activate to activate on Linux
.myenv/Scripts/activate.bat to activate on Windows
deactivate to deactivate
redbull.py
r"""
This script keeps the computer awake by pressing right ctrl key every SEC seconds
Put this script in $USERPROFILE\Downloads for easy access
To setup using PowerShell:
> cd $env:USERPROFILE\Downloads
> python -m venv .venv
> .venv\Scripts\activate
> pip install --trusted-host files.pythonhosted.org --trusted-host pypi.org pyautogui
> python redbull.py
"""
import pyautogui
from time import sleep
SEC = 180
pyautogui.FAILSAFE = False
while True:
    sleep(SEC)
    pyautogui.press('ctrlright')
Where Is My Python?
[1]:
import sys, os
os.path.dirname(sys.executable)
[1]:
'/srv/conda/envs/notebook/bin'
pip
pip uninstall only works in a shell, not in a notebook, because it prompts with continue? (y/n) (passing -y skips the prompt)
Configure pip to install from other server
pip config -v list shows where pip looks for config files (SO)
For variant 'global', will try loading 'C:\ProgramData\pip\pip.ini'
For variant 'user', will try loading 'C:\Users\foobar\pip\pip.ini'
For variant 'user', will try loading 'C:\Users\foobar\AppData\Roaming\pip\pip.ini'
For variant 'site', will try loading 'C:\Python38\pip.ini'
Create a pip.ini in one of these folders and paste in
[global]
timeout = 60
index = https://repo.abc.com/repository/pypi-all/pypi
index-url = https://repo.abc.com/repository/pypi-all/simple
trusted-host = repo.abc.com
Alternatively, run the following four lines in a shell; the log shows where the config file was saved
pip config set global.timeout 60
pip config set global.index https://repo.abc.com/repository/pypi-all/pypi
pip config set global.index-url https://repo.abc.com/repository/pypi-all/simple
pip config set global.trusted-host repo.abc.com
Code Packaging
install locally
cd to the top project directory where setup.py is
python setup.py install or pip install . or pip install -e . for developer install
Upload to PyPI:
cd to the top project directory where setup.py is
git checkout 0.0.1
python setup.py sdist
twine check dist/*
twine upload --repository-url https://test.pypi.org/legacy/ dist/*
twine upload dist/*
Each time a new function is implemented, remember to add it to __init__.py first if the tests need to import it
Remove a package from PyPI
Login > your projects > pyminimax > Manage > Settings > Delete project
Building a conda package and uploading it to Anaconda Cloud (medium)
C and C++ Extensions
CPython is the reference implementation of the Python programming language
Python Bindings: Calling C or C++ From Python (Real Python)
ctypes
CFFI
pybind11: adapted from Boost.Python; faster, but only supports C++11 or newer
Cython
Other Solutions
SWIG
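This says the most commonly used are SWIG and pybind11
Building C and C++ Extensions with distutils (Python Doc)
python setup.py build compiles the C code listed in ext_modules, but that C code has to work with PyObject to be callable from Python directly, without going through ctypes
If the target machine has a C/C++ compiler (Linux machines usually do), it may be possible to source distribute C extensions directly
Windows does not necessarily have a compiler, so at least a Windows wheel is needed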
-
GitHub Actions building wheels in all common platforms
by Python Packaging Authority, also see Python Packaging User Guide
-
Setuptools extension to build and package CMake projects
CMake comes with SWIG and pybind11 support. This package connects setuptools and CMake, so a CMake project can be configured directly in setup.py
SWIG
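First run sudo apt install swig
Official doc and the PyCon 2008 slides by David “Mr. Swig” Beazley
Can connect C/C++ to multiple languages from the same interface files (*.i)
If the Extension source files include an interface file, distutils/setuptools runs SWIG automatically. See the Python doc and page 22 of the PyCon 2008 slides
  Presumably SWIG must already be installed on that machine
  So it is probably not possible to source distribute *.i directly
  If the machine has a C/C++ compiler, the SWIG-generated wrapper can be source distributed instead
  The Python doc shows how to package SWIG: in setup, put py_modules=['foo'], and ext_modules=[Extension('_foo', ['foo.i'], swig_opts=['-modern', '-I../include'])],
A Python extension module must be a .so or .pyd file, and the input/output types of the underlying C functions must be PyObject. SWIG only takes ordinary C functions and writes that wrapper
This covers how to interface with NumPy arrays
For more on C++ classes, see here
Example
swig -c++ -python libswig.i generates libswig_wrap.cxx and libswig.py
  libswig.py is the module front end
  libswig_wrap.cxx is the wrapper code, containing C functions whose input/output types are all PyObject
  These two files are portable and platform independent. Any machine with a C compiler can build this Python module, without needing SWIG
If the code is plain C with no C++, the -c++ flag can be dropped: swig -python libswig.i, and the generated wrapper is libswig_wrap.c rather than cxx
Compile C++ with g++ and C with gcc
Compile libswig.cpp and libswig_wrap.cxx together. -I sets the include path, which holds the Python header files such as Python.h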
// libswig.cpp
#include "libswig.hpp"
std::vector<int> my_range(int n){
    std::vector<int> vec = {};
    for (int i=0 ; i<n ; i++)
    {
        vec.push_back(i);
    }
    return vec;
}
double square(double x){
    return x*x;
}
double cube(double x){
    return x*x*x;
}
// libswig.hpp
#include<vector>
std::vector<int> my_range(int n);
double square(double x);
double cube(double x);
// libswig.i
%module libswig
%{
#include "libswig.hpp"
%}
#define __version__ "0.0.1";
std::vector<int> my_range(int n); // or simply %include "libswig.hpp"
double square(double x);
double cube(double x);
[1]:
!swig -c++ -python libswig.i
[2]:
!g++ -fPIC -c libswig.cpp libswig_wrap.cxx -I/srv/conda/envs/notebook/include/python3.7m
[3]:
!g++ -shared libswig.o libswig_wrap.o -o _libswig.so
[4]:
import libswig
libswig.__version__, libswig.square(5), libswig.cube(5), libswig.my_range(5)
[4]:
('0.0.1',
25.0,
125.0,
<Swig Object of type 'std::vector< int > *' at 0x7ff1047bcea0>)
Cython
Example from here and this tutorial
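pip install Cython
A cell starting with %%cython is compiled by Cython; %%cython -a shows which lines fall back to Python
Once the Python code is written, type all the variables. Type casting looks like <double> i
Functions can be declared as def, cdef or cpdef
In testing, cpdef functions could not be decorated (why?)
In testing, non-top-level functions (such as a function inside a function) could not be cpdef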
[1]:
%load_ext cython
[8]:
# Python version
def pyfac_loop(n):
    r = 1.0
    for i in range(1, n+1):
        r *= i
    return r
[16]:
%%cython -a
cpdef double cyfac_loop(int n):
    cdef double r = 1.0
    cdef int i
    for i in range(1, n+1):
        r *= <double>i
    return r
[16]:
Generated by Cython 0.29.24
Yellow lines hint at Python interaction.
Click on a line that starts with a "+" to see the C code that Cython generated for it.
+1: cpdef double cyfac_loop(int n):
static PyObject *__pyx_pw_46_cython_magic_f1b0bfaa9dd99dd25796948e61b32169_1cyfac_loop(PyObject *__pyx_self, PyObject *__pyx_arg_n); /*proto*/
static double __pyx_f_46_cython_magic_f1b0bfaa9dd99dd25796948e61b32169_cyfac_loop(int __pyx_v_n, CYTHON_UNUSED int __pyx_skip_dispatch) {
double __pyx_v_r;
int __pyx_v_i;
double __pyx_r;
__Pyx_RefNannyDeclarations
__Pyx_RefNannySetupContext("cyfac_loop", 0);
/* … */
/* function exit code */
__pyx_L0:;
__Pyx_RefNannyFinishContext();
return __pyx_r;
}
/* Python wrapper */
static PyObject *__pyx_pw_46_cython_magic_f1b0bfaa9dd99dd25796948e61b32169_1cyfac_loop(PyObject *__pyx_self, PyObject *__pyx_arg_n); /*proto*/
static PyObject *__pyx_pw_46_cython_magic_f1b0bfaa9dd99dd25796948e61b32169_1cyfac_loop(PyObject *__pyx_self, PyObject *__pyx_arg_n) {
int __pyx_v_n;
PyObject *__pyx_r = 0;
__Pyx_RefNannyDeclarations
__Pyx_RefNannySetupContext("cyfac_loop (wrapper)", 0);
assert(__pyx_arg_n); {
__pyx_v_n = __Pyx_PyInt_As_int(__pyx_arg_n); if (unlikely((__pyx_v_n == (int)-1) && PyErr_Occurred())) __PYX_ERR(0, 1, __pyx_L3_error)
}
goto __pyx_L4_argument_unpacking_done;
__pyx_L3_error:;
__Pyx_AddTraceback("_cython_magic_f1b0bfaa9dd99dd25796948e61b32169.cyfac_loop", __pyx_clineno, __pyx_lineno, __pyx_filename);
__Pyx_RefNannyFinishContext();
return NULL;
__pyx_L4_argument_unpacking_done:;
__pyx_r = __pyx_pf_46_cython_magic_f1b0bfaa9dd99dd25796948e61b32169_cyfac_loop(__pyx_self, ((int)__pyx_v_n));
int __pyx_lineno = 0;
const char *__pyx_filename = NULL;
int __pyx_clineno = 0;
/* function exit code */
__Pyx_RefNannyFinishContext();
return __pyx_r;
}
static PyObject *__pyx_pf_46_cython_magic_f1b0bfaa9dd99dd25796948e61b32169_cyfac_loop(CYTHON_UNUSED PyObject *__pyx_self, int __pyx_v_n) {
PyObject *__pyx_r = NULL;
__Pyx_RefNannyDeclarations
__Pyx_RefNannySetupContext("cyfac_loop", 0);
__Pyx_XDECREF(__pyx_r);
__pyx_t_1 = PyFloat_FromDouble(__pyx_f_46_cython_magic_f1b0bfaa9dd99dd25796948e61b32169_cyfac_loop(__pyx_v_n, 0)); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
__pyx_r = __pyx_t_1;
__pyx_t_1 = 0;
goto __pyx_L0;
/* function exit code */
__pyx_L1_error:;
__Pyx_XDECREF(__pyx_t_1);
__Pyx_AddTraceback("_cython_magic_f1b0bfaa9dd99dd25796948e61b32169.cyfac_loop", __pyx_clineno, __pyx_lineno, __pyx_filename);
__pyx_r = NULL;
__pyx_L0:;
__Pyx_XGIVEREF(__pyx_r);
__Pyx_RefNannyFinishContext();
return __pyx_r;
}
+2: cdef double r = 1.0
__pyx_v_r = 1.0;
3: cdef int i
+4: for i in range(1, n+1):
__pyx_t_1 = (__pyx_v_n + 1);
__pyx_t_2 = __pyx_t_1;
for (__pyx_t_3 = 1; __pyx_t_3 < __pyx_t_2; __pyx_t_3+=1) {
__pyx_v_i = __pyx_t_3;
+5: r *= <double>i
__pyx_v_r = (__pyx_v_r * ((double)__pyx_v_i)); }
+6: return r
__pyx_r = __pyx_v_r; goto __pyx_L0;
[6]:
%timeit pyfac_loop(20)
%timeit cyfac_loop(20)
1.37 µs ± 26.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
74.3 ns ± 1.12 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
Units of Measure Less Than a Second
| Multiple of a second | Unit | Symbol |
|---|---|---|
| \(10^{-9}\) | 1 nanosecond | 1 ns |
| \(10^{-6}\) | 1 microsecond | 1 µs |
| \(10^{-3}\) | 1 millisecond | 1 ms |
Integral Types
[24]:
%%cython
# cdef is a directive declaring that the following objects are C objects
cdef:
    int i = 0
    unsigned long j = 1
    signed short k = -3
    bint flag = True
    long long ll = 1LL
    float a = 1.0
    double b = -2.0
    long double c = 1e5
    str s = "abc"
print(i, j, k, ll, flag, a, b, c, s)
0 1 -3 1 True 1.0 -2.0 100000.0 abc
cimport
[15]:
%%cython
import datetime
cimport cpython.datetime # use this instead of the line above
import array
cimport cpython.array
import numpy as np # gives access to python functions
cimport numpy as np # gives you access to Numpy C API ---> emits a warning? no longer usable?
from libc.math cimport exp # the C function is much faster than the numpy version
from libc.stdlib cimport rand
cdef extern from "limits.h":
    int RAND_MAX
In file included from /srv/conda/envs/notebook/lib/python3.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1969,
from /srv/conda/envs/notebook/lib/python3.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
from /srv/conda/envs/notebook/lib/python3.7/site-packages/numpy/core/include/numpy/arrayobject.h:4,
from /home/jovyan/.cache/ipython/cython/_cython_magic_2014508b603b08191838a4a9c4c94518.c:648:
/srv/conda/envs/notebook/lib/python3.7/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: #warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
17 | #warning "Using deprecated NumPy API, disable it with " \
| ^~~~~~~
ctypes
Part of Python's standard library, so nothing extra to install
Ordinary C code is enough; the signatures do not need to use PyObject
Compile to a shared object (*.so), then specify the input/output types by hand on the Python side
// lib.c
double square(double x){
    return x*x;
}
double cube(double x){
    return x*x*x;
}
[5]:
!gcc -fPIC -shared -o lib.so lib.c
[1]:
import ctypes
lib = ctypes.CDLL('./lib.so')
lib.square.argtypes = [ctypes.c_double]
lib.square.restype = ctypes.c_double
lib.cube.argtypes = [ctypes.c_double]
lib.cube.restype = ctypes.c_double
lib.square(5), lib.cube(5)
[1]:
(25.0, 125.0)
Always Use is None instead of ==None
== compares values, while is compares identity (None is a singleton), so is None is slightly faster than ==None
== can be overloaded (__eq__), which can make ==None give unexpected results
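A contrived sketch of the second point, using a made-up class `AlwaysEqual` whose `__eq__` claims equality with everything:

```python
class AlwaysEqual:
    """Hypothetical class with an overloaded == that always returns True."""
    def __eq__(self, other):
        return True

obj = AlwaysEqual()

eq_result = (obj == None)   # goes through __eq__, so it is True
is_result = (obj is None)   # identity check, cannot be overloaded

print(eq_result, is_result)
```

The identity check gives the answer the reader expects regardless of how `__eq__` is defined.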
Import in a with Statement
The import remains in effect after leaving the with block
[1]:
import numpy as np
class context:
    def __enter__(self):
        pass
    def __exit__(self, exc_type=None, exc_value=None, traceback=None):
        pass

with context():
    from pandas import DataFrame
    print(DataFrame(np.arange(10).reshape(2, 5)))
print(DataFrame(np.arange(10).reshape(2, 5)))
0 1 2 3 4
0 0 1 2 3 4
1 5 6 7 8 9
0 1 2 3 4
0 0 1 2 3 4
1 5 6 7 8 9
Bytes and Str
Unicode is a superset of ASCII that maps characters to numbers (code points)
For example, the ASCII table has 128 code points, from hexadecimal 00 to 7F
Since Python 3, str is a Unicode string
Storing every code point in the same amount of space would be wasteful. The original ASCII needs only 7 bits per character
UTF-8 is a standard for storing code points
by far the most popular; 97% of the world's top 1000 websites by traffic use UTF-8
Python 3 default for str.encode() and bytes.decode()
Every stored string needs a specified encoding, otherwise it cannot be read
Python 3 bytes is a binary serialization format represented by a sequence of 8-bit integers that is fit for storing data on the file system or sending it across the Internet
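A short sketch of the round trip between `str` and `bytes`, with a made-up sample string. The lengths differ because UTF-8 uses a variable number of bytes per code point.

```python
text = "café 咖啡"

data = text.encode()              # UTF-8 is the default encoding
round_trip = data.decode()        # UTF-8 is also the default for decoding

n_chars = len(text)               # number of code points
n_bytes = len(data)               # ASCII: 1 byte, é: 2 bytes, each CJK character: 3 bytes

print(type(data).__name__, n_chars, n_bytes, round_trip == text)
```

Decoding with the same encoding that was used for encoding recovers the original string exactly.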
Making Command Line Commands Using Python
This app can git clone multiple repos with token:
gc repo_name_1 repo_name_2
Need pycrypto to run, which requires gcc: apt-get install gcc, pip install pycrypto
Step by step:
  create a new file named gc
  copy and paste below into gc
  chmod +x gc
  make sure the path of gc is in $PATH: export PATH=$HOME/binder:$PATH if gc is in $HOME/binder
Implementation details
  #!/usr/bin/env python tells the shell this script should be run by python
    no matter where python is installed, #!/usr/bin/env will lead the shell to the right location
  Both the key and the initial vector of AES.new need to be 16 bytes
  Both encrypt and decrypt output binary which needs to be decoded to string
  When calling gc repo1 repo2, sys.argv will be ['gc', 'repo1', 'repo2']
[ ]:
#!/usr/bin/env python
from Crypto.Cipher import AES
from getpass import getpass
import subprocess, sys
password = getpass()
# o1 = AES.new(password.ljust(16), AES.MODE_CFB, '*'*16)
# encrypted = o1.encrypt(LONG_AND_HARD_TO_REMEMBER_TOKEN)
encrypted = b'!>k\x98%6\x9e,j\x88\xd8\x13\xa85Z#\xdb\xa5Q\xb2\xfc^\x15\xd6\xe6mH=\xb9\xe4~\x88\xea\x8f\xe2M\xc1\xf6\xec\xcd'
aes = AES.new(password.ljust(16), AES.MODE_CFB, '*'*16) # key and initial vector both need to be 16 bytes
token = aes.decrypt(encrypted).decode()
for repo in sys.argv[1:]:
    subprocess.run(['git', 'clone', f'https://{token}@github.com/beginnerSC/{repo}'])
Bare Asterisk (*) and Bare Forward Slash (/) in Function Arguments
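def foo(a, b, *, c, d): forces callers to pass c and d as c= and d= (keyword arguments)
def foo(a, b, /, c, d): forces callers to pass a and b without a= and b= (positional arguments), in order
The / syntax is only available from Python 3.8 onwards, so this Jupyter environment does not have it yet: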
[9]:
import sys
print(sys.version)
3.7.8 | packaged by conda-forge | (default, Nov 27 2020, 19:24:58)
[GCC 9.3.0]
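On Python 3.8 or newer, both markers can be tried out as sketched below. The function names `kw_only` and `pos_only` and the helper `raises_type_error` are made up for illustration.

```python
def kw_only(a, b, *, c, d):
    # c and d must be passed by keyword
    return a + b + c + d

def pos_only(a, b, /, c, d):
    # a and b must be passed by position
    return a + b + c + d

def raises_type_error(call):
    """Return True if calling `call` raises TypeError."""
    try:
        call()
    except TypeError:
        return True
    return False

kw_ok = kw_only(1, 2, c=3, d=4)
pos_ok = pos_only(1, 2, 3, d=4)

kw_rejected = raises_type_error(lambda: kw_only(1, 2, 3, 4))
pos_rejected = raises_type_error(lambda: pos_only(a=1, b=2, c=3, d=4))

print(kw_ok, pos_ok, kw_rejected, pos_rejected)
```

Violating either rule raises `TypeError` at call time.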
Immutability and Hashing
-
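An immutable object cannot be changed after initialization; a mutable one can
Mutable objects such as list are not hashable by default. You can write your own __hash__, but then the hash has to change whenever the contents of the list change
Among the commonly used built-in types, the mutable ones are dict, list and set. The table below is from this Medium post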
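A quick check of which built-in objects can be hashed, as a sketch. The helper `is_hashable` is a made-up name.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

results = {
    "tuple": is_hashable((1, 2)),
    "frozenset": is_hashable(frozenset({1, 2})),
    "str": is_hashable("abc"),
    "list": is_hashable([1, 2]),
    "dict": is_hashable({"a": 1}),
    "set": is_hashable({1, 2}),
    # a tuple is only hashable if everything inside it is hashable
    "tuple_with_list": is_hashable((1, [2])),
}
print(results)
```

This is why a tuple can be used as a dict key while a list cannot.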
if __name__ == "__main__":
When a script is run as the entry point, __name__ is set to "__main__"
If it is imported as a module, __name__ is set to the module name (the script's file name without .py)
Context Manager (the with Statement)
The following are equivalent:
[ ]:
# try block
SET_THINGS_UP
try:
    DO_SOMETHING
finally:
    TEAR_THINGS_DOWN

# with statement
class controlled_execution:
    def __enter__(self):
        SET_THINGS_UP
        return THING
    def __exit__(self, exc_type, exc_value, traceback):
        TEAR_THINGS_DOWN

with controlled_execution() as THING:   # instantiate the context manager
    SOME_CODE
controlled_execution is a context manager class which implements __enter__() and __exit__(). The return value of __enter__(), if provided, is assigned to the variable that follows as
[10]:
import numpy as np
class fix_seed:
    def __init__(self, seed=0):
        self.seed = seed
    def __enter__(self):
        np.random.seed(self.seed)
    def __exit__(self, exc_type=None, exc_value=None, traceback=None):
        np.random.seed()

with fix_seed(seed=0):
    print(np.random.uniform())
print(np.random.uniform())
0.5488135039273248
0.9087389795050141
itertools.groupby
data must already be sorted by key before it is passed to groupby
key works exactly the same way as in sorted
Each g in the loop is an iterator
[6]:
import itertools
data = 'trust but verify by unittest'
keyfunc = None
groups = []
uniquekeys = []
data = sorted(data, key=keyfunc)
for k, g in itertools.groupby(data, keyfunc):
    groups.append(list(g)) # store group iterator as a list
    uniquekeys.append(k)
from pprint import pprint
print('keys: ', uniquekeys)
print('groups: ')
pprint(groups)
keys: [' ', 'b', 'e', 'f', 'i', 'n', 'r', 's', 't', 'u', 'v', 'y']
groups:
[[' ', ' ', ' ', ' '],
['b', 'b'],
['e', 'e'],
['f'],
['i', 'i'],
['n'],
['r', 'r'],
['s', 's'],
['t', 't', 't', 't', 't', 't'],
['u', 'u', 'u'],
['v'],
['y', 'y']]
collections
deque
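Implemented as a doubly linked list, so insertion and deletion at both ends are O(1)
There is also a circular array implementation
  Two indices point to the two ends; because the array is circular, there is no index-out-of-range problem
  If a push runs out of space, resize: request more space from the system and copy everything over (takes \(O(n)\) operations), using 0 as the start index when copying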
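A brief sketch of the two-ended operations on `collections.deque`; the values are arbitrary. The `maxlen` argument gives a bounded buffer that drops items from the opposite end when full.

```python
from collections import deque

dq = deque([1, 2, 3])
dq.appendleft(0)        # push on the left end
dq.append(4)            # push on the right end
left = dq.popleft()     # pop from the left end
right = dq.pop()        # pop from the right end

window = deque(maxlen=3)
for x in range(5):
    window.append(x)    # once full, the oldest item on the left is discarded

print(left, right, list(dq), list(window))
```

All four end operations run in constant time, unlike `list.insert(0, x)` and `list.pop(0)`, which shift every element.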
Counter
Dictionary of element frequencies of a list
[4]:
import collections
collections.Counter('aaabbccccddeffffg')
[4]:
Counter({'a': 3, 'b': 2, 'c': 4, 'd': 2, 'e': 1, 'f': 4, 'g': 1})
OrderedDict
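A dict that remembers insertion order, with two methods that go beyond dict:
  popitem(last=True)
  move_to_end(key, last=True)
last=True means pop the last item, and likewise for move_to_end; with last=False it pops the first item or moves the key to the beginning
The implementation keeps the order in a doubly linked list, plus a hash table of pointers to the corresponding nodes
Can be used to write an LRU cache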
[2]:
import collections
d = collections.OrderedDict()
d['e'] = 5
d['a'] = 1
d['b'] = 2
print(d)
d.move_to_end('a')
print(d)
OrderedDict([('e', 5), ('a', 1), ('b', 2)])
OrderedDict([('e', 5), ('b', 2), ('a', 1)])
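The LRU cache idea mentioned above can be sketched with `move_to_end` and `popitem(last=False)`. The class `LRUCache` and its method names are hypothetical, a minimal illustration rather than a production cache.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache sketch: the most recently used key sits at the end."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key, default=None):
        if key not in self.data:
            return default
        self.data.move_to_end(key)          # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict the least recently used key

cache = LRUCache(capacity=2)
cache.put('a', 1)
cache.put('b', 2)
cache.get('a')        # 'a' becomes the most recently used
cache.put('c', 3)     # capacity exceeded: 'b' is evicted

print(list(cache.data))
```

For caching function results, the standard library's `functools.lru_cache` is usually the simpler choice.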
OrderedDict and dict
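Since 3.7, Python guarantees that dict remembers insertion order. CPython's dict does this with a compact array of entries plus a hash index, not the doubly linked list that OrderedDict uses
For most purposes OrderedDict and dict now behave the same, and insert, delete, lookup and update are O(1) on average in both
But OrderedDict is still around because of
  Backward compatibility, legacy code
  Using OrderedDict reads as more explicit
  OrderedDict has popitem(last=True) and move_to_end(key, last=True), which are sometimes needed. dict.popitem() always pops the last inserted item and cannot be told to pop the first, and dict has no move_to_end
  d1==d2 ignores order for dict (only keys and values are compared), while comparing two OrderedDict objects also checks the order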
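A small sketch comparing the two types, with made-up keys and values, on Python 3.7 or newer:

```python
from collections import OrderedDict

d1 = {'a': 1, 'b': 2}
d2 = {'b': 2, 'a': 1}
dict_equal = (d1 == d2)               # order is ignored for plain dicts

od1 = OrderedDict(a=1, b=2)
od2 = OrderedDict(b=2, a=1)
od_equal = (od1 == od2)               # order matters between two OrderedDicts
mixed_equal = (od1 == d2)             # OrderedDict vs dict ignores order

last_item = dict(d1).popitem()                     # dict pops the last inserted item
first_item = OrderedDict(d1).popitem(last=False)   # OrderedDict can pop the first

print(dict_equal, od_equal, mixed_equal, last_item, first_item)
```

The copies made with `dict(d1)` and `OrderedDict(d1)` keep `d1` itself unchanged by the pops.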
defaultdict
Accessing a key that was never added does not raise a KeyError; it returns a default value instead
A callable must be passed in as the default_factory, for example collections.defaultdict(int); the default value is whatever that callable returns
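Two common uses, sketched with made-up data: counting with `int` as the factory and grouping with `list` as the factory.

```python
from collections import defaultdict

counts = defaultdict(int)          # int() returns 0
for ch in "banana":
    counts[ch] += 1                # no KeyError on the first occurrence

groups = defaultdict(list)         # list() returns []
for word in ["apple", "avocado", "banana"]:
    groups[word[0]].append(word)

missing = counts["z"]              # returns 0 and also inserts the key 'z'

print(dict(counts), dict(groups), missing)
```

Note that merely reading a missing key inserts it; use `key in d` or `d.get(key)` to look without inserting.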
Functional Programming
Pure function on immutable data
  With mutable data you have to worry about synchronization once multithreading is involved
  Instead of a list of dictionaries, use a tuple of collections.namedtuple, which is fully immutable
  Pure function: gives the same result on every call, has no access to global state, and does not modify its input (even if the input is mutable)
Higher Order Functions
  filter(function, iterable)
  map(function, iterable, ...)
  functools.partial(func, /, *args, **keywords)
  functools.reduce(function, iterable[, initializer])
helper functions
  zip(*iterables)
  any, all
  enumerate
  sort
itertools: some commonly used iterators
In practice, list comprehensions can replace filter and map
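A compact sketch of the higher order functions listed above next to their list comprehension equivalents; the sample numbers are arbitrary.

```python
from functools import partial, reduce
from operator import mul

nums = [1, 2, 3, 4, 5]

evens = list(filter(lambda x: x % 2 == 0, nums))
squares = list(map(lambda x: x * x, nums))

# the same results with list comprehensions
evens_lc = [x for x in nums if x % 2 == 0]
squares_lc = [x * x for x in nums]

product = reduce(mul, nums, 1)     # ((((1*1)*2)*3)*4)*5
double = partial(mul, 2)           # mul with its first argument fixed to 2
doubled = [double(x) for x in nums]

print(evens, squares, product, doubled)
```

`filter` and `map` return lazy iterators in Python 3, which is why they are wrapped in `list` here.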
Exception
Built-in exception classes are loaded as soon as the runtime starts, so they need no import
[ ]:
try:
    pass
except ValueError as error:   # runs if a ValueError is caught
    pass
except TypeError as error:    # runs if a TypeError is caught
    pass
except Exception as error:    # any other Exception lands here; the more general the handler, the lower it goes
    pass
else:                         # runs only if no Exception was raised
    pass
finally:                      # always runs, with or without an Exception
    pass
[7]:
try:
    f = open('circles_.py')
except FileNotFoundError as e:
    print(e)
else:
    print(f.readline())
    f.close()
finally:
    print('Done!')
[Errno 2] No such file or directory: 'circles_.py'
Done!
unittest
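Socratica video, an 8-minute minimal version
unittest test case method names must start with test, but the module name is unrestricted when the module is named explicitly
Running unittest: the module may or may not be specified (-m runs unittest as a module)
  python -m unittest test_circles.py
  python -m unittest test_circles
  python -m unittest
When no module is specified, Python uses test discovery: it collects test modules (files matching test*.py by default) and runs every test case method whose name starts with test
Adding this to test_circles.py lets you run python test_circles.py directly
if __name__ == '__main__':
    unittest.main()
misc/pycircle has a minimum python module with unittests; python -m unittest can be run under misc/
Whenever the library fails in use, after fixing it add a related test in the corresponding place so the same error does not come back
setUp and tearDown
  setUp runs before each test case method starts
  tearDown runs after each test case method ends
  setUpClass runs once before all test case methods start
  tearDownClass runs once after all test case methods end
Test case methods do not necessarily run in the order they are written, so they must be independent of each other
In a test script, the if __name__=='__main__': block calls unittest.main(). See the unittest doc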
[10]:
# circles.py
from math import pi
def circle_area(r):
    if type(r) not in [int, float]:
        raise TypeError("The radius must be a non-negative real number.")
    if r < 0:
        raise ValueError("The radius cannot be negative")
    return pi*(r**2)
[ ]:
# test_circles.py
import unittest
from circles import circle_area
from math import pi
class TestCircleArea(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        pass

    @classmethod
    def tearDownClass(cls):
        pass

    def setUp(self):
        pass

    def tearDown(self):
        pass

    def test_area(self):
        # Test areas when radius >= 0
        self.assertAlmostEqual(circle_area(1), pi)
        self.assertAlmostEqual(circle_area(0), 0)
        self.assertAlmostEqual(circle_area(2.1), pi*(2.1**2))

    def test_values(self):
        # Make sure value errors are raised when necessary
        self.assertRaises(ValueError, circle_area, -2)   # style 1
        with self.assertRaises(ValueError):              # style 2, the function is called normally
            circle_area(-2)

    def test_types(self):
        # Make sure type errors are raised when necessary
        self.assertRaises(TypeError, circle_area, 3+5j)
        self.assertRaises(TypeError, circle_area, True)
        self.assertRaises(TypeError, circle_area, "radius")

if __name__ == '__main__':
    unittest.main()
[14]:
# this works because the scripts are here
!python -m unittest test_circles
!python -m unittest
...
----------------------------------------------------------------------
Ran 3 tests in 0.000s
OK
...
----------------------------------------------------------------------
Ran 3 tests in 0.000s
OK
PEP8 Naming Styles
| Type | Style |
|---|---|
| MyClass | PascalCase |
| MY_CONST | CAPITAL_SNAKE_CASE |
| mypackage | likethis |
| everything_else | snake_case |
OOP
Sample Program
Rewrite this code from memory repeatedly until it feels natural.
[5]:
class Person:
    def __init__(self, name):
        self.name = name

    def reveal_identity(self):
        print(f"My name is {self.name}.")

class SuperHero(Person):
    def __init__(self, name, hero_name):
        super().__init__(name)
        self.hero_name = hero_name

    def reveal_identity(self):
        super().reveal_identity()
        print(f"And I'm {self.hero_name}.")

corey = Person('Corey')
corey.reveal_identity()
wade = SuperHero('Wade Wilson', 'Deadpool')
wade.reveal_identity()
My name is Corey.
My name is Wade Wilson.
And I'm Deadpool.
classmethod and staticmethod
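A classmethod receives the class (cls) as its first argument, which makes it a good fit for alternative constructors. A staticmethod receives neither the instance (self) nor the class (cls); a good use is grouping utility functions under a class.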
[5]:
class Employee:
    num_of_emps = 0
    raise_amt = 1.04

    def __init__(self, first, last, pay):
        self.first = first
        self.last = last
        self.email = first + '.' + last + '@email.com'
        self.pay = pay
        Employee.num_of_emps += 1

    def fullname(self):
        return '{} {}'.format(self.first, self.last)

    def apply_raise(self):
        self.pay = int(self.pay * self.raise_amt)

    @classmethod
    def set_raise_amt(cls, amount):
        cls.raise_amt = amount

    @classmethod
    def from_string(cls, emp_str):
        # Alternative constructor: parse 'First-Last-Pay'
        first, last, pay = emp_str.split('-')
        # split() returns strings; convert pay so apply_raise() can do arithmetic
        return cls(first, last, int(pay))

    @staticmethod
    def is_workday(day):
        # weekday(): Monday is 0, so 5 and 6 are Saturday and Sunday
        if day.weekday() == 5 or day.weekday() == 6:
            return False
        return True

emp_1 = Employee('Corey', 'Schafer', 50000)
emp_2 = Employee('Test', 'Employee', 60000)
Employee.set_raise_amt(1.05)
print(Employee.raise_amt)
print(emp_1.raise_amt)
print(emp_2.raise_amt)

emp_str_1 = 'John-Doe-70000'
emp_str_2 = 'Steve-Smith-30000'
emp_str_3 = 'Jane-Doe-90000'
# Manual parsing, replaced by the from_string classmethod:
first, last, pay = emp_str_1.split('-')
#new_emp_1 = Employee(first, last, pay)
new_emp_1 = Employee.from_string(emp_str_1)
print(new_emp_1.email)
print(new_emp_1.pay)

import datetime
my_date = datetime.date(2016, 7, 11)
print(Employee.is_workday(my_date))
1.05
1.05
1.05
John.Doe@email.com
70000
True
Inheritance
[6]:
class Employee:
    raise_amt = 1.04

    def __init__(self, first, last, pay):
        self.first = first
        self.last = last
        self.email = first + '.' + last + '@email.com'
        self.pay = pay

    def fullname(self):
        return '{} {}'.format(self.first, self.last)

    def apply_raise(self):
        self.pay = int(self.pay * self.raise_amt)

class Developer(Employee):
    raise_amt = 1.10

    def __init__(self, first, last, pay, prog_lang):
        super().__init__(first, last, pay)
        self.prog_lang = prog_lang

class Manager(Employee):
    def __init__(self, first, last, pay, employees=None):
        super().__init__(first, last, pay)
        if employees is None:
            self.employees = []
        else:
            self.employees = employees

    def add_emp(self, emp):
        if emp not in self.employees:
            self.employees.append(emp)

    def remove_emp(self, emp):
        if emp in self.employees:
            self.employees.remove(emp)

    def print_emps(self):
        for emp in self.employees:
            print('-->', emp.fullname())

dev_1 = Developer('Corey', 'Schafer', 50000, 'Python')
dev_2 = Developer('Test', 'Employee', 60000, 'Java')
mgr_1 = Manager('Sue', 'Smith', 90000, [dev_1])
print(mgr_1.email)
mgr_1.add_emp(dev_2)
mgr_1.remove_emp(dev_2)
mgr_1.print_emps()
Sue.Smith@email.com
--> Corey Schafer
Special Methods
[7]:
class Employee:
    raise_amt = 1.04

    def __init__(self, first, last, pay):
        self.first = first
        self.last = last
        self.email = first + '.' + last + '@email.com'
        self.pay = pay

    def fullname(self):
        return '{} {}'.format(self.first, self.last)

    def apply_raise(self):
        self.pay = int(self.pay * self.raise_amt)

    def __repr__(self):
        return "Employee('{}', '{}', {})".format(self.first, self.last, self.pay)

    def __str__(self):
        return '{} - {}'.format(self.fullname(), self.email)

    def __add__(self, other):
        return self.pay + other.pay

    def __len__(self):
        return len(self.fullname())

emp_1 = Employee('Corey', 'Schafer', 50000)
emp_2 = Employee('Test', 'Employee', 60000)
# print(emp_1 + emp_2)
print(len(emp_1))
13
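The cell above defines __repr__, __str__ and __add__ but only exercises __len__. The following is a minimal sketch of when each of the others is picked up, using a trimmed-down Employee; the isinstance guard in __add__ is an addition for illustration, not part of the class above.

```python
class Employee:
    def __init__(self, first, last, pay):
        self.first = first
        self.last = last
        self.pay = pay

    def __repr__(self):
        # Unambiguous form, ideally valid Python that recreates the object
        return f"Employee({self.first!r}, {self.last!r}, {self.pay!r})"

    def __str__(self):
        # Readable form used by print() and str()
        return f"{self.first} {self.last}"

    def __add__(self, other):
        # Returning NotImplemented lets Python raise a clean TypeError
        if not isinstance(other, Employee):
            return NotImplemented
        return self.pay + other.pay


emp_1 = Employee('Corey', 'Schafer', 50000)
emp_2 = Employee('Test', 'Employee', 60000)

print(emp_1)          # print() uses __str__
print([emp_1])        # containers show their items with __repr__
print(emp_1 + emp_2)  # the + operator calls __add__
```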
Property Decorators - Getters, Setters, and Deleters
[8]:
class Employee:
    def __init__(self, first, last):
        self.first = first
        self.last = last

    @property
    def email(self):
        return '{}.{}@email.com'.format(self.first, self.last)

    @property
    def fullname(self):
        return '{} {}'.format(self.first, self.last)

    @fullname.setter
    def fullname(self, name):
        first, last = name.split(' ')
        self.first = first
        self.last = last

    @fullname.deleter
    def fullname(self):
        print('Delete Name!')
        self.first = None
        self.last = None

emp_1 = Employee('John', 'Smith')
emp_1.fullname = "Corey Schafer"
print(emp_1.first)
print(emp_1.email)
print(emp_1.fullname)
del emp_1.fullname
Corey
Corey.Schafer@email.com
Corey Schafer
Delete Name!
Generator
Difference between iterators and generators
An iterator is any object whose class defines __next__ and __iter__ (__iter__ returns self).
A generator function is a function that contains yield; calling it returns a generator object.
Iterator is the broader concept: every generator is an iterator, but not vice versa.
Generators are quicker to write, while a class-based iterator can customize much more behavior.
(x**2 for x in range(100) if x % 2 == 1) is a generator expression.
[10]:
def pow2():
    n = 2
    while n < 1000:
        yield n
        n *= 2

print([i for i in pow2()])
a = pow2()
print(next(a))
print(next(a))
print(next(a))
[2, 4, 8, 16, 32, 64, 128, 256, 512]
2
4
8
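For comparison, here is a sketch of the same sequence written as a class-based iterator. The class name Pow2, its limit parameter and the reset() method are made up for illustration; reset() is an example of behavior that a plain generator cannot offer.

```python
class Pow2:
    """Iterator over powers of two below `limit` (hypothetical example)."""

    def __init__(self, limit=1000):
        self.limit = limit
        self.n = 2

    def __iter__(self):
        # An iterator returns itself from __iter__
        return self

    def __next__(self):
        if self.n >= self.limit:
            raise StopIteration
        value = self.n
        self.n *= 2
        return value

    def reset(self):
        # Extra behavior: rewind without creating a new object
        self.n = 2


it = Pow2()
first_three = [next(it), next(it), next(it)]
rest = list(it)   # consumes whatever is left
it.reset()
again = list(it)  # the full sequence once more
print(first_three, rest, again)
```

The generator version is five lines; the class is longer but keeps its state in attributes that other methods can inspect or change.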
Coroutine
[6]:
# Calling next() runs coro up to its next yield.
# send() passes a value into the paused function (it becomes the result of the
# yield expression) and returns the next value that is yielded.
def coro():
    step = 0
    while True:
        received = yield step
        step += 1
        print(f'Received: {received}')

c = coro()
next(c) # important! get to the first yield
print(c.send(100))
Received: 100
1
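A slightly more useful sketch of the same send() mechanism is a running average that keeps its state between calls. The name running_average is made up for illustration.

```python
def running_average():
    total = 0.0
    count = 0
    average = None
    while True:
        # Pause here; send(x) resumes with value = x
        value = yield average
        total += value
        count += 1
        average = total / count


avg = running_average()
next(avg)            # prime the coroutine: run to the first yield
print(avg.send(10))  # 10.0
print(avg.send(20))  # 15.0
print(avg.send(60))  # 30.0
avg.close()          # raises GeneratorExit inside the coroutine to stop it
```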
Decorator
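There is a very well written RealPython tutorial on decorators; I read all of it but had no time to take notes.
Accessing .__name__ or .__doc__ (help()) on a decorated function returns the wrapper's values, which is why @functools.wraps(func) is needed to copy func's name and docstring onto the wrapper.
@debug prints a function's inputs and outputs, which is handy when debugging recursive functions.
Classes as decorators: implement __init__ and __call__. They can keep state between calls, for example a call counter or a cache in the spirit of lru_cache.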
General Pattern (No Argument)
[7]:
import functools

def decorator(func):
    @functools.wraps(func)
    def wrapper_decorator(*args, **kwargs):
        # Do something before
        value = func(*args, **kwargs)
        # Do something after
        return value
    return wrapper_decorator
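The notes mention @debug and class-based decorators without showing them. The following is a minimal sketch of both, built on the general pattern; the names debug, CountCalls, factorial and greet are illustrative and not taken from a library.

```python
import functools


def debug(func):
    """Print the arguments and return value of every call."""
    @functools.wraps(func)
    def wrapper_debug(*args, **kwargs):
        args_repr = [repr(a) for a in args]
        kwargs_repr = [f"{k}={v!r}" for k, v in kwargs.items()]
        signature = ", ".join(args_repr + kwargs_repr)
        print(f"Calling {func.__name__}({signature})")
        value = func(*args, **kwargs)
        print(f"{func.__name__}() returned {value!r}")
        return value
    return wrapper_debug


class CountCalls:
    """Class-based decorator: the instance replaces the function and keeps state."""

    def __init__(self, func):
        functools.update_wrapper(self, func)  # same role as functools.wraps
        self.func = func
        self.num_calls = 0

    def __call__(self, *args, **kwargs):
        self.num_calls += 1
        return self.func(*args, **kwargs)


@debug
def factorial(n):
    """Return n! computed recursively."""
    return 1 if n <= 1 else n * factorial(n - 1)


@CountCalls
def greet(name):
    return f"Hello, {name}"


factorial(3)                # every recursive call is traced
greet("a")
greet("b")
print(factorial.__name__)   # 'factorial', thanks to functools.wraps
print(greet.num_calls)      # 2
```

Because the recursive call inside factorial looks up the decorated name, each level of the recursion is printed, which is what makes @debug useful for recursive code.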
Decorator fix_seed
[367]:
# fix_seed: version with the seed fixed at 0.
# On exit, np.random.seed() with no argument reseeds from fresh entropy,
# so draws made outside the function are random again.
import numpy as np
import functools

def fix_seed(fnc):
    @functools.wraps(fnc)
    def wrapper_fix_seed(*args, **kargs):
        np.random.seed(0)
        res = fnc(*args, **kargs)
        np.random.seed()
        return res
    return wrapper_fix_seed

@fix_seed
def printRand():
    print(np.random.uniform())

printRand()
print(np.random.uniform())
0.5488135039273248
0.6161167995056092
[377]:
# Version that accepts an argument. The decorator must now always be called:
# @fix_seed(100) or @fix_seed(). A bare @fix_seed would pass the function in as seed.
import numpy as np
import functools

def fix_seed(seed=0):
    def decorator_fix_seed(fnc):
        @functools.wraps(fnc)
        def wrapper_fix_seed(*args, **kargs):
            np.random.seed(seed)
            res = fnc(*args, **kargs)
            np.random.seed()
            return res
        return wrapper_fix_seed
    return decorator_fix_seed

@fix_seed(100)
def printRand():
    print(np.random.uniform())

printRand()
print(np.random.uniform())
0.5434049417909654
0.3289099673526439
[6]:
# The seed is optional here. Without it the seed defaults to 0; when given,
# it must be passed by keyword (seed=...).
# With a seed, @fix_seed(seed=0) means printRand = fix_seed(seed=0)(printRand), so _func is None.
# Without a seed, @fix_seed means printRand = fix_seed(printRand): the function is passed in as _func.
import numpy as np
import functools

def fix_seed(_func=None, *, seed=0):
    def decorator_fix_seed(func):
        @functools.wraps(func)
        def wrapper_fix_seed(*args, **kwargs):
            np.random.seed(seed)
            res = func(*args, **kwargs)
            np.random.seed()
            return res
        return wrapper_fix_seed
    # Compare with None explicitly so a falsy positional argument such as 0
    # is not mistaken for "no function given"
    if _func is None:
        return decorator_fix_seed
    return decorator_fix_seed(_func)

# @fix_seed(0)       # TypeError: 'int' object is not callable (0 is taken as _func)
# @fix_seed(seed=0)
@fix_seed
def printRand():
    print(np.random.uniform())

printRand()
print(np.random.uniform())
0.5488135039273248
0.13056825103667768
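All three versions reseed from fresh entropy on exit, which throws away whatever seed the caller had set. A possible variant saves the generator state before the call and restores it afterwards. The sketch below uses the standard-library random module (random.getstate / random.setstate) as a stand-in so that it runs without NumPy; NumPy's legacy API has the analogous np.random.get_state and np.random.set_state. The name fix_seed_restoring is made up for illustration.

```python
import functools
import random


def fix_seed_restoring(_func=None, *, seed=0):
    def decorator_fix_seed(func):
        @functools.wraps(func)
        def wrapper_fix_seed(*args, **kwargs):
            state = random.getstate()    # remember the caller's RNG state
            random.seed(seed)
            try:
                return func(*args, **kwargs)
            finally:
                random.setstate(state)   # restore it even if func raises
        return wrapper_fix_seed
    if _func is None:
        return decorator_fix_seed
    return decorator_fix_seed(_func)


@fix_seed_restoring(seed=42)
def draw():
    return random.random()


random.seed(123)
expected_next = random.random()  # first value of the caller's own stream

random.seed(123)
a = draw()
b = draw()
after = random.random()          # the caller's stream is undisturbed

print(a == b, after == expected_next)
```

The try/finally block is a design choice: without it, an exception inside the decorated function would leave the global generator stuck on the fixed seed.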