diff --git a/01-data-model/01-notes.ipynb b/01-data-model/01-notes.ipynb new file mode 100644 index 0000000..8929ed1 --- /dev/null +++ b/01-data-model/01-notes.ipynb @@ -0,0 +1,526 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Ch1: Python Data Model\n", + "\n", + "The data model is essentially a description of Python as a framework: it formalizes the interfaces of the building blocks of the language itself, including (but not limited to) sequences, iterators, functions, classes, and context managers.\n", + "\n", + "Magic methods are Python's special methods; their names start and end with double underscores, e.g. `__getitem__`. They are meant to be called by the Python interpreter, not by us directly. They let our own objects support language constructs such as:\n", + "\n", + "- Iteration\n", + "- Collections\n", + "- Attribute access\n", + "- Operator overloading\n", + "- Function and method invocation\n", + "- Object creation and destruction\n", + "- String representation and formatting\n", + "- Managed contexts (aka `with` blocks)\n", + "\n", + "## 1.1 A Pythonic Card Deck\n", + "\n", + "How to use the `__getitem__` and `__len__` methods:" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": {}, + "outputs": [], + "source": [ + "# frenchdeck.py\n", + "import collections\n", + "\n", + "Card = collections.namedtuple('Card', ['rank', 'suit'])\n", + "\n", + "class FrenchDeck:\n", + "    ranks = [str(n) for n in range(2, 11)] + list('JQKA')\n", + "    suits = 'spades diamonds clubs hearts'.split()\n", + "\n", + "    def __init__(self):\n", + "        self._cards = [Card(rank, suit) for suit in self.suits\n", + "                                        for rank in self.ranks]\n", + "\n", + "    def __len__(self):\n", + "        return len(self._cards)\n", + "\n", + "    def __getitem__(self, position):\n", + "        return self._cards[position]\n", + "\n", + "    # This method breaks the class's encapsulation: callers can now modify _cards directly, so the deck becomes a mutable object\n", + "    def __setitem__(self, position, value):\n", + "        self._cards[position] = value" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`collections.namedtuple` is a factory function that creates simple tuple subclasses with named fields and no custom methods, much like database records." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Card(rank='7', suit='diamonds')" + ] + }, + "execution_count": 42, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "beer_card = Card('7','diamonds')\n", + "beer_card" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "52" + ] + }, + "execution_count": 43, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "deck = FrenchDeck()\n", + "len(deck)" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Card(rank='2', suit='spades')\n", + "Card(rank='A', suit='hearts')\n" + ] + } + ], + "source": [ + "print(deck[0])\n", + "print(deck[-1])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`len()` and `[]` invoke the `__len__` and `__getitem__` methods, respectively.\n", + "\n", + "There is also no need to implement a separate method for picking a random card: `random.choice()` works on the deck as is, because it only relies on `__len__` and `__getitem__`." + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Card(rank='3', suit='hearts')\n", + "Card(rank='7', suit='spades')\n" + ] + } + ], + "source": [ + "from random import choice\n", + "print(choice(deck))\n", + "print(choice(deck))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Because `__getitem__` delegates to the `[]` operator of `self._cards`, the deck supports slicing. It is also iterable." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 46, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Card(rank='A', suit='hearts')\n", + "Card(rank='K', suit='hearts')\n", + "Card(rank='Q', suit='hearts')\n", + "Card(rank='J', suit='hearts')\n", + "Card(rank='10', suit='hearts')\n", + "Card(rank='9', suit='hearts')\n", + "Card(rank='8', suit='hearts')\n", + "Card(rank='7', suit='hearts')\n", + "Card(rank='6', suit='hearts')\n", + "Card(rank='5', suit='hearts')\n", + "Card(rank='4', suit='hearts')\n", + "Card(rank='3', suit='hearts')\n", + "Card(rank='2', suit='hearts')\n", + "Card(rank='A', suit='clubs')\n", + "Card(rank='K', suit='clubs')\n", + "Card(rank='Q', suit='clubs')\n", + "Card(rank='J', suit='clubs')\n", + "Card(rank='10', suit='clubs')\n", + "Card(rank='9', suit='clubs')\n", + "Card(rank='8', suit='clubs')\n", + "Card(rank='7', suit='clubs')\n", + "Card(rank='6', suit='clubs')\n", + "Card(rank='5', suit='clubs')\n", + "Card(rank='4', suit='clubs')\n", + "Card(rank='3', suit='clubs')\n", + "Card(rank='2', suit='clubs')\n", + "Card(rank='A', suit='diamonds')\n", + "Card(rank='K', suit='diamonds')\n", + "Card(rank='Q', suit='diamonds')\n", + "Card(rank='J', suit='diamonds')\n", + "Card(rank='10', suit='diamonds')\n", + "Card(rank='9', suit='diamonds')\n", + "Card(rank='8', suit='diamonds')\n", + "Card(rank='7', suit='diamonds')\n", + "Card(rank='6', suit='diamonds')\n", + "Card(rank='5', suit='diamonds')\n", + "Card(rank='4', suit='diamonds')\n", + "Card(rank='3', suit='diamonds')\n", + "Card(rank='2', suit='diamonds')\n", + "Card(rank='A', suit='spades')\n", + "Card(rank='K', suit='spades')\n", + "Card(rank='Q', suit='spades')\n", + "Card(rank='J', suit='spades')\n", + "Card(rank='10', suit='spades')\n", + "Card(rank='9', suit='spades')\n", + "Card(rank='8', suit='spades')\n", + "Card(rank='7', suit='spades')\n", + "Card(rank='6', suit='spades')\n", + "Card(rank='5', suit='spades')\n", + 
"Card(rank='4', suit='spades')\n", + "Card(rank='3', suit='spades')\n", + "Card(rank='2', suit='spades')\n" + ] + } + ], + "source": [ + "for card in reversed(deck):\n", + " print(card)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Iteration is usually implicit. If a collection has no `__contains__` method, the `in` operator does a sequential scan. " + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 47, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "Card('Q', 'hearts') in deck " + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 48, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "Card('7', 'beasts') in deck " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, we will implement sorting" + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "metadata": {}, + "outputs": [], + "source": [ + "suit_values = dict(spades = 3, hearts = 2, diamonds = 1, clubs = 0)\n", + "\n", + "def spades_high(card):\n", + " rank_value = FrenchDeck.ranks.index(card.rank)\n", + " return rank_value * len(suit_values) + suit_values[card.suit]" + ] + }, + { + "cell_type": "code", + "execution_count": 50, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Card(rank='4', suit='clubs')\n", + "Card(rank='9', suit='spades')\n", + "Card(rank='A', suit='spades')\n", + "Card(rank='9', suit='hearts')\n", + "Card(rank='2', suit='spades')\n", + "Card(rank='10', suit='diamonds')\n", + "Card(rank='4', suit='spades')\n", + "Card(rank='10', suit='clubs')\n", + "Card(rank='3', suit='clubs')\n", + "Card(rank='2', suit='diamonds')\n", + "Card(rank='K', suit='spades')\n", + "Card(rank='Q', 
suit='diamonds')\n", + "Card(rank='A', suit='hearts')\n", + "Card(rank='A', suit='diamonds')\n", + "Card(rank='5', suit='spades')\n", + "Card(rank='10', suit='hearts')\n", + "Card(rank='8', suit='hearts')\n", + "Card(rank='8', suit='spades')\n", + "Card(rank='6', suit='spades')\n", + "Card(rank='3', suit='diamonds')\n", + "Card(rank='2', suit='clubs')\n", + "Card(rank='3', suit='spades')\n", + "Card(rank='6', suit='clubs')\n", + "Card(rank='5', suit='diamonds')\n", + "Card(rank='J', suit='clubs')\n", + "Card(rank='4', suit='diamonds')\n", + "Card(rank='7', suit='spades')\n", + "Card(rank='9', suit='clubs')\n", + "Card(rank='J', suit='spades')\n", + "Card(rank='10', suit='spades')\n", + "Card(rank='Q', suit='hearts')\n", + "Card(rank='4', suit='hearts')\n", + "Card(rank='K', suit='hearts')\n", + "Card(rank='Q', suit='spades')\n", + "Card(rank='5', suit='clubs')\n", + "Card(rank='7', suit='diamonds')\n", + "Card(rank='Q', suit='clubs')\n", + "Card(rank='7', suit='clubs')\n", + "Card(rank='6', suit='hearts')\n", + "Card(rank='J', suit='diamonds')\n", + "Card(rank='9', suit='diamonds')\n", + "Card(rank='6', suit='diamonds')\n", + "Card(rank='5', suit='hearts')\n", + "Card(rank='8', suit='diamonds')\n", + "Card(rank='8', suit='clubs')\n", + "Card(rank='7', suit='hearts')\n", + "Card(rank='K', suit='clubs')\n", + "Card(rank='K', suit='diamonds')\n", + "Card(rank='J', suit='hearts')\n", + "Card(rank='2', suit='hearts')\n", + "Card(rank='A', suit='clubs')\n", + "Card(rank='3', suit='hearts')\n", + "Card(rank='2', suit='clubs')\n", + "Card(rank='2', suit='diamonds')\n", + "Card(rank='2', suit='hearts')\n", + "Card(rank='2', suit='spades')\n", + "Card(rank='3', suit='clubs')\n", + "Card(rank='3', suit='diamonds')\n", + "Card(rank='3', suit='hearts')\n", + "Card(rank='3', suit='spades')\n", + "Card(rank='4', suit='clubs')\n", + "Card(rank='4', suit='diamonds')\n", + "Card(rank='4', suit='hearts')\n", + "Card(rank='4', suit='spades')\n", + "Card(rank='5', suit='clubs')\n", + 
"Card(rank='5', suit='diamonds')\n", + "Card(rank='5', suit='hearts')\n", + "Card(rank='5', suit='spades')\n", + "Card(rank='6', suit='clubs')\n", + "Card(rank='6', suit='diamonds')\n", + "Card(rank='6', suit='hearts')\n", + "Card(rank='6', suit='spades')\n", + "Card(rank='7', suit='clubs')\n", + "Card(rank='7', suit='diamonds')\n", + "Card(rank='7', suit='hearts')\n", + "Card(rank='7', suit='spades')\n", + "Card(rank='8', suit='clubs')\n", + "Card(rank='8', suit='diamonds')\n", + "Card(rank='8', suit='hearts')\n", + "Card(rank='8', suit='spades')\n", + "Card(rank='9', suit='clubs')\n", + "Card(rank='9', suit='diamonds')\n", + "Card(rank='9', suit='hearts')\n", + "Card(rank='9', suit='spades')\n", + "Card(rank='10', suit='clubs')\n", + "Card(rank='10', suit='diamonds')\n", + "Card(rank='10', suit='hearts')\n", + "Card(rank='10', suit='spades')\n", + "Card(rank='J', suit='clubs')\n", + "Card(rank='J', suit='diamonds')\n", + "Card(rank='J', suit='hearts')\n", + "Card(rank='J', suit='spades')\n", + "Card(rank='Q', suit='clubs')\n", + "Card(rank='Q', suit='diamonds')\n", + "Card(rank='Q', suit='hearts')\n", + "Card(rank='Q', suit='spades')\n", + "Card(rank='K', suit='clubs')\n", + "Card(rank='K', suit='diamonds')\n", + "Card(rank='K', suit='hearts')\n", + "Card(rank='K', suit='spades')\n", + "Card(rank='A', suit='clubs')\n", + "Card(rank='A', suit='diamonds')\n", + "Card(rank='A', suit='hearts')\n", + "Card(rank='A', suit='spades')\n" + ] + } + ], + "source": [ + "from random import shuffle\n", + "shuffle(deck) # relies on __len__, __getitem__ and __setitem__\n", + "\n", + "for card in deck:\n", + "    print(card)\n", + "\n", + "for card in sorted(deck, key = spades_high):\n", + "    print(card)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1.2 How Special Methods Are Used\n", + "Special methods exist to be called by the Python interpreter, not by you. In other words, you write `len(my_object)` rather than `my_object.__len__()`.\n", + "\n", + "Using special methods through the built-in functions (such as len, iter, str, and so on) is usually the best choice. These built-ins not only call the corresponding special method but often provide extra services, and for built-in types they are faster." + ] + }, + { + 
"cell_type": "code", + "execution_count": 51, + "metadata": {}, + "outputs": [], + "source": [ + "# vector2d.py\n", + "from math import hypot\n", + "\n", + "class Vector:\n", + "\n", + "    def __init__(self, x=0, y=0):\n", + "        self.x = x\n", + "        self.y = y\n", + "\n", + "    def __repr__(self):\n", + "        return 'Vector(%r, %r)' % (self.x, self.y)\n", + "\n", + "    def __abs__(self):\n", + "        return hypot(self.x, self.y)\n", + "\n", + "    def __bool__(self):\n", + "        return bool(abs(self))\n", + "\n", + "    def __add__(self, other):\n", + "        x = self.x + other.x\n", + "        y = self.y + other.y\n", + "        return Vector(x, y)\n", + "\n", + "    def __mul__(self, scalar):\n", + "        return Vector(self.x * scalar, self.y * scalar)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 1.2.2 `__repr__`: String Representation\n", + "\n", + "Much like Java's `toString()` method, it returns a string representation of an object.\n", + "\n", + "The difference between `__repr__` and `__str__` is that the latter is called by `str()` and, implicitly, by `print()`, and it should return a string that is friendlier to end users.\n", + "\n", + "### 1.2.4 Boolean Value of a Custom Type\n", + "Although Python has a bool type, any object can be used in a boolean context (such as an if or while statement, or as an operand of and, or, and not). By default, instances of user-defined classes are considered truthy, unless the class implements `__bool__` or `__len__`.\n", + "\n", + "`Vector.__bool__` can also be rewritten as follows:\n", + "```python\n", + "def __bool__(self):\n", + "    return bool(self.x or self.y)\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1.4 Why `len` Is Not a Method\n", + "`len` is a built-in function rather than a regular method so that Python's built-in data structures can take a shortcut: their length is read directly from the underlying C struct, which is very fast.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "*Further reading*\n", + "\n", + "Metaobjects" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "base", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/02-array-seq/02-notes.ipynb 
b/02-array-seq/02-notes.ipynb new file mode 100644 index 0000000..8a5aaae --- /dev/null +++ b/02-array-seq/02-notes.ipynb @@ -0,0 +1,1415 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Ch2 An Array of Sequences\n", + "\n", + "## 2.1 Overview of Built-In Sequences\n", + "\n", + "Container sequences\n", + "- `list`, `tuple`, `collections.deque` can hold items of different types\n", + "\n", + "Flat sequences\n", + "- `str`, `bytes`, `bytearray`, `memoryview`, `array.array` hold items of a single type\n", + "\n", + "Container sequences store references to the objects they contain, while flat sequences store the values themselves in a contiguous block of memory.\n", + "\n", + "Mutable sequences\n", + "- `list`, `bytearray`, `array.array`, `collections.deque`, `memoryview`\n", + "\n", + "Immutable sequences\n", + "- `tuple`, `str`, `bytes`\n", + "\n", + "Mutable sequences inherit all the methods of immutable sequences and add their own\n", + "\n", + "The inheritance tree (`collections.abc`) looks like this:\n", + "\n", + "`Container` class\n", + "- `__contains__`\n", + "\n", + "`Iterable` class\n", + "- `__iter__`\n", + "\n", + "`Sized` class\n", + "- `__len__`\n", + "\n", + "`Sequence` class extends `Container`, `Iterable`, `Sized`\n", + "- `__getitem__`\n", + "- `__contains__`\n", + "- `__iter__`\n", + "- `__reversed__`\n", + "- `index`\n", + "- `count`\n", + "\n", + "`MutableSequence` class extends `Sequence`\n", + "- `__setitem__`\n", + "- `__delitem__`\n", + "- `insert`\n", + "- `append`\n", + "- `reverse`\n", + "- `extend`\n", + "- `pop`\n", + "- `remove`\n", + "- `__iadd__`\n", + "\n", + "\n", + "## 2.2 List Comprehensions and Generator Expressions\n", + "As a rule of thumb, use a list comprehension only to build a new list, and keep it short. If it spans more than two lines, consider rewriting it as a plain for loop." + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[162, 163, 165, 8364, 164]\n" + ] + } + ], + "source": [ + "symbols = '$¢£¥€¤'\n", + "beyond_ascii = [ord(s) for s in symbols if ord(s) > 127]\n", + "print(beyond_ascii)" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[162, 163, 165, 8364, 164]\n" + ] + } + ], + "source": [ + "# or using map/filter\n", + "beyond_ascii = list(filter(lambda c : c > 127, 
map(ord, symbols)))\n", + "print(beyond_ascii)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A speed comparison can be found in *listcomp_speed.py*.\n", + "\n", + "### 2.2.3 Cartesian Product" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[('black', 'S'), ('black', 'M'), ('black', 'L'), ('white', 'S'), ('white', 'M'), ('white', 'L')]\n" + ] + } + ], + "source": [ + "colors = ['black', 'white'] \n", + "sizes = ['S', 'M', 'L'] \n", + "tshirts = [(color, size) for color in colors for size in sizes]\n", + "print(tshirts)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2.2.4 Generator Expressions\n", + "Generator expressions are lazy: they yield values only when needed, which saves memory. Their syntax is the same as that of list comprehensions, except that they are enclosed in parentheses instead of brackets." + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(36, 162, 163, 165, 8364, 164)" + ] + }, + "execution_count": 43, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "tuple(ord(symbol) for symbol in symbols)" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "array('I', [36, 162, 163, 165, 8364, 164])\n" + ] + } + ], + "source": [ + "import array\n", + "arr = array.array('I', (ord(symbol) for symbol in symbols))\n", + "print(arr)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. If a generator expression is the only argument in a function call, it does not need an extra pair of parentheses.\n", + "2. 
The `array` constructor takes two arguments here: the first is a typecode that specifies how the numbers are stored, and the second is an iterable." + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "black S\n", + "black M\n", + "black L\n", + "white S\n", + "white M\n", + "white L\n" + ] + } + ], + "source": [ + "colors = ['black','white']\n", + "sizes = ['S','M','L']\n", + "for tshirt in ('%s %s' % (c,s) for c in colors for s in sizes):\n", + "    print(tshirt)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2.3 Tuples Are Not Just Immutable Lists\n", + "Besides serving as immutable lists, tuples can also be used as records with no field names.\n", + "\n", + "### 2.3.1 Tuples as Records\n", + "\n", + "If you think of a tuple only as an immutable list, the number of items and their positions seem optional. But when a tuple is used as a collection of fields, **the number of items and their order** become vital.\n", + "\n", + "### 2.3.2 Tuple Unpacking\n", + "See also the notes in the Python Techniques series.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(2, 4)\n" + ] + } + ], + "source": [ + "# * unpacks an iterable\n", + "t = (20,8)\n", + "print(divmod(*t))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Parallel assignment, using `*` to capture the excess items:" + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0 [1, 2] 3 4\n" + ] + } + ], + "source": [ + "a, *body, c, d = range(5)\n", + "print(a,body,c,d)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2.3.3 Nested Tuple Unpacking\n", + "The tuple that receives the unpacked expression can be nested, e.g. `(a, b, (c, d))`." + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " | lat. | long. 
\n", + "Mexico City | 19.4333 | -99.1333\n", + "New York-Newark | 40.8086 | -74.0204\n", + "Sao Paulo | -23.5478 | -46.6358\n" + ] + } + ], + "source": [ + "# metro_lat_long.py\n", + "metro_areas = [\n", + "    ('Tokyo', 'JP', 36.933, (35.689722, 139.691667)), # <1>\n", + "    ('Delhi NCR', 'IN', 21.935, (28.613889, 77.208889)),\n", + "    ('Mexico City', 'MX', 20.142, (19.433333, -99.133333)),\n", + "    ('New York-Newark', 'US', 20.104, (40.808611, -74.020386)),\n", + "    ('Sao Paulo', 'BR', 19.649, (-23.547778, -46.635833)),\n", + "]\n", + "\n", + "print('{:15} | {:^9} | {:^9}'.format('', 'lat.', 'long.'))\n", + "fmt = '{:15} | {:9.4f} | {:9.4f}'\n", + "for name, cc, pop, (latitude, longitude) in metro_areas: # <2>\n", + "    if longitude <= 0: # <3>\n", + "        print(fmt.format(name, latitude, longitude))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2.3.4 Named Tuples (namedtuple)\n", + "`collections.namedtuple` is a **factory function** that builds a tuple subclass with field names and a class name, and the named class helps a lot when debugging.\n", + "\n", + "*Aside: factory functions*\n", + "\n", + "In programming, a factory function is a concept used primarily in object-oriented programming. It refers to a function that is designed to create and return new instances of objects. Unlike constructors, which are tied to a specific class and create instances of that class only, factory functions can be more flexible: they can create objects of different classes depending on the parameters passed to them or on specific conditions.\n", + "\n", + "Factory functions are useful for several reasons:\n", + "1. **Abstraction and Encapsulation**: They can hide the complexity of creating instances of complex objects, making the code that uses these objects simpler and cleaner.\n", + "2. **Flexibility**: Since factory functions are not tied to specific classes, they can return instances of different classes. This makes it easier to introduce new types of objects without changing the code that uses the factory function.\n", + "3. 
**Customization**: Parameters passed to a factory function can dictate the customization of the created object, allowing for a more dynamic object creation process.\n", + "\n", + "Here's a simple example in JavaScript to illustrate a factory function:\n", + "\n", + "```javascript\n", + "function carFactory(model, year) {\n", + "    return {\n", + "        model: model,\n", + "        year: year,\n", + "        displayInfo: function() {\n", + "            console.log(`Model: ${this.model}, Year: ${this.year}`);\n", + "        }\n", + "    };\n", + "}\n", + "\n", + "const car1 = carFactory('Toyota', 2020);\n", + "car1.displayInfo(); // Output: Model: Toyota, Year: 2020\n", + "```\n", + "\n", + "In this example, `carFactory` is a factory function that creates and returns a new car object each time it is called. The created car object includes properties for `model` and `year`, as well as a method `displayInfo` to display the car's information. This approach allows for the creation of car objects with different properties without the need for a specific class for each car.\n", + "\n", + "Note from the book:\n", + "Instances of a class built with `namedtuple` take exactly the same amount of memory as tuples, because the field names are stored in the class. They are also a bit smaller than regular object instances, because Python does not keep their attributes in a per-instance `__dict__`." + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "metadata": {}, + "outputs": [], + "source": [ + "from collections import namedtuple\n", + "City = namedtuple('City','name country population coordinates')\n", + "tokyo = City('Tokyo', 'JP', 36.933, (35.689722, 139.691667))" + ] + }, + { + "cell_type": "code", + "execution_count": 50, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "City(name='Tokyo', country='JP', population=36.933, coordinates=(35.689722, 139.691667))\n", + "Tokyo\n", + "36.933\n" + ] + } + ], + "source": [ + "print(tokyo)\n", + "print(tokyo.name)\n", + "print(tokyo.population)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "*code comment:*\n", + "1. 
Creating a `namedtuple` takes two arguments: a class name and the names of its fields. The latter can be given as an **iterable of strings** or as a **single space-delimited string**.\n", + "2. Field values must be passed to the constructor as separate positional arguments (the `tuple` constructor, in contrast, takes a single iterable).\n", + "3. ...\n", + "\n", + "Named tuples also have a few attributes and methods of their own:\n", + "- `_fields` class attribute\n", + "- `_make(iterable)` class method\n", + "- `_asdict()` instance method" + ] + }, + { + "cell_type": "code", + "execution_count": 51, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "('name', 'country', 'population', 'coordinates')" + ] + }, + "execution_count": 51, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "City._fields" + ] + }, + { + "cell_type": "code", + "execution_count": 52, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{'name': 'Delhi NCR', 'country': 'IN', 'population': 21.935, 'coordinates': LatLong(lat=28.613889, long=77.208889)}\n" + ] + } + ], + "source": [ + "LatLong = namedtuple('LatLong', 'lat long') \n", + "delhi_data = ('Delhi NCR', 'IN', 21.935, LatLong(28.613889, 77.208889)) \n", + "delhi = City._make(delhi_data) \n", + "print(delhi._asdict()) " + ] + }, + { + "cell_type": "code", + "execution_count": 53, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "name: Delhi NCR\n", + "country: IN\n", + "population: 21.935\n", + "coordinates: LatLong(lat=28.613889, long=77.208889)\n" + ] + } + ], + "source": [ + "for key, value in delhi._asdict().items(): \n", + "    print(key + ':', value) " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "*code comment:*\n", + "1. `_fields` is a tuple with the field names of the class.\n", + "2. `_make()` builds an instance of the class from an iterable; `City._make(delhi_data)` does the same as `City(*delhi_data)`.\n", + "3. 
`_asdict()` returns the named tuple's fields as a plain `dict` (a `collections.OrderedDict` before Python 3.8), which is handy for displaying the record clearly." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2.3.5 Tuples as Immutable Lists\n", + "\n", + "Apart from the methods that modify the sequence in place (adding, removing, or reordering items), tuples support all list methods. One exception is `__reversed__`, which tuples lack (the book notes that it is only an optimization); the `reversed()` function still works on tuples.\n", + "\n", + "Note: the `__reversed__` method returns an iterator over the items in reverse order" + ] + }, + { + "cell_type": "code", + "execution_count": 54, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "(3, 2, 1)\n" + ] + } + ], + "source": [ + "a = (1,2,3)\n", + "a = reversed(a)\n", + "print(a)\n", + "print(tuple(a))" + ] + }, + { + "cell_type": "code", + "execution_count": 55, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "[3, 2, 1]\n" + ] + } + ], + "source": [ + "a = [1,2,3]\n", + "a = reversed(a)\n", + "print(a)\n", + "print(list(a))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2.4 Slicing\n", + "The notation `a:b:c` is only valid inside `[]`, as an index or subscript, where it produces a slice object: `slice(a, b, c)`. To evaluate `seq[start:stop:step]`, Python calls\n", + "`seq.__getitem__(slice(start, stop, step))`.\n", + "\n", + "(See also a video about slicing: 【Python】Slice:被低估的小技巧,减少重复工作量)\n", + "\n", + "### 2.4.3 Multidimensional Slicing and Ellipsis\n", + "The `[]` operator can also take multiple indexes or slices separated by commas; the numpy library relies on this feature.\n", + "\n", + "To handle this form of `[]`, the special methods `__getitem__` and `__setitem__` receive the indexes in `a[i, j]` as a tuple. In other words, to evaluate `a[i, j]`, Python calls `a.__getitem__((i, j))`\n", + "\n", + "The ellipsis `...` is an alias for the `Ellipsis` object; in a slice it can stand for any number of colons\n", + "\n", + "Note from the book: fun fact, the `Ellipsis` object is the single instance of the `ellipsis` class (`Ellipsis` is a built-in instance). Yes, the class name is all lowercase. 
Similar to `bool` class with `True` and `False` instances.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 56, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[[[ 1 2 3 4]\n", + " [ 5 6 7 8]]\n", + "\n", + " [[12 34 56 78]\n", + " [56 78 90 12]]]\n" + ] + } + ], + "source": [ + "import numpy as np\n", + "\n", + "a = np.array([[[[1,2,3,4],[5,6,7,8]],[[12,34,56,78],[56,78,90,12]]],[[[1,2,3,4],[5,6,7,8]],[[12,34,56,78],[56,78,90,12]]]])\n", + "b = a[0, ...] # very interesting example\n", + "print(b)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2.4.4 给切片赋值(有意思)\n", + "如果把切片放在赋值语句的左边,或把它作为`del`操作的对象,我们就可以对序列进行**嫁接、切除或就地修改**操作。" + ] + }, + { + "cell_type": "code", + "execution_count": 57, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]\n" + ] + } + ], + "source": [ + "l = list(range(10))\n", + "print(l)" + ] + }, + { + "cell_type": "code", + "execution_count": 58, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[0, 1, 20, 30, 5, 6, 7, 8, 9]" + ] + }, + "execution_count": 58, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "l[2:5] = [20,30]\n", + "l" + ] + }, + { + "cell_type": "code", + "execution_count": 59, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[0, 1, 20, 30, 5, 8, 9]" + ] + }, + "execution_count": 59, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "del l[5:7]\n", + "l" + ] + }, + { + "cell_type": "code", + "execution_count": 60, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[0, 1, 20, 11, 5, 22, 9]" + ] + }, + "execution_count": 60, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "l[3::2] = [11,22]\n", + "l" + ] + }, + { + "cell_type": "code", + "execution_count": 61, + "metadata": {}, + "outputs": [ + { + "data": { + 
"text/plain": [ + "[0, 1, 100, 22, 9]" + ] + }, + "execution_count": 61, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# l[2:5] = 100 would raise TypeError: the right-hand side must be an iterable\n", + "l[2:5] = [100] \n", + "l" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "When the target of an assignment is a slice, the right-hand side **must be an iterable**, even if it holds just a single item.\n", + "\n", + "## 2.5 Using `+` and `*` with Sequences\n", + "Both `+` and `*` follow the same rule: they never modify their operands and always build a brand-new sequence.\n", + "\n", + "A pitfall of `*` is that it repeats references to the same item instead of copying it. If that item is **mutable**, this can lead to surprising side effects. (A nightmare from my newbie days!)\n", + "\n", + "Below are some right and wrong ways to build a list of lists.\n", + "The most reliable way is a list comprehension." + ] + }, + { + "cell_type": "code", + "execution_count": 62, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[['_', '_', '_'], ['_', '_', '_'], ['_', '_', '_']]" + ] + }, + "execution_count": 62, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "board = [['_'] * 3 for i in range(3)]\n", + "board" + ] + }, + { + "cell_type": "code", + "execution_count": 63, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[['_', '_', '_'], ['_', '_', 'X'], ['_', '_', '_']]" + ] + }, + "execution_count": 63, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "board[1][2] = 'X'\n", + "board" + ] + }, + { + "cell_type": "code", + "execution_count": 64, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[['_', '_', 'O'], ['_', '_', 'O'], ['_', '_', 'O']]" + ] + }, + "execution_count": 64, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "weird_board = [['_'] * 3] * 3\n", + "weird_board[1][2] = 'O'\n", + "weird_board" + ] + }, + { + "cell_type": "code", + "execution_count": 65, + "metadata": {}, + "outputs": [], + "source": [ + "row=['_'] * 3 \n", + "board = [] \n", + "for i in range(3):\n", + "    board.append(row)" + ] + }, + { + "cell_type": "code", + "execution_count": 66, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[['_', '_', 
'_'], ['_', '_', '_'], ['X', '_', '_']]" + ] + }, + "execution_count": 66, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "board = []\n", + "for i in range(3):\n", + "    row = ['_'] * 3 # creating a new list each iteration\n", + "    board.append(row)\n", + "board[2][0] = 'X' \n", + "board" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2.6 Augmented Assignment with Sequences\n", + "The special method behind `+=` is `__iadd__` (in-place addition); likewise, `*=` is backed by `__imul__`. If a class does not implement the in-place method, Python falls back to `__add__`. Whether the variable name ends up bound to a new object depends entirely on whether the type implements `__iadd__`." + ] + }, + { + "cell_type": "code", + "execution_count": 67, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "3122698992640\n", + "3122698992640\n", + "3122693751488\n", + "3122694482304\n" + ] + } + ], + "source": [ + "l = [1,2,3]\n", + "print(id(l))\n", + "l *= 2\n", + "print(id(l))\n", + "t = (1,2,3)\n", + "print(id(t))\n", + "t *= 2\n", + "print(id(t))" + ] + }, + { + "cell_type": "code", + "execution_count": 68, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "3122693751488\n" + ] + }, + { + "ename": "TypeError", + "evalue": "'tuple' object does not support item assignment", + "output_type": "error", + "traceback": [ + "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[1;31mTypeError\u001b[0m Traceback (most recent call last)", + "Cell \u001b[1;32mIn[68], line 4\u001b[0m\n\u001b[0;32m 2\u001b[0m t \u001b[38;5;241m=\u001b[39m (\u001b[38;5;241m1\u001b[39m,\u001b[38;5;241m2\u001b[39m,[\u001b[38;5;241m30\u001b[39m,\u001b[38;5;241m40\u001b[39m])\n\u001b[0;32m 3\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;28mid\u001b[39m(t))\n\u001b[1;32m----> 4\u001b[0m \u001b[43mt\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m2\u001b[39;49m\u001b[43m]\u001b[49m \u001b[38;5;241m+\u001b[39m\u001b[38;5;241m=\u001b[39m [\u001b[38;5;241m50\u001b[39m,\u001b[38;5;241m60\u001b[39m]\n", + 
"\u001b[1;31mTypeError\u001b[0m: 'tuple' object does not support item assignment" + ] + } + ], + "source": [ + "# 一个+=的谜题\n", + "t = (1,2,[30,40])\n", + "print(id(t))\n", + "t[2] += [50,60]" + ] + }, + { + "cell_type": "code", + "execution_count": 69, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(1, 2, [30, 40, 50, 60])\n", + "3122693751488\n" + ] + } + ], + "source": [ + "print(t)\n", + "print(id(t))" + ] + }, + { + "cell_type": "code", + "execution_count": 70, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " 1 0 LOAD_NAME 0 (s)\n", + " 2 LOAD_NAME 1 (a)\n", + " 4 DUP_TOP_TWO\n", + " 6 BINARY_SUBSCR\n", + " 8 LOAD_NAME 2 (b)\n", + " 10 INPLACE_ADD\n", + " 12 ROT_THREE\n", + " 14 STORE_SUBSCR\n", + " 16 LOAD_CONST 0 (None)\n", + " 18 RETURN_VALUE\n" + ] + } + ], + "source": [ + "import dis # 可以查看python字节码\n", + "dis.dis(\"s[a] += b\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2.7 list.sort方法和内置函数sorted\n", + "\n", + "与list.sort 相反的是内置函数sorted,它会新建一个列表作为返回值。这个方法可以接受任何形式的可迭代对象作为参数,甚至包括不可变序列或生成器。 but it always returns a list.\n", + "...\n", + "\n", + "## 2.8 用bisect来管理已排序的序列\n", + "`bisect` 模块包含两个主要函数,`bisect`和`insort`,两个函数都利用二分查找算法来在有序序列中查找或插入元素。\n" + ] + }, + { + "cell_type": "code", + "execution_count": 71, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "DEMO: bisect_right\n", + "haystack -> 1 4 5 6 8 12 15 20 21 23 23 26 29 30\n", + "31 @ 14 | | | | | | | | | | | | | |31\n", + "30 @ 14 | | | | | | | | | | | | | |30\n", + "29 @ 13 | | | | | | | | | | | | |29\n", + "23 @ 11 | | | | | | | | | | |23\n", + "22 @ 9 | | | | | | | | |22\n", + "10 @ 5 | | | | |10\n", + " 8 @ 5 | | | | |8 \n", + " 5 @ 3 | | |5 \n", + " 2 @ 1 |2 \n", + " 1 @ 1 |1 \n", + " 0 @ 0 0 \n" + ] + } + ], + "source": [ + "# bisect_demo.py\n", + "# BEGIN BISECT_DEMO\n", + "import bisect\n", + "import sys\n", + 
"\n", + "HAYSTACK = [1, 4, 5, 6, 8, 12, 15, 20, 21, 23, 23, 26, 29, 30]\n", + "NEEDLES = [0, 1, 2, 5, 8, 10, 22, 23, 29, 30, 31]\n", + "\n", + "ROW_FMT = '{0:2d} @ {1:2d} {2}{0:<2d}'\n", + "\n", + "def demo(bisect_fn):\n", + " for needle in reversed(NEEDLES):\n", + " position = bisect_fn(HAYSTACK, needle) # <1>\n", + " offset = position * ' |' # <2>\n", + " print(ROW_FMT.format(needle, position, offset)) # <3>\n", + "\n", + "if __name__ == '__main__':\n", + "\n", + " if sys.argv[-1] == 'left': # <4>\n", + " bisect_fn = bisect.bisect_left\n", + " else:\n", + " bisect_fn = bisect.bisect\n", + "\n", + " print('DEMO:', bisect_fn.__name__) # <5>\n", + " print('haystack ->', ' '.join('%2d' % n for n in HAYSTACK))\n", + " demo(bisect_fn)\n", + "\n", + "# END BISECT_DEMO\n" + ] + }, + { + "cell_type": "code", + "execution_count": 72, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "['F', 'A', 'C', 'C', 'B', 'A', 'A']\n", + "['F', 'A', 'C', 'D', 'B', 'B', 'A']\n" + ] + } + ], + "source": [ + "def grade_right(score, breakpoint = [60,70,80,90], grades = \"FDCBA\"):\n", + " i = bisect.bisect(breakpoint, score)\n", + " return grades[i]\n", + "\n", + "def grade_left(score, breakpoint = [60,70,80,90], grades = \"FDCBA\"):\n", + " i = bisect.bisect_left(breakpoint, score)\n", + " return grades[i]\n", + "\n", + "print([grade_right(score) for score in [33,99,77,70,89,90,100]])\n", + "print([grade_left(score) for score in [33,99,77,70,89,90,100]])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`bisect` can replace `index` in very long sequences to improve efficiency.\n", + "\n", + "### 2.8.2 用`bisect.insort`插入新元素\n", + "`bisect.insort` 会找到插入元素的位置并保持序列排序。\n", + "Here is an example:" + ] + }, + { + "cell_type": "code", + "execution_count": 73, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "10 -> [10]\n", + " 0 -> [0, 10]\n", + " 6 -> [0, 6, 10]\n", + " 8 -> 
[0, 6, 8, 10]\n", + " 7 -> [0, 6, 7, 8, 10]\n", + " 2 -> [0, 2, 6, 7, 8, 10]\n", + "10 -> [0, 2, 6, 7, 8, 10, 10]\n" + ] + } + ], + "source": [ + "# bisect_insort.py\n", + "\n", + "import bisect\n", + "import random\n", + "\n", + "SIZE = 7\n", + "\n", + "random.seed(1729)\n", + "\n", + "my_list = []\n", + "for i in range(SIZE):\n", + " new_item = random.randrange(SIZE*2)\n", + " bisect.insort(my_list, new_item)\n", + " print('%2d ->' % new_item, my_list)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`insort` 跟 `bisect` 一样,有`lo`和`hi`两个可选参数用来控制查找的范围。它也有个变体叫`insort_left`,这个变体在背后用的是`bisect_left`。" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 2.9 当列表不是首选时\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "比如,要存放1000 万个浮点数的话,数组(array)的效率要高得多,因为数组在背后存的并不是float对象,而是数字的机器翻译,也就是字节表述\n", + "`array.tofile()` and `array.fromfile()` can be used to save and load large arrays. And they are really fast!\n", + "(similarly, `pickle.dump()` and `pickle.load()` can be used to save and load any object)" + ] + }, + { + "cell_type": "code", + "execution_count": 74, + "metadata": {}, + "outputs": [], + "source": [ + "from array import array\n", + "a = array('b',[2,9,1,5,7])\n", + "a = array(a.typecode, sorted(a)) # cannot use list.sort() after Python 3.4" + ] + }, + { + "cell_type": "code", + "execution_count": 75, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "array('b', [1, 2, 5, 7, 9])\n" + ] + } + ], + "source": [ + "print(a) " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2.9.2 memory view(?)\n", + "memoryview 是一个内置类,它能让用户在不复制内容的情况下操作同一个数组的不同切片。" + ] + }, + { + "cell_type": "code", + "execution_count": 76, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "5" + ] + }, + "execution_count": 76, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "numbers = array('h',[-2, 
-1, 0, 1, 2]) # 'h': signed short (2 bytes each)\n", + "memv = memoryview(numbers)\n", + "len(memv)" + ] + }, + { + "cell_type": "code", + "execution_count": 77, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "-2\n" + ] + } + ], + "source": [ + "print(memv[0])" + ] + }, + { + "cell_type": "code", + "execution_count": 78, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[254, 255, 255, 255, 0, 0, 1, 0, 2, 0]" + ] + }, + "execution_count": 78, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "memv_oct = memv.cast('B') # cast to unsigned char\n", + "memv_oct.tolist()" + ] + }, + { + "cell_type": "code", + "execution_count": 79, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array('h', [-2, -1, 1024, 1, 2])" + ] + }, + "execution_count": 79, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "memv_oct[5] = 4\n", + "numbers " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Because we set the most significant byte of a 2-byte integer to 4, that signed integer became 1024 (4 * 256).\n", + "\n", + "Skipping the NumPy and SciPy part (hehe).\n", + "\n", + "### Deques and other queues\n", + "\n", + "With `append` and `pop(0)` a list can be used as a FIFO queue, but inserting or removing at the left end of a list is costly because every remaining element has to be shifted.\n", + "\n", + "`collections.deque` is a thread-safe double-ended queue designed for fast inserting and removing from both ends. 
It is also a good fit for keeping a list of \"last seen items\": a `deque` created with `maxlen` is bounded, and once it is full, adding an item at one end discards an item from the other end.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 80, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=10)" + ] + }, + "execution_count": 80, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from collections import deque\n", + "dq = deque(range(10), maxlen = 10)\n", + "dq" + ] + }, + { + "cell_type": "code", + "execution_count": 81, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "deque([7, 8, 9, 0, 1, 2, 3, 4, 5, 6], maxlen=10)" + ] + }, + "execution_count": 81, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dq.rotate(3)\n", + "dq" + ] + }, + { + "cell_type": "code", + "execution_count": 82, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "deque([1, 2, 3, 4, 5, 6, 7, 8, 9, 0], maxlen=10)" + ] + }, + "execution_count": 82, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dq.rotate(-4)\n", + "dq" + ] + }, + { + "cell_type": "code", + "execution_count": 83, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "deque([-1, 1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=10)" + ] + }, + "execution_count": 83, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dq.appendleft(-1)\n", + "dq" + ] + }, + { + "cell_type": "code", + "execution_count": 84, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "deque([3, 4, 5, 6, 7, 8, 9, 11, 22, 33], maxlen=10)" + ] + }, + "execution_count": 84, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dq.extend([11,22,33])\n", + "dq" + ] + }, + { + "cell_type": "code", + "execution_count": 85, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "deque([40, 30, 20, 10, 3, 4, 5, 6, 7, 8], maxlen=10)" + ] + }, + "execution_count": 
85, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dq.extendleft([10,20,30,40])\n", + "dq" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Supporting methods such as `popleft` and `rotate` comes at a cost: removing items from the middle of a deque is slower, because it is only optimized for operations at the two ends.\n", + "\n", + "`append` and `popleft` are atomic operations, so a `deque` can safely be used as a FIFO queue in multithreaded programs without the user having to manage locks.\n", + "\n", + "For reference: `queue`, `multiprocessing`, and `asyncio` also provide their own queue implementations, and `heapq` provides functions for using a list as a heap queue (worth looking into later).\n", + "\n", + "The Soapbox at the end of the chapter has an interesting example involving `key`:" + ] + }, + { + "cell_type": "code", + "execution_count": 86, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[0, '1', 5, 6, '9', 14, 19, '23', 28, '28']" + ] + }, + "execution_count": 86, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "l = [28, 14, '28', 5, '9', '1', 0, 6, '23', 19] \n", + "sorted(l, key=int)" + ] + }, + { + "cell_type": "code", + "execution_count": 87, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[0, '1', 14, 19, '23', 28, '28', 5, 6, '9']" + ] + }, + "execution_count": 87, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "l = [28, 14, '28', 5, '9', '1', 0, 6, '23', 19] \n", + "sorted(l, key=str)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You have to decide whether to treat the strings as numbers or the numbers as strings.\n", + "\n", + "About `Timsort`\n", + "- The sorting algorithm behind `sorted` and `list.sort` is Timsort, an adaptive algorithm that switches between insertion sort and merge sort depending on how ordered the data already is." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "base", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/03-dict-set/03-notes.ipynb b/03-dict-set/03-notes.ipynb new file mode 100644 index 0000000..d3c03aa --- /dev/null +++ 
b/03-dict-set/03-notes.ipynb @@ -0,0 +1,462 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Ch3 Dictionaries and sets\n", + "Both are built on hash tables.\n", + "\n", + "## 3.1 Generic mapping types\n", + "\n", + "> The `collections.abc` module provides the `Mapping` and `MutableMapping` abstract base classes, which formalize the interfaces of `dict` and similar types.\n", + "\n", + "Inheritance tree:\n", + "\n", + "`Container` class\n", + "- `__contains__`\n", + " \n", + "`Iterable` class\n", + "- `__iter__`\n", + "\n", + "`Sized` class\n", + "- `__len__`\n", + "\n", + "`Mapping` class extends `Container`, `Iterable`, `Sized`\n", + "- `__getitem__`\n", + "- `__contains__`\n", + "- `__eq__`\n", + "- `__ne__`\n", + "- `get`\n", + "- `keys`\n", + "- `items`\n", + "- `values`\n", + "\n", + "`MutableMapping` class extends `Mapping`\n", + "- `__setitem__`\n", + "- `__delitem__`\n", + "- `pop`\n", + "- `popitem`\n", + "- `clear`\n", + "- `update`\n", + "- `setdefault`\n", + "\n", + "> However, concrete mapping types usually do not inherit from these ABCs directly; they extend `dict` or `collections.UserDict` instead. The main value of the ABCs is as formal documentation: they define the minimal interface a mapping type needs.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import collections.abc as abc\n", + "\n", + "my_dict = {}\n", + "isinstance(my_dict, abc.Mapping)\n", + "# Use isinstance rather than type() to check whether an argument is a mapping, because it may be an alternative mapping type rather than a dict." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "What are hashable objects?\n", + "- An object is hashable if it has a hash value which never changes during its lifetime (it needs a `__hash__()` method), and can be compared to other objects (it needs an `__eq__()` method). Hashable objects which **compare equal must have the same hash value**.\n", + "\n", + "str, bytes, numeric types are hashable. 
A tuple is hashable only **if all its items are hashable**.\n", + "\n", + "User-defined objects are hashable by default: their hash value is derived from their `id()`, and they compare unequal to every object except themselves. If an object implements a custom `__eq__()` that takes its internal state into account, it should be hashable only if all the attributes involved are immutable.\n", + "\n", + "Here are different ways to construct a dictionary:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "a = dict(one = 1, two = 2, three = 3)\n", + "b = {'one':1,'two':2,\"three\":3}\n", + "c = dict(zip(['one', 'two', 'three'], [1, 2, 3]))\n", + "d = dict([('two', 2), ('one', 1), ('three', 3)])\n", + "e = dict({'three': 3, 'one': 1, 'two': 2}) \n", + "a == b == c == d == e" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3.2 dict comprehensions\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "d1: dict_keys([86, 91, 1, 62, 55, 92, 880, 234, 7, 81])\n", + "d2: dict_keys([1, 7, 55, 62, 81, 86, 91, 92, 234, 880])\n", + "d3: dict_keys([880, 55, 86, 91, 62, 81, 234, 92, 7, 1])\n" + ] + } + ], + "source": [ + "# dialcodes.py\n", + "# BEGIN DIALCODES\n", + "# dial codes of the top 10 most populous countries\n", + "DIAL_CODES = [\n", + "    (86, 'China'),\n", + "    (91, 'India'),\n", + "    (1, 'United States'),\n", + "    (62, 'Indonesia'),\n", + "    (55, 'Brazil'),\n", + "    (92, 'Pakistan'),\n", + "    (880, 'Bangladesh'),\n", + "    (234, 'Nigeria'),\n", + "    (7, 'Russia'),\n", + "    (81, 'Japan'),\n", + "]\n", + "\n", + "d1 = dict(DIAL_CODES) # <1>\n", + "print('d1:', d1.keys())\n", + "d2 = dict(sorted(DIAL_CODES)) # <2>\n", + "print('d2:', d2.keys())\n", + "d3 = dict(sorted(DIAL_CODES, key=lambda x:x[1])) # <3>\n", + "print('d3:', d3.keys())\n", + "assert 
d1 == d2 and d2 == d3 # <4>\n", + "# END DIALCODES\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3.3 Common mapping methods\n", + "Examples of the common methods of `dict`, `defaultdict`, and `OrderedDict`.\n", + "\n", + "> The latter two are `dict` variants that live in the `collections` module.\n", + "\n", + "`update(m, [**kwargs])` uses duck typing: `m` can be a mapping or an iterable of key-value pairs. The method first checks whether `m` has a `keys()` method; if not, it iterates over `m`, assuming its items are key-value pairs.\n", + "\n", + "The difference between `d[k]` and `d.get(k)`: if key `k` is not in the dict, `d[k]` raises `KeyError`, while `d.get(k, default)` returns `default` (`None` if no default is given)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# index0.py with slight modification\n", + "\"\"\"Build an index mapping word -> list of occurrences\"\"\"\n", + "\n", + "import sys\n", + "import re\n", + "\n", + "WORD_RE = re.compile(r'\\w+')\n", + "\n", + "index = {}\n", + "with open(sys.argv[1], encoding='utf-8') as fp:\n", + "    for line_no, line in enumerate(fp, 1):\n", + "        for match in WORD_RE.finditer(line):\n", + "            word = match.group()\n", + "            column_no = match.start()+1\n", + "            location = (line_no, column_no)\n", + "            # this is ugly; coded like this to make a point\n", + "            occurrences = index.get(word, []) # <1>\n", + "            occurrences.append(location) # <2>\n", + "            index[word] = occurrences # <3>\n", + "\n", + "# print in alphabetical order\n", + "for word in sorted(index, key=str.upper): # <4> \n", + "    print(word, index[word])\n", + " \n", + "# <4> str.upper is not called here; a reference to the method is passed to sorted\n", + "# so that it can normalize the words when sorting" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import sys\n", + "import re\n", + "\n", + "WORD_RE = re.compile(r'\\w+')\n", + "\n", + "index = {}\n", + "with open(sys.argv[1], encoding='utf-8') as fp:\n", + "    for line_no, line in enumerate(fp, 1):\n", + "        for match in WORD_RE.finditer(line):\n", + "            word = match.group()\n", + "            column_no = 
match.start()+1\n", + "            location = (line_no, column_no)\n", + "            index.setdefault(word, []).append(location) # <1> a single line and a single key lookup\n", + "\n", + "# print in alphabetical order\n", + "for word in sorted(index, key=str.upper):\n", + "    print(word, index[word])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3.4 Mappings with flexible key lookup (handling missing keys)\n", + "- Use a `defaultdict`\n", + "- Subclass `dict` and implement the `__missing__` method\n", + "\n", + "### 3.4.1 `defaultdict`: handling missing keys\n", + "\n", + "> Specifically, when instantiating a `defaultdict`, you pass the constructor a callable, which is invoked whenever `__getitem__` is given a missing key, so that `__getitem__` returns a default value.\n", + "\n", + "For example, given `dd = defaultdict(list)`, if `'new-key'` is not yet in `dd`, the expression `dd['new-key']` goes through the following steps:\n", + "\n", + "1. Call `list()` to create a new list\n", + "2. Insert the new list into `dd` with `'new-key'` as its key\n", + "3. Return a reference to that list\n", + "\n", + "> The callable that produces the default values is stored in an instance attribute named `default_factory`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import sys\n", + "import re\n", + "import collections\n", + "\n", + "WORD_RE = re.compile(r'\\w+')\n", + "\n", + "index = collections.defaultdict(list) # <1> the list constructor as default_factory\n", + "with open(sys.argv[1], encoding='utf-8') as fp:\n", + "    for line_no, line in enumerate(fp, 1):\n", + "        for match in WORD_RE.finditer(line):\n", + "            word = match.group()\n", + "            column_no = match.start()+1\n", + "            location = (line_no, column_no)\n", + "            index[word].append(location) # <2> always succeeds, even for a new word\n", + "\n", + "# print in alphabetical order\n", + "for word in sorted(index, key=str.upper):\n", + "    print(word, index[word])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> If no `default_factory` is given when the `defaultdict` is created, looking up a missing key raises `KeyError`.\n", + "\n", + "`default_factory` is only invoked by `__getitem__`, not by other methods. For example, when key `k` is missing, `dd.get(k)` returns `None` without calling `default_factory`.\n", + "\n", + "All of this is powered by the `__missing__` method.\n", + "\n", + "### 3.4.2 The `__missing__` method\n", + 
"\n", + "\n", + "## 3.7 不可变的映射类型 `types.MappingProxyType`\n", + "> 如果给这个类一个映射,它会返回一个只读的映射视图。虽然是个只读视图,但是它是动态的。也就是说,如果对原映射做出了改动,我们通过这个视图可以观察到,但是无法通过这个视图对原映射做出改动。" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "mappingproxy({1: 'A'})" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from types import MappingProxyType\n", + "d = {1:\"A\"}\n", + "d_proxy = MappingProxyType(d)\n", + "d_proxy" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'A'" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "d_proxy[1]" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "ename": "TypeError", + "evalue": "'mappingproxy' object does not support item assignment", + "output_type": "error", + "traceback": [ + "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[1;31mTypeError\u001b[0m Traceback (most recent call last)", + "Cell \u001b[1;32mIn[3], line 1\u001b[0m\n\u001b[1;32m----> 1\u001b[0m \u001b[43md_proxy\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m2\u001b[39;49m\u001b[43m]\u001b[49m \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mx\u001b[39m\u001b[38;5;124m'\u001b[39m\n", + "\u001b[1;31mTypeError\u001b[0m: 'mappingproxy' object does not support item assignment" + ] + } + ], + "source": [ + "d_proxy[2] = 'x'" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'B'" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "d[2] = 'B' \n", + "d_proxy[2]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3.8 集合论\n", + 
"\n", + "### 3.8.1 集合字面量\n", + "\n", + "空集必须要用`set()`来表示,因为`{}`表示的是空字典。当时1330还错过\n", + "\n", + "> Python特性\n", + "> 像{1, 2, 3} 这种字面量句法相比于构造方法(set([1, 2, 3]))要更快且更易读。后者的速度要慢一些,因为Python必须先从set这个名字来查询构造方法,然后新建一个列表,最后再把这个列表传入到构造方法里。但是如果是像{1, 2, 3}这样的字面量,Python会利用一个专门的叫作BUILD_SET的字节码来创建集合。\n", + "\n", + "`frozenset`没有字面量句法,只能通过构造方法来创建。\n", + "\n", + "### 3.8.2 集合推导\n", + "和列表推导式类似,只是用`{}`来代替`[]`。\n", + "\n", + "### 3.8.3 集合的操作\n", + "\n", + "有一些之前不太熟悉的就地修改操作:\n", + "- `s.update(t)` or `s |= t` union 就地修改\n", + "- `s.intersection_update(t)` or `s &= t` intersection 就地修改\n", + "- `s.difference_update(t)` or `s -= t` difference 就地修改\n", + "- `s.symmetric_difference_update(t)` or `s ^= t` symmetric difference 就地修改\n", + "\n", + "这里传入的参数可以是任何可迭代对象,包括集合,列表,生成器等(一个有趣的特性)\n", + "\n", + "返回值为bool的一些运算符:\n", + "- `s.isdisjoint(t)` 如果两个集合的交集为空,返回True\n", + "- `s.issubset(t)` 如果s中的每一个元素都在t中,返回True\n", + "- `s.issuperset(t)` 如果t中的每一个元素都在s中,返回True\n", + "\n", + "其他实用性方法:\n", + "- `s.add(e)` 添加元素\n", + "- `s.clear()` 清空集合\n", + "- `s.remove(e)` 删除元素,如果不存在会报错\n", + "- `s.discard(e)` 删除元素,如果不存在不会报错\n", + "- `s.copy()` 返回一个新的集合(浅拷贝)\n", + "- `s.pop()` 随机删除一个元素并返回\n", + "\n", + "## 3.9 dict和set的背后\n", + "\n", + "### 3.9.1 效率实验\n", + "> 如果在你的程序里有任何的磁盘输入/输出,那么不管查询有多少个元素的字典或集合,所耗费的时间都能忽略不计(前提是字典或者集合不超过内存大小)\n", + "\n", + "### 3.9.2 字典中的散列表\n", + "\n", + "> 散列表其实是一个稀疏数组(总是有空白元素的数组称为稀疏数组),散列表的单元叫作表元(bucket)。\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "base", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}