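Python Language Tips

1. List Comprehensions and Generator Expressions

1.1 Extracting a Subset of a Dictionary

You want to build a dictionary that is a subset of another dictionary. The simplest way is a dictionary comprehension:[1]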

python
prices = {
    'ACME': 45.23,
    'AAPL': 612.78,
    'IBM': 205.55,
    'HPQ': 37.20,
    'FB': 10.75
}

# Make a dictionary of all prices over 200
p1 = {key: value for key, value in prices.items() if value > 200}

# Make a dictionary of tech stocks
tech_names = {'AAPL', 'IBM', 'HPQ', 'MSFT'}
p2 = {key: value for key, value in prices.items() if key in tech_names}

Most of what a dictionary comprehension can do can also be done by building a sequence of tuples and passing it to dict():

python
p1 = dict((key, value) for key, value in prices.items() if value > 200)

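However, the dictionary comprehension expresses the intent more clearly and also runs faster (in a test of this example, almost twice as fast as the dict() version).

There is often more than one way to do the same thing. For instance, the second example could also be rewritten like this:

python
# Make a dictionary of tech stocks
tech_names = {'AAPL', 'IBM', 'HPQ', 'MSFT'}
p2 = {key: prices[key] for key in prices.keys() & tech_names}

However, timing shows that this approach is about 1.6 times slower than the first one.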
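The timing claims above can be checked on your own machine. The following is a minimal sketch using timeit; the three function names are hypothetical wrappers introduced here, and the measured ratios will depend on the Python version and hardware.

```python
import timeit

prices = {'ACME': 45.23, 'AAPL': 612.78, 'IBM': 205.55, 'HPQ': 37.20, 'FB': 10.75}
tech_names = {'AAPL', 'IBM', 'HPQ', 'MSFT'}

def with_comprehension():
    # Dictionary comprehension over items()
    return {key: value for key, value in prices.items() if key in tech_names}

def with_dict_call():
    # Generator of tuples passed to dict()
    return dict((key, value) for key, value in prices.items() if key in tech_names)

def with_key_intersection():
    # Set intersection of the keys, then a lookup per key
    return {key: prices[key] for key in prices.keys() & tech_names}

# All three build the same dictionary; only the speed differs.
expected = with_comprehension()
assert with_dict_call() == expected
assert with_key_intersection() == expected

for func in (with_comprehension, with_dict_call, with_key_intersection):
    seconds = timeit.timeit(func, number=100_000)
    print(f"{func.__name__}: {seconds:.3f}s")
```

Treat the printed numbers as a rough comparison only; for a dictionary this small, the differences are unlikely to matter outside a hot loop.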

1.2 Filtering and Transforming with List Comprehensions

A list comprehension can filter and transform in a single expression:

python
# Squares of all even numbers
squares = [x ** 2 for x in range(10) if x % 2 == 0]
print(squares)  # [0, 4, 16, 36, 64]

# Nested loops: one inner list per value of i
matrix = [[i * j for j in range(1, 4)] for i in range(1, 4)]
print(matrix)  # [[1, 2, 3], [2, 4, 6], [3, 6, 9]]

# Flatten a nested list
nested = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = [item for sublist in nested for item in sublist]
print(flat)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

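1.3 Saving Memory with Generator Expressions

For large data sets, a generator expression uses far less memory than a list comprehension:

python
# List comprehension: builds the whole list and holds it in memory
squares_list = [x ** 2 for x in range(1000000)]

# Generator expression: produces values on demand
squares_gen = (x ** 2 for x in range(1000000))

# Can be passed directly to anything that consumes an iterable
sum_of_squares = sum(x ** 2 for x in range(1000000))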
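To make the memory difference concrete, here is a small sketch using sys.getsizeof. Note that getsizeof reports only the container object itself (not the integers a list refers to), so the real gap is larger than the printed figures suggest. The sketch also shows that a generator can be consumed only once.

```python
import sys

n = 1_000_000
squares_list = [x ** 2 for x in range(n)]
squares_gen = (x ** 2 for x in range(n))

list_bytes = sys.getsizeof(squares_list)  # size of the list object, excluding its items
gen_bytes = sys.getsizeof(squares_gen)    # small and independent of n
print(f"list: {list_bytes} bytes, generator: {gen_bytes} bytes")

# A generator is exhausted after one pass.
total = sum(squares_gen)
leftover = sum(squares_gen)  # nothing left to iterate over
print(total == sum(squares_list), leftover)
```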

2. Unpacking

2.1 Star-Expression Unpacking

A star expression makes it easy to handle sequences of arbitrary length:

python
# Basic unpacking
first, *middle, last = [1, 2, 3, 4, 5]
print(first)   # 1
print(middle)  # [2, 3, 4]
print(last)    # 5

# In function parameters
def func(a, b, *args, **kwargs):
    print(f"a={a}, b={b}")
    print(f"args={args}")
    print(f"kwargs={kwargs}")

func(1, 2, 3, 4, x=5, y=6)
# a=1, b=2
# args=(3, 4)
# kwargs={'x': 5, 'y': 6}

# Ignore the values you do not need
first, *_, last = [1, 2, 3, 4, 5]
print(first, last)  # 1 5
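Star unpacking is also handy for records of varying length and for splitting delimited text. The sample records and the passwd-style line below are made-up illustration data.

```python
# Records with a variable number of phone numbers (hypothetical data)
records = [
    ("Alice", "alice@example.com", "555-0100", "555-0101"),
    ("Bob", "bob@example.com"),
]

phone_book = {}
for name, email, *phones in records:
    # phones is always a list, possibly empty
    phone_book[name] = phones

# Keep the first field and the last two, collect everything in between
line = "nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false"
uname, *fields, homedir, shell = line.split(":")

print(phone_book)
print(uname, homedir, shell)
```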

2.2 Dictionary Unpacking

Python 3.5+ supports dictionary unpacking inside dict literals:

python
# Merge dictionaries
dict1 = {'a': 1, 'b': 2}
dict2 = {'c': 3, 'd': 4}
merged = {**dict1, **dict2}
print(merged)  # {'a': 1, 'b': 2, 'c': 3, 'd': 4}

# Override defaults: for duplicate keys, the later value wins
defaults = {'host': 'localhost', 'port': 8080}
user_config = {'port': 9000}
config = {**defaults, **user_config}
print(config)  # {'host': 'localhost', 'port': 9000}
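The order of the unpacked dictionaries decides which value wins, and the same ** syntax can pass a dictionary as keyword arguments. A short sketch; the connect function is a hypothetical stand-in, not a real API.

```python
defaults = {'host': 'localhost', 'port': 8080, 'debug': False}
user_config = {'port': 9000}

# Later entries override earlier ones; extra keys can be added inline.
config = {**defaults, **user_config, 'debug': True}

# Swapping the order lets the defaults win instead.
defaults_win = {**user_config, **defaults}

# The source dictionaries are not modified.
untouched = defaults['port']

def connect(host, port, debug=False):
    # Hypothetical function used only to show ** in a call
    return f"{host}:{port} debug={debug}"

print(config)
print(defaults_win)
print(connect(**config))
```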

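3. Default Values and Handling Missing Values

3.1 Using or to Provide a Default

python
# Example inputs for illustration
user_input = ""
user_count = 0

# Provide a default for None or empty values
name = user_input or "Anonymous"

# Beware: 0, False, and the empty string are all falsy
count = user_count or 10  # a legitimate user_count of 0 is replaced by 10

# Safer: fall back only when the value is None
count = user_count if user_count is not None else 10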
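When several candidate values are checked in order, the "is not None" test can be wrapped in a small helper. This is a sketch; first_not_none is a hypothetical name introduced here, not a standard library function.

```python
def first_not_none(*values, default=None):
    """Return the first argument that is not None, else the default."""
    for value in values:
        if value is not None:
            return value
    return default

# `or` discards a legitimate 0; the helper keeps it.
with_or = 0 or 10
with_helper = first_not_none(None, 0, 10)
fallback = first_not_none(None, None, default="Anonymous")

print(with_or, with_helper, fallback)
```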

3.2 Using defaultdict

python
from collections import defaultdict

# Missing keys get a default value automatically
word_count = defaultdict(int)
for word in ["apple", "banana", "apple", "cherry"]:
    word_count[word] += 1  # no need to check whether the key exists

print(dict(word_count))  # {'apple': 2, 'banana': 1, 'cherry': 1}

# Use list as the default factory
groups = defaultdict(list)
for name, age in [("Alice", 25), ("Bob", 30), ("Alice", 26)]:
    groups[name].append(age)

print(dict(groups))  # {'Alice': [25, 26], 'Bob': [30]}
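One thing to keep in mind with defaultdict: merely reading a missing key with square brackets inserts it. The sketch below shows this side effect and how get avoids it.

```python
from collections import defaultdict

word_count = defaultdict(int)
word_count["apple"] += 1

# Indexing a missing key calls the factory AND stores the result.
missing = word_count["durian"]
inserted = "durian" in word_count

# get() does not trigger the factory and does not insert anything.
looked_up = word_count.get("elderberry")
not_inserted = "elderberry" not in word_count

print(missing, inserted, looked_up, not_inserted)
```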

3.3 Using the get Method

python
config = {'timeout': 30}

# Traditional approach
if 'timeout' in config:
    timeout = config['timeout']
else:
    timeout = 60

# Using get
timeout = config.get('timeout', 60)

# get also avoids KeyError
value = config.get('missing_key')  # returns None instead of raising
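Two details of get are worth a quick sketch: chaining it for nested dictionaries, and the fact that the default applies only when the key is absent, not when the stored value is None. The settings dictionary is made-up example data.

```python
settings = {'database': {'host': 'db.internal'}, 'timeout': None}

# Chain get() with an empty dict as the intermediate default
host = settings.get('database', {}).get('host', 'localhost')
port = settings.get('database', {}).get('port', 5432)
cache_ttl = settings.get('cache', {}).get('ttl', 300)

# The key exists with value None, so the default 60 is NOT used
timeout = settings.get('timeout', 60)

print(host, port, cache_ttl, timeout)
```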

4. Functional Programming Techniques

4.1 Using map, filter, and reduce

python
# map: transform every element of a sequence
numbers = [1, 2, 3, 4, 5]
squares = list(map(lambda x: x ** 2, numbers))
print(squares)  # [1, 4, 9, 16, 25]

# filter: keep the elements that pass a test
evens = list(filter(lambda x: x % 2 == 0, numbers))
print(evens)  # [2, 4]

# reduce: fold a sequence into a single value
from functools import reduce
product = reduce(lambda x, y: x * y, numbers)
print(product)  # 120

# A list comprehension is usually more Pythonic
squares = [x ** 2 for x in numbers]
evens = [x for x in numbers if x % 2 == 0]
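A detail the reduce example above does not show: without an initial value, reduce raises TypeError on an empty sequence. The sketch below passes an initializer and also shows math.prod (Python 3.8+) as a ready-made alternative for products.

```python
import math
import operator
from functools import reduce

numbers = [1, 2, 3, 4, 5]

# The third argument is the starting value, which also covers the empty case.
product = reduce(operator.mul, numbers, 1)
empty_product = reduce(operator.mul, [], 1)

try:
    reduce(operator.mul, [])  # no initializer and nothing to fold
except TypeError as exc:
    empty_error = exc

# Built-in helpers often replace reduce entirely.
same_product = math.prod(numbers)
total = sum(numbers)

print(product, empty_product, same_product, total)
```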

4.2 Using partial to Fix Function Arguments

python
from functools import partial

def power(base, exponent):
    return base ** exponent

# Create specialized versions of the function
square = partial(power, exponent=2)
cube = partial(power, exponent=3)

print(square(5))  # 25
print(cube(5))    # 125
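partial works with built-in callables too, and the resulting object exposes what it wraps. A brief sketch; parse_binary is a name chosen here for illustration.

```python
from functools import partial

# Fix the keyword argument `base` of the built-in int()
parse_binary = partial(int, base=2)
value = parse_binary("1010")

# A partial object records the wrapped function and the frozen arguments.
wrapped = parse_binary.func
frozen_keywords = parse_binary.keywords

# Keyword arguments fixed by partial can still be overridden at call time.
as_hex = parse_binary("ff", base=16)

print(value, wrapped, frozen_keywords, as_hex)
```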

4.3 Caching Results with lru_cache

python
from functools import lru_cache

@lru_cache(maxsize=128)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

# Each distinct argument is computed once; repeated calls return the cached result
print(fibonacci(100))  # fast

# Inspect cache statistics
print(fibonacci.cache_info())
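The statistics returned by cache_info show how much work the cache saves. This sketch uses maxsize=None (an unbounded cache, an assumption made here so that no entry is evicted) and a smaller argument to keep the numbers easy to read.

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded cache, so nothing is evicted
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

result = fibonacci(30)
info = fibonacci.cache_info()
# misses counts real computations: one per distinct n from 0 to 30
print(result, info)

# cache_clear() resets both the stored results and the statistics.
fibonacci.cache_clear()
after_clear = fibonacci.cache_info()
print(after_clear)
```

Arguments to a cached function must be hashable, since they are used as dictionary keys internally.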

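5. Context Managers

5.1 Writing Your Own Context Manager

python
import time
from contextlib import contextmanager

@contextmanager
def timer():
    # perf_counter is a monotonic clock intended for measuring intervals
    start = time.perf_counter()
    try:
        yield
    finally:
        end = time.perf_counter()
        print(f"Elapsed time: {end - start:.4f}s")

# Usage
with timer():
    # Run some time-consuming work
    sum(range(1000000))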
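The same idea can be written as a class with __enter__ and __exit__, which is convenient when the caller needs the measured value afterwards. A minimal sketch; the Timer class and its elapsed attribute are names introduced here.

```python
import time

class Timer:
    """Context manager that records the elapsed time of its block."""

    def __enter__(self):
        self.start = time.perf_counter()
        return self  # bound to the name after `as`

    def __exit__(self, exc_type, exc_value, traceback):
        self.elapsed = time.perf_counter() - self.start
        return False  # do not swallow exceptions raised in the block

with Timer() as t:
    total = sum(range(1_000_000))

print(f"total={total}, elapsed={t.elapsed:.4f}s")
```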

5.2 The suppress Context Manager

python
import os
from contextlib import suppress

# Ignore a specific exception
with suppress(FileNotFoundError):
    os.remove('nonexistent_file.txt')  # no exception propagates

# Equivalent to
try:
    os.remove('nonexistent_file.txt')
except FileNotFoundError:
    pass

6. String Handling

6.1 f-string Formatting

python
name = "Alice"
age = 30

# Basic usage
print(f"Name: {name}, Age: {age}")

# Expressions
print(f"Next year: {age + 1}")

# Format specifiers
pi = 3.14159
print(f"Pi: {pi:.2f}")  # Pi: 3.14

# Alignment within a width of 10
print(f"{name:>10}")   # right-aligned
print(f"{name:<10}")   # left-aligned
print(f"{name:^10}")   # centered

# Debugging (Python 3.8+)
x = 10
print(f"{x=}")  # x=10
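A few more format specifiers that come up often, collected in one sketch. The values are arbitrary examples.

```python
amount = 1234567.891
ratio = 0.256
name = "Alice"

formatted = {
    "thousands": f"{amount:,.2f}",    # comma as thousands separator
    "percent": f"{ratio:.1%}",        # multiply by 100 and append %
    "hex": f"{255:#x}",               # hexadecimal with 0x prefix
    "zero_padded": f"{42:05d}",       # pad with zeros to width 5
    "fill_center": f"{name:*^11}",    # center with a custom fill character
    "repr": f"{name!r}",              # use repr() instead of str()
}

for label, text in formatted.items():
    print(f"{label:>12}: {text}")
```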

6.2 Chaining String Methods

python
text = "  hello world  "

# Method chaining
result = text.strip().upper().replace("WORLD", "PYTHON")
print(result)  # HELLO PYTHON

# Split and join
words = "apple,banana,cherry".split(',')
joined = ' | '.join(words)
print(joined)  # apple | banana | cherry

7. Iterators and Iterables

7.1 Looping with an Index Using enumerate

python
fruits = ['apple', 'banana', 'cherry']

# Get both index and value
for i, fruit in enumerate(fruits):
    print(f"{i}: {fruit}")

# Choose the starting index
for i, fruit in enumerate(fruits, start=1):
    print(f"{i}: {fruit}")

7.2 Iterating in Parallel with zip

python
names = ['Alice', 'Bob', 'Charlie']
ages = [25, 30, 35]
cities = ['New York', 'London', 'Tokyo']

# Iterate over several sequences in parallel
for name, age, city in zip(names, ages, cities):
    print(f"{name} is {age} years old and lives in {city}")

# Build a dictionary
person_dict = dict(zip(names, ages))
print(person_dict)  # {'Alice': 25, 'Bob': 30, 'Charlie': 35}
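zip stops at the shortest input, which can silently drop data. The sketch below shows that behavior, the padding alternative itertools.zip_longest, and the zip(*pairs) idiom for "unzipping". (On Python 3.10+, zip also accepts strict=True to raise on a length mismatch; that is not used here so the sketch runs on older versions.)

```python
from itertools import zip_longest

names = ['Alice', 'Bob', 'Charlie']
ages = [25, 30]  # deliberately one item short

# zip truncates to the shortest input: 'Charlie' is dropped.
pairs = list(zip(names, ages))

# zip_longest pads the shorter input with fillvalue instead.
padded = list(zip_longest(names, ages, fillvalue=None))

# zip(*pairs) transposes the pairs back into separate tuples.
unzipped_names, unzipped_ages = zip(*pairs)

print(pairs)
print(padded)
print(unzipped_names, unzipped_ages)
```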

7.3 Powerful Tools in itertools

python
from itertools import chain, islice, cycle, groupby

# chain: concatenate several iterables
list1 = [1, 2, 3]
list2 = [4, 5, 6]
combined = list(chain(list1, list2))
print(combined)  # [1, 2, 3, 4, 5, 6]

# islice: slice any iterable lazily
data = range(100)
first_10 = list(islice(data, 10))
print(first_10)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# cycle: repeat forever
colors = ['red', 'green', 'blue']
color_cycle = cycle(colors)
print([next(color_cycle) for _ in range(5)])  # ['red', 'green', 'blue', 'red', 'green']

# groupby: group consecutive items that share a key
data = [('A', 1), ('A', 2), ('B', 1), ('B', 2), ('C', 1)]
for key, group in groupby(data, key=lambda x: x[0]):
    print(f"{key}: {list(group)}")
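groupby only merges adjacent items with equal keys, so the example above works because its data is already ordered by key. With unordered input, sort by the same key first. A short sketch with shuffled sample data:

```python
from itertools import groupby

data = [('A', 1), ('B', 1), ('A', 2), ('C', 1), ('B', 2)]

def first(item):
    return item[0]

# Without sorting, the same key shows up in several separate groups.
unsorted_keys = [key for key, _ in groupby(data, key=first)]

# Sorting by the grouping key first yields one group per key.
grouped = {
    key: [value for _, value in group]
    for key, group in groupby(sorted(data, key=first), key=first)
}

print(unsorted_keys)
print(grouped)
```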

8. Classes and Objects

8.1 Simplifying Class Definitions with dataclass

python
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float
    
    def distance(self):
        return (self.x ** 2 + self.y ** 2) ** 0.5

p = Point(3, 4)
print(p)  # Point(x=3, y=4)
print(p.distance())  # 5.0
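The dataclass decorator also takes options. The sketch below, built around a hypothetical Version class, uses frozen=True for immutability, order=True for generated comparison methods, and field(compare=False) to exclude a field from comparisons.

```python
from dataclasses import dataclass, field, FrozenInstanceError

@dataclass(frozen=True, order=True)
class Version:
    major: int
    minor: int = 0
    # Excluded from ==, ordering, and hashing
    tags: tuple = field(default=(), compare=False)

release = Version(1, 2)

# order=True compares instances like tuples of their compared fields
is_older = release < Version(1, 10)

# tags is ignored by ==, so these are equal
same = release == Version(1, 2, tags=("beta",))

# frozen instances with eq are hashable, so they work in sets
unique = {Version(1, 2), Version(1, 2)}

try:
    release.major = 2  # assignment is blocked on frozen instances
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False

print(is_older, same, len(unique), mutation_blocked)
```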

8.2 The property Decorator

python
class Temperature:
    def __init__(self, celsius):
        self._celsius = celsius
    
    @property
    def celsius(self):
        return self._celsius
    
    @celsius.setter
    def celsius(self, value):
        if value < -273.15:
            raise ValueError("Temperature below absolute zero")
        self._celsius = value
    
    @property
    def fahrenheit(self):
        return self._celsius * 9/5 + 32

temp = Temperature(25)
print(temp.celsius)     # 25
print(temp.fahrenheit)  # 77.0
temp.celsius = 30       # goes through the setter
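In the class above, __init__ assigns self._celsius directly, so the range check in the setter does not run during construction. If construction should be validated as well, one option (a sketch, not the only design) is to assign through the property inside __init__:

```python
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius  # assign via the property so the setter validates

    @property
    def celsius(self):
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        if value < -273.15:
            raise ValueError("Temperature below absolute zero")
        self._celsius = value

    @property
    def fahrenheit(self):
        # No setter is defined, so this property is read-only
        return self._celsius * 9 / 5 + 32

try:
    Temperature(-300)
except ValueError as exc:
    construction_error = str(exc)

temp = Temperature(25)
try:
    temp.fahrenheit = 100
except AttributeError:
    read_only = True
else:
    read_only = False

print(construction_error, read_only)
```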

8.3 Saving Memory with __slots__

python
class Point:
    __slots__ = ['x', 'y']  # only these attributes are allowed
    
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
# p.z = 3  # would raise AttributeError
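The memory saving comes from instances not carrying a per-object __dict__. The sketch below contrasts a slotted class with a regular one; both class names are made up for the comparison.

```python
class SlottedPoint:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y

class PlainPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y

slotted = SlottedPoint(1, 2)
plain = PlainPoint(1, 2)

# A regular instance stores attributes in a per-object dict.
plain.z = 3
plain_has_dict = hasattr(plain, '__dict__')

# A slotted instance has no __dict__, so new attributes are rejected.
slotted_has_dict = hasattr(slotted, '__dict__')
try:
    slotted.z = 3
except AttributeError as exc:
    slot_error = exc

print(plain_has_dict, slotted_has_dict, type(slot_error).__name__)
```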

9. Exception Handling

9.1 The else and finally Clauses

python
try:
    result = 10 / 2
except ZeroDivisionError:
    print("Cannot divide by zero")
else:
    # Runs only if no exception was raised
    print(f"Result: {result}")
finally:
    # Runs whether or not an exception was raised
    print("Cleanup")
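To see which clauses run in each case, the sketch below records the path taken through a try statement. The divide function and its steps list are illustration helpers introduced here.

```python
def divide(a, b):
    steps = []  # records which clauses executed, in order
    try:
        result = a / b
    except ZeroDivisionError:
        steps.append("except")
        result = None
    else:
        steps.append("else")
    finally:
        steps.append("finally")
    return result, steps

ok = divide(10, 2)
failed = divide(1, 0)

print(ok)
print(failed)
```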

9.2 Custom Exceptions

python
class ValidationError(Exception):
    """Custom validation error."""
    pass

def validate_age(age):
    if age < 0:
        raise ValidationError(f"Age cannot be negative: {age}")
    if age > 150:
        raise ValidationError(f"Age too large: {age}")
    return age

try:
    validate_age(-5)
except ValidationError as e:
    print(f"Validation failed: {e}")
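A custom exception can also carry structured data and preserve the original cause via "raise ... from". The sketch below is a variant of the class above; the field attribute and the parse_age function are assumptions added for illustration.

```python
class ValidationError(Exception):
    """Validation error that remembers which field failed."""

    def __init__(self, field, message):
        super().__init__(f"{field}: {message}")
        self.field = field

def parse_age(raw):
    try:
        age = int(raw)
    except ValueError as exc:
        # `from exc` keeps the original error available as __cause__
        raise ValidationError("age", f"not an integer: {raw!r}") from exc
    if age < 0:
        raise ValidationError("age", f"cannot be negative: {age}")
    return age

valid = parse_age("42")

try:
    parse_age("abc")
except ValidationError as err:
    caught = err  # keep a reference; `err` is cleared after the block
    print(f"Validation failed: {err} (caused by {type(err.__cause__).__name__})")
```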

10. Performance Tips

10.1 Use a Set for Membership Tests

python
# Slow: O(n) linear scan
large_list = list(range(10000))
print(5000 in large_list)

# Fast: O(1) on average (hash lookup)
large_set = set(range(10000))
print(5000 in large_set)
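A rough way to observe the difference is timeit. This sketch looks up the last element, which is the worst case for the list; absolute numbers depend on the machine.

```python
import timeit

large_list = list(range(10000))
large_set = set(large_list)
target = 9999  # last element: the list has to scan everything

list_time = timeit.timeit(lambda: target in large_list, number=1000)
set_time = timeit.timeit(lambda: target in large_set, number=1000)

print(f"list: {list_time:.5f}s, set: {set_time:.5f}s")
```

Building the set costs O(n) once, so the conversion pays off when many lookups are made against the same data.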

10.2 Use Generators Instead of Lists

python
# Uses a lot of memory
def get_numbers():
    return [i for i in range(1000000)]

# Saves memory
def get_numbers_gen():
    return (i for i in range(1000000))
    # Or use yield
    # for i in range(1000000):
    #     yield i
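The yield variant mentioned in the comments can be sketched as follows. The produced list exists only to show that values are generated lazily: taking three items with islice runs the loop body three times, not a million.

```python
from itertools import islice

produced = []  # instrumentation for the demo, not needed in real code

def numbers(limit):
    for i in range(limit):
        produced.append(i)
        yield i

# Only the requested items are ever generated.
first_three = list(islice(numbers(1_000_000), 3))

print(first_three, produced)
```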

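10.3 Use Local Variables

python
# Slower: looks up the global name math and its sqrt attribute on every iteration
import math

def slow_function():
    result = []
    for i in range(1000):
        result.append(math.sqrt(i))
    return result

# Faster: bind the function to a local name once
def fast_function():
    result = []
    sqrt = math.sqrt  # local alias
    for i in range(1000):
        result.append(sqrt(i))
    return result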
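Whether the local alias helps noticeably is worth measuring, since the gap tends to be small on recent CPython releases. A sketch with timeit, adding a list comprehension variant (comprehension_function is a name introduced here) for comparison:

```python
import math
import timeit

def slow_function():
    result = []
    for i in range(1000):
        result.append(math.sqrt(i))  # global + attribute lookup each time
    return result

def fast_function():
    result = []
    sqrt = math.sqrt  # resolved once, then a fast local lookup
    for i in range(1000):
        result.append(sqrt(i))
    return result

def comprehension_function():
    # Also avoids the repeated result.append lookup
    return [math.sqrt(i) for i in range(1000)]

for func in (slow_function, fast_function, comprehension_function):
    seconds = timeit.timeit(func, number=1000)
    print(f"{func.__name__}: {seconds:.3f}s")
```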

11. References

  1. Python Cookbook
  2. Official Python documentation
  3. Effective Python

[1] Extracting a subset of a dictionary, Python Cookbook, https://python3-cookbook.readthedocs.io/zh_CN/latest/c01/p17_extract_subset_of_dict.html