Python Multithreading Approaches

Contents

Introduction
Comparison
Baseline
_thread
The Thread class
- Lock
- Queue
multiprocessing.dummy
Thread pool (recommended)
- Progress bar
References

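Introduction

Multiprocessing (Process): multiprocessing
- Pros: runs in parallel on multiple CPU cores
- Cons: uses the most resources; fewer processes can be started than threads
- Best for: CPU-bound work
Multithreading (Thread): threading
- Pros: lighter than processes and uses fewer resources
- Cons:
  - Compared with processes: in CPython, the GIL lets only one thread execute Python bytecode at a time, so threads cannot spread work across multiple CPUs; threads still speed up programs that wait on IO, because the GIL is released during blocking IO
  - Compared with coroutines: the number of threads is limited, each thread takes memory, and switching between threads has a cost
- Best for: IO-bound work with a modest number of concurrent tasks
Coroutines (Coroutine): asyncio
- Pros: lowest memory overhead; a very large number of coroutines can be started
- Cons: fewer libraries support it; more complex to implement
- Best for: IO-bound work with a very large number of concurrent tasks

IO means input/output. It covers file IO and network IO, for example reading and writing files, database access, and network requests (web scraping).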
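
As a rough illustration of why threads still help with IO-bound work despite the GIL, the sketch below uses time.sleep as a stand-in for a blocking IO call (an assumption: sleeping releases the GIL much like waiting on a file or socket does). The names fake_io, run_sequential and run_threaded are hypothetical and do not come from the original article.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(n, delay=0.2):
    """Stand-in for a blocking IO call; time.sleep releases the GIL while waiting."""
    time.sleep(delay)
    return n


def run_sequential(items):
    """Run the calls one after another; the waits add up."""
    start = time.perf_counter()
    results = [fake_io(n) for n in items]
    return results, time.perf_counter() - start


def run_threaded(items):
    """Run the calls in a thread pool; the waits overlap."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(items)) as executor:
        results = list(executor.map(fake_io, items))
    return results, time.perf_counter() - start


if __name__ == '__main__':
    items = list(range(5))
    print(run_sequential(items))
    print(run_threaded(items))
```

Both functions return the same results; only the elapsed time differs, because the threaded version spends its waiting time concurrently.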

Goals for a convenient multithreading solution:

Fast
Returns values
Keeps data synchronized

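Comparison

Approach	Pros	Cons	Time (s)
Baseline			33.05
_thread	1. Runs in the background 2. Suits GUI programs	1. The program must keep running 2. Return values are hard to get	142.75
Thread class		1. Getting return values is a bit awkward 2. Data synchronization needs a Lock or Queue	29.22
multiprocessing.dummy	1. Easy to start 2. Has return values 3. Data stays synchronized	Arguments must be collected first, so the code is structured a little differently	28.81
Thread pool	1. Easy to start 2. Has return values 3. Data stays synchronized	Arguments must be collected first, so the code is structured a little differently	30.09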

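Baseline

A simple file write serves as the example workload to simulate IO. The later examples import this function via "from tool import benchmark".

def benchmark(n):
    """Baseline workload: write n * 1,000,000 lines to '<n>.txt' and return the last value written."""
    i = 0  # returned unchanged when n == 0, because the loop body never runs
    with open('{}.txt'.format(n), 'w') as f:
        for i in range(n * 1000000):
            f.write(str(i) + '\n')
    return i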


if __name__ == '__main__':
    from timeit import timeit


    def f():
        for n in range(10):
            print(benchmark(n))


    print(timeit(f, number=1))

_thread

import _thread

from tool import benchmark


def f():
    for n in range(10):
        print(_thread.start_new_thread(benchmark, (n,)))


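if __name__ == '__main__':
    f()
    # start_new_thread() returns immediately, and threads started this way
    # do not keep the process alive, so the main thread has to stay running.
    # Note that this busy loop competes with the workers for the GIL.
    while True:
        pass

Drawbacks:

The program must keep running
Return values are hard to get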
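
One possible way around the endless loop is to have each worker release a lock when it finishes, so the main thread can block on those locks instead of spinning. The sketch below is only an illustration under that assumption: it uses a small stand-in worker rather than benchmark, and the names worker, run_all and the shared results dict are hypothetical.

```python
import _thread
import time


def worker(n, results, done):
    """Store a result for n, then signal completion by releasing the lock."""
    try:
        time.sleep(0.05)  # stand-in for real IO work
        results[n] = n * n
    finally:
        done.release()


def run_all(count=4):
    """Start one low-level thread per task and wait for all of them."""
    results = {}
    locks = []
    for n in range(count):
        done = _thread.allocate_lock()
        done.acquire()  # held until the worker releases it
        locks.append(done)
        _thread.start_new_thread(worker, (n, results, done))
    for done in locks:
        done.acquire()  # blocks without busy-waiting until the worker is finished
    return results


if __name__ == '__main__':
    print(run_all())
```

This also gives a way to collect return values, at the cost of extra bookkeeping that the higher-level threading module handles for you.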

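The Thread class

import threading

from tool import benchmark


class MyThread(threading.Thread):
    """Thread that stores the target's return value and returns it from join()."""

    _return = None  # stays None if no target was given or the thread has not finished

    def run(self):
        # _target, _args and _kwargs are private attributes set by Thread.__init__
        if self._target is not None:
            self._return = self._target(*self._args, **self._kwargs)

    def join(self, timeout=None):
        super().join(timeout)
        return self._return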


def f():
    threads = []
    for n in range(10):
        threads.append(MyThread(target=benchmark, args=(n,)))
    for thread in threads:
        thread.start()
    for thread in threads:
        print(thread.join())


if __name__ == '__main__':
    from timeit import timeit

    print(timeit(f, number=1))

Drawbacks:

Getting return values is a bit awkward
Data synchronization needs a Lock or Queue
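
As an alternative sketch for collecting return values, the subclass below keeps the function and its arguments in its own public attributes instead of relying on Thread's private _target, _args and _kwargs. ResultThread, square and run_all are hypothetical names used only for illustration.

```python
import threading


class ResultThread(threading.Thread):
    """Thread that runs func(*args, **kwargs) and keeps the return value in .result."""

    def __init__(self, func, *args, **kwargs):
        super().__init__()
        self.func = func
        self.func_args = args
        self.func_kwargs = kwargs
        self.result = None  # filled in by run()

    def run(self):
        self.result = self.func(*self.func_args, **self.func_kwargs)


def square(n):
    return n * n


def run_all(values):
    """Start one thread per value, wait for all, and return results in input order."""
    threads = [ResultThread(square, n) for n in values]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return [thread.result for thread in threads]


if __name__ == '__main__':
    print(run_all([1, 2, 3]))
```

Reading result only after join() returns avoids seeing a half-finished value.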

Lock

import time
import threading
from threading import Thread, Lock

lock = Lock()


class Account:
    def __init__(self, balance):
        self.balance = balance


def draw(account, amount):
    # The lock makes the balance check and the deduction one atomic step
    with lock:
        if account.balance >= amount:
            time.sleep(0.1)
            print(threading.current_thread().name, 'withdrawal succeeded')
            account.balance -= amount
            print(threading.current_thread().name, 'balance', account.balance)
        else:
            print(threading.current_thread().name, 'withdrawal failed: insufficient balance')


if __name__ == '__main__':
    account = Account(1000)
    ta = Thread(target=draw, args=(account, 800), name='ta')
    tb = Thread(target=draw, args=(account, 800), name='tb')
    ta.start()
    tb.start()
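
To make the effect of the lock checkable, the following sketch reworks the same withdrawal scenario so that it returns the final balance and the outcome of each attempt instead of printing. It assumes the lock lives on the account object; run_withdrawals and the outcomes list are hypothetical additions.

```python
import threading
import time


class Account:
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()  # guards every read-modify-write of balance


def draw(account, amount, outcomes):
    """Try to withdraw amount; record True on success and False otherwise."""
    with account.lock:
        if account.balance >= amount:
            time.sleep(0.01)  # widen the window in which a race would occur
            account.balance -= amount
            outcomes.append(True)
        else:
            outcomes.append(False)


def run_withdrawals(balance=1000, amount=800, count=2):
    """Run count concurrent withdrawals and return (final balance, outcomes)."""
    account = Account(balance)
    outcomes = []
    threads = [
        threading.Thread(target=draw, args=(account, amount, outcomes))
        for _ in range(count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return account.balance, outcomes


if __name__ == '__main__':
    print(run_withdrawals())
```

With the lock in place exactly one of the two withdrawals of 800 from 1000 can succeed, so the balance never goes negative.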

Queue

import threading
from queue import Queue

from tool import benchmark


def f(queue):
    n = queue.get()
    print(benchmark(n))


if __name__ == '__main__':
    queue = Queue()
    for n in range(10):
        queue.put(n)

    for n in range(10):
        thread = threading.Thread(target=f, args=(queue,))
        thread.start()

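With this approach the data is not synchronized: the threads are never joined, and their results are only printed, not collected.

Elapsed time: 26.44 s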
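
A minimal sketch of one way to get synchronized results with queues: workers pull tasks from one queue, push (input, output) pairs onto a second queue, and the main thread joins the workers before reading the results. The stand-in function work and the names worker and run_all are assumptions for illustration, not part of the original code.

```python
import threading
from queue import Empty, Queue


def work(n):
    """Stand-in for the real task."""
    return n * 2


def worker(tasks, results):
    """Keep taking tasks until the task queue is empty."""
    while True:
        try:
            n = tasks.get_nowait()
        except Empty:
            return
        results.put((n, work(n)))


def run_all(values, num_threads=4):
    """Process all values with a fixed number of threads and collect the results."""
    tasks, results = Queue(), Queue()
    for n in values:
        tasks.put(n)
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # after this, no worker is still writing to results
    collected = {}
    while not results.empty():
        n, value = results.get()
        collected[n] = value
    return collected


if __name__ == '__main__':
    print(run_all(range(10)))
```

Because all tasks are queued before the workers start, an empty task queue reliably means there is nothing left to do.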

multiprocessing.dummy

from multiprocessing.dummy import Pool

from tool import benchmark


def f():
    n_list = list(range(10))
    pool = Pool(processes=8)
    results = pool.map(benchmark, n_list)
    pool.close()
    pool.join()
    print(results)


if __name__ == '__main__':
    from timeit import timeit

    print(timeit(f, number=1))
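
For functions that take several positional arguments, the thread-backed Pool from multiprocessing.dummy also offers starmap, which unpacks each tuple into arguments. The sketch below is a small illustration; multiply and run_pairs are hypothetical names.

```python
from multiprocessing.dummy import Pool


def multiply(x, y):
    return x * y


def run_pairs(pairs, workers=4):
    """Call multiply(x, y) for each (x, y) pair using a pool of threads."""
    # starmap blocks until every result is ready, so leaving the with block is safe
    with Pool(processes=workers) as pool:
        return pool.starmap(multiply, pairs)


if __name__ == '__main__':
    print(run_pairs([(1, 4), (2, 5), (3, 6)]))
```

Results come back in the same order as the input pairs.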

Thread pool (recommended)

from concurrent.futures import ThreadPoolExecutor

from tool import benchmark


def f():
    n_list = list(range(10))
    with ThreadPoolExecutor() as executor:
        results = list(executor.map(benchmark, n_list))
        print(results)


if __name__ == '__main__':
    from timeit import timeit

    print(timeit(f, number=1))

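To pass multiple arguments, give executor.map one iterable per parameter. To set only some of them, for example a single keyword argument, wrap the call in a lambda: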

import time

from concurrent.futures import ThreadPoolExecutor


def f(x=1, y=2):
    time.sleep(1)
    return x * y


x_list = [1, 2, 3]
y_list = [4, 5, 6]

with ThreadPoolExecutor() as executor:
    results = list(executor.map(f, x_list, y_list))
    print(results)  # [4, 10, 18]
    results = list(executor.map(lambda y: f(y=y), y_list))
    print(results)  # [4, 5, 6]
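
functools.partial is another way to fix some arguments before handing a function to executor.map. The sketch below assumes the same f as above, without the sleep; map_with_fixed_y is a hypothetical helper name.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def f(x=1, y=2):
    return x * y


def map_with_fixed_y(x_values, y):
    """Map f over x_values in a thread pool while keeping y fixed."""
    with ThreadPoolExecutor() as executor:
        return list(executor.map(partial(f, y=y), x_values))


if __name__ == '__main__':
    print(map_with_fixed_y([1, 2, 3], 10))
```

Unlike a lambda, a partial object also shows the wrapped function and its fixed arguments when printed, which can help when debugging.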

Progress bar

from concurrent.futures import ThreadPoolExecutor

from tool import benchmark


def f():
    n_list = list(range(10))
    with ThreadPoolExecutor() as executor:
        results = list(executor.map(benchmark, n_list))
        print(results)


if __name__ == '__main__':
    from timeit import timeit

    print(timeit(f, number=1))
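
The listing above runs the tasks but prints nothing until all of them have finished. A minimal sketch of progress reporting with only the standard library is shown below, assuming a simple "done/total" counter is enough; a third-party progress-bar package could be used instead. The function work stands in for benchmark, and run_with_progress is a hypothetical name.

```python
import sys
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def work(n):
    """Stand-in for the real task."""
    time.sleep(0.01 * n)
    return n


def run_with_progress(values, out=sys.stderr):
    """Run work() on each value in a thread pool and print a counter as tasks finish."""
    values = list(values)
    results = [None] * len(values)
    with ThreadPoolExecutor() as executor:
        # Map each future back to its input position so results keep input order
        futures = {executor.submit(work, n): i for i, n in enumerate(values)}
        for done, future in enumerate(as_completed(futures), start=1):
            results[futures[future]] = future.result()
            print('\r{}/{} done'.format(done, len(values)), end='', file=out, flush=True)
    print(file=out)  # finish the progress line
    return results


if __name__ == '__main__':
    print(run_with_progress(range(10)))
```

submit plus as_completed is used here instead of executor.map because map yields results in input order, which would make the counter stall behind the slowest early task.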

References

Original article: https://blog.csdn.net/lly1122334/article/details/127011043