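[Python AI Tutorial] (12) Exception Chaining and Logging: Making AI Applications Robust

This chapter covers Python's exception chaining mechanism, the design of custom exception hierarchies, the logging module, and strategies for graceful degradation and error recovery in AI applications.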
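1. Exception Chaining: raise ... from

1.1 Why do we need exception chaining?

In AI applications, errors rarely occur in isolation. For example:

```text
user prompt → LLM API call → network error → timeout
```

Each layer can raise its own exception. We need to know what the root cause was, while still preserving the complete chain of errors for tracing.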
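1.2 raise ... from syntax

```python
class LLMAPIError(Exception):
    """LLM API call error (the full hierarchy is in section 1.3)"""


try:
    raise ConnectionError("Connection refused")
except ConnectionError as e:
    # Wrap the low-level error and record it as the explicit cause
    raise LLMAPIError(f"LLM call failed: {e}") from e
```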
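To see exactly what each form of raising records, the sketch below raises a new exception inside an `except` block in three ways (`from e`, a plain `raise`, and `from None`) and returns it so its chaining attributes can be inspected. The three helper function names are illustrative only.

```python
class LLMAPIError(Exception):
    """Same name as the base exception in section 1.3"""


def explicit() -> LLMAPIError:
    """raise ... from e: the cause is recorded explicitly."""
    try:
        try:
            raise ConnectionError("Connection refused")
        except ConnectionError as e:
            raise LLMAPIError("LLM call failed") from e
    except LLMAPIError as err:
        return err


def implicit() -> LLMAPIError:
    """A plain raise inside except: only the context is recorded."""
    try:
        try:
            raise ConnectionError("Connection refused")
        except ConnectionError:
            raise LLMAPIError("LLM call failed")
    except LLMAPIError as err:
        return err


def suppressed() -> LLMAPIError:
    """raise ... from None: hide the original error in the traceback."""
    try:
        try:
            raise ConnectionError("Connection refused")
        except ConnectionError:
            raise LLMAPIError("LLM call failed") from None
    except LLMAPIError as err:
        return err


for make in (explicit, implicit, suppressed):
    err = make()
    print(make.__name__, repr(err.__cause__), repr(err.__context__), err.__suppress_context__)
```

In all three cases `__context__` still points at the `ConnectionError`; what changes is whether `__cause__` is set and whether the traceback displays the context.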
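Chaining is recorded in three attributes on the new exception:

- `e.__cause__`: the explicitly specified cause, set by `raise ... from`
- `e.__context__`: the exception that was being handled when the new one was raised, captured implicitly
- `e.__suppress_context__`: whether to suppress display of `__context__` in the traceback (`raise ... from` sets it to `True`)

1.3 Designing an exception hierarchy for AI scenarios

```python
class LLMAPIError(Exception):
    """Base exception for LLM API calls"""


class RateLimitError(LLMAPIError):
    """Rate limit exceeded"""


class AuthError(LLMAPIError):
    """Authentication failed"""


class ModelUnavailableError(LLMAPIError):
    """Model unavailable"""
```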
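This design lets us catch exceptions at the appropriate level of the hierarchy:

```python
try:
    response = call_llm(prompt)
except RateLimitError:
    # Most specific first: rate limited, wait and retry
    pass
except LLMAPIError:
    # Any other LLM API error (auth, model unavailable, ...)
    pass
except Exception:
    # Anything else is unexpected: log it and re-raise
    raise
```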
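Because `except` clauses are tried from top to bottom, the most specific class has to come first; otherwise `except LLMAPIError` would also swallow `RateLimitError`. The sketch below re-declares the 1.3 classes so it runs on its own and uses a hypothetical `handle()` function to report which clause wins for each exception type.

```python
class LLMAPIError(Exception):
    """Base exception for LLM API calls"""


class RateLimitError(LLMAPIError):
    """Rate limit exceeded"""


class AuthError(LLMAPIError):
    """Authentication failed"""


class ModelUnavailableError(LLMAPIError):
    """Model unavailable"""


def handle(exc: Exception) -> str:
    """Return which except clause handles exc (clauses are tried in order)."""
    try:
        raise exc
    except RateLimitError:
        return "retry later"
    except LLMAPIError:
        return "generic LLM handling"
    except Exception:
        return "unexpected: log and investigate"


for exc in (RateLimitError(), AuthError(), ModelUnavailableError(), ValueError()):
    print(type(exc).__name__, "->", handle(exc))
```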
2. Designing a Custom Exception Hierarchy

2.1 Principles

- At least 3 levels: base exception → category exception → specific exception
- Domain-specific names: an exception's name should reflect its business meaning
- Carry context: store useful debugging information on the exception

2.2 Complete example

```python
from typing import Optional


class AIAgentError(Exception):
    """Base exception for the AI agent"""

    def __init__(self, message: str, context: Optional[dict] = None):
        super().__init__(message)
        self.context = context or {}


class LLMError(AIAgentError):
    """LLM-related error"""


class ToolError(AIAgentError):
    """Tool execution error"""


class RateLimitError(LLMError):
    """Rate limit error"""

    def __init__(self, retry_after: int = 60):
        super().__init__(f"Rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


class ContextLengthError(LLMError):
    """Context length exceeded"""


def call_llm(prompt: str) -> str:
    try:
        # Simulate a network failure
        raise ConnectionError("Connection refused")
    except ConnectionError as e:
        # Keep the cause and attach debugging context
        raise LLMError(f"Failed to call LLM: {e}",
                       context={"prompt_len": len(prompt)}) from e
```
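To illustrate the "carry context" principle, here is a minimal sketch of a caller reading structured debug data off an exception. `run_tool()` and its single `"search"` tool are hypothetical; the two exception classes mirror those in 2.2.

```python
from typing import Optional


class AIAgentError(Exception):
    """Base exception for the AI agent (mirrors section 2.2)"""

    def __init__(self, message: str, context: Optional[dict] = None):
        super().__init__(message)
        self.context = context or {}


class ToolError(AIAgentError):
    """Tool execution error"""


def run_tool(name: str, args: dict) -> str:
    """Hypothetical tool runner: only the 'search' tool exists."""
    if name != "search":
        raise ToolError(f"Unknown tool: {name}", context={"tool": name, "args": args})
    return f"results for {args.get('q')}"


try:
    run_tool("calculator", {"expr": "1+1"})
except AIAgentError as e:
    # The caller gets a readable message plus structured debugging data
    print(f"{e} | context={e.context}")
```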
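3. The logging Module in Depth

3.1 Core components

```text
Logger     ← the logger (entry point)
   ↓
Handler    ← the handler (output destination)
   ↓
Formatter  ← the formatter (output format)
```

A Logger creates a log record for each call such as `logger.info(...)` and passes it to every attached Handler; each Handler applies its own level filter and uses its Formatter to turn the record into text.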
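The three components can also be wired together by hand, which makes the flow visible. This sketch assumes an in-memory `io.StringIO` stream in place of a console so the output can be inspected; the logger name is illustrative.

```python
import io
import logging

stream = io.StringIO()  # stand-in for a console or a file

# Logger: the entry point that application code calls
logger = logging.getLogger("Agent.pipeline_demo")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep this demo isolated from the root logger

# Handler: decides where records go
handler = logging.StreamHandler(stream)

# Formatter: decides what each record looks like
handler.setFormatter(logging.Formatter("%(levelname)s|%(name)s|%(message)s"))
logger.addHandler(handler)

logger.info("Agent started")
print(stream.getvalue(), end="")  # INFO|Agent.pipeline_demo|Agent started
```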
3.2 Basic configuration

```python
import logging

# Configure the root logger; this is a no-op if the root logger already has handlers
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s [%(levelname)s] %(name)s: %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S',
)

logger = logging.getLogger("Agent")
logger.info("Agent started")
```
3.3 Configuring multiple handlers

```python
import logging

logger = logging.getLogger("AIAgent")
logger.setLevel(logging.DEBUG)  # the logger passes everything from DEBUG up

# Console: INFO and above
console_handler = logging.StreamHandler()
console_handler.setLevel(logging.INFO)
console_handler.setFormatter(logging.Formatter(
    '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
))

# File: everything from DEBUG up, plus function name and line number
file_handler = logging.FileHandler('agent.log', encoding='utf-8')
file_handler.setLevel(logging.DEBUG)
file_handler.setFormatter(logging.Formatter(
    '%(asctime)s - %(name)s - %(levelname)s - %(funcName)s:%(lineno)d - %(message)s'
))

logger.addHandler(console_handler)
logger.addHandler(file_handler)
```
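A point worth verifying is that the logger level and each handler's level act as two separate filters. This sketch reproduces the 3.3 setup with two in-memory streams standing in for the console and `agent.log` (an assumption made so the result is easy to inspect); the logger name is illustrative.

```python
import io
import logging

console = io.StringIO()  # stands in for the console
logfile = io.StringIO()  # stands in for agent.log

logger = logging.getLogger("AIAgent.levels_demo")
logger.setLevel(logging.DEBUG)  # the logger lets everything through...
logger.propagate = False

console_handler = logging.StreamHandler(console)
console_handler.setLevel(logging.INFO)  # ...the console only shows INFO and above
file_handler = logging.StreamHandler(logfile)
file_handler.setLevel(logging.DEBUG)    # ...the "file" keeps DEBUG as well

fmt = logging.Formatter("%(levelname)s %(message)s")
for h in (console_handler, file_handler):
    h.setFormatter(fmt)
    logger.addHandler(h)

logger.debug("prompt tokens=512")
logger.info("response received")

print("console:", console.getvalue().splitlines())
print("file:   ", logfile.getvalue().splitlines())
```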
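3.4 Logger hierarchy

```python
import logging

# Dotted names form a tree: "Agent.Child" is a child of "Agent"
parent_logger = logging.getLogger("Agent")
child_logger = logging.getLogger("Agent.Child")

# By default, records logged on the child propagate to the parent's handlers
```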
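The practical consequence of the hierarchy is propagation. In the sketch below only the parent `"Agent"` logger has a handler (writing to an in-memory buffer, an assumption for inspection), yet a record logged on `"Agent.Child"` still reaches it, until `propagate` is turned off on the child.

```python
import io
import logging

buf = io.StringIO()

parent_logger = logging.getLogger("Agent")
parent_logger.setLevel(logging.INFO)
handler = logging.StreamHandler(buf)
handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
parent_logger.addHandler(handler)  # only the parent has a handler
parent_logger.propagate = False    # keep the demo away from the root logger

child_logger = logging.getLogger("Agent.Child")
print(child_logger.parent is parent_logger)  # the dotted name defines the tree

# The child has no handler and no level of its own: it inherits the parent's
# effective level, and its records propagate up to the parent's handler.
child_logger.info("tool call finished")

child_logger.propagate = False
child_logger.info("this one stops at the child")

print(buf.getvalue().splitlines())
```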
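4. AI Applications: Graceful Degradation and Error Recovery

4.1 Degradation strategy

When the primary approach fails, switch automatically to a fallback: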
```python
def call_with_fallback(prompt: str) -> str:
    """Multi-model degradation strategy.

    call_model() is not defined in this chapter; the exception classes
    come from the earlier sections.
    """
    models = ["gpt-4", "gpt-3.5", "claude-3-haiku"]
    errors = []

    for model in models:
        try:
            return call_model(model, prompt)
        except RateLimitError as e:
            # Rate limited: record it and move on to the next model
            errors.append((model, f"Rate limited: {e.retry_after}s"))
        except AuthError as e:
            # Treat auth failures as fatal, keeping the original cause
            raise AIAgentError(f"Auth failed: {e}") from e
        except Exception as e:
            errors.append((model, str(e)))

    # Every model failed: return a degraded message instead of crashing
    return f"All models failed: {errors}"
```
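Since `call_model()` is not defined in this chapter, the self-contained sketch below substitutes a hypothetical stub that simulates a rate-limited `gpt-4`, a timing-out `gpt-3.5` and a `locked-model` with a bad key. The exception classes are simplified versions of those in 1.3 and 2.2, and `AuthError` is re-raised as-is rather than wrapped, to keep the sketch short.

```python
from typing import Callable, Dict, Sequence


class LLMError(Exception):
    """Simplified base LLM error"""


class RateLimitError(LLMError):
    def __init__(self, retry_after: int = 60):
        super().__init__(f"Rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


class AuthError(LLMError):
    """Authentication error"""


# Hypothetical failure mode per model (factories, so each call raises a fresh exception)
SIMULATED: Dict[str, Callable[[], Exception]] = {
    "gpt-4": lambda: RateLimitError(retry_after=30),
    "gpt-3.5": lambda: TimeoutError("read timed out"),
    "locked-model": lambda: AuthError("invalid API key"),
}


def call_model(model: str, prompt: str) -> str:
    """Hypothetical stand-in for a real SDK call."""
    make_error = SIMULATED.get(model)
    if make_error is not None:
        raise make_error()
    return f"[{model}] answer to: {prompt}"


def call_with_fallback(prompt: str, models: Sequence[str]) -> str:
    errors = []
    for model in models:
        try:
            return call_model(model, prompt)
        except RateLimitError as e:
            errors.append((model, f"Rate limited: {e.retry_after}s"))
        except AuthError:
            raise  # auth failures are treated as fatal, as in the version above
        except Exception as e:
            errors.append((model, str(e)))
    return f"All models failed: {errors}"


print(call_with_fallback("hi", ["gpt-4", "gpt-3.5", "claude-3-haiku"]))
```

Passing the model list as a parameter (rather than hard-coding it) makes the fallback order easy to test and configure.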
4.2 Retry mechanism

```python
import logging
import time
from functools import wraps

logger = logging.getLogger("Agent.retry")


def retry(max_attempts: int = 3, delay: float = 1.0):
    """Retry decorator with exponential backoff (uses RateLimitError from section 2.2)"""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except RateLimitError as e:
                    if attempt == max_attempts - 1:
                        raise  # out of attempts: re-raise the last error
                    # Backoff doubles each attempt, plus the server's retry_after hint
                    wait_time = delay * (2 ** attempt) + e.retry_after
                    logger.warning("Attempt %d failed, retrying in %.1fs...",
                                   attempt + 1, wait_time)
                    time.sleep(wait_time)
            return None  # only reached if max_attempts <= 0
        return wrapper
    return decorator


@retry(max_attempts=3, delay=1.0)
def call_llm_with_retry(prompt: str) -> str:
    ...  # the real LLM call goes here
```
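The decorator can be exercised without a real API. In this sketch `flaky_llm()` and `always_limited()` are hypothetical endpoints; the decorator is a condensed copy of the one above (logging omitted), and `delay` and `retry_after` are kept tiny so the demo finishes quickly.

```python
import time
from functools import wraps


class RateLimitError(Exception):
    def __init__(self, retry_after: float = 0.0):
        super().__init__(f"Rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


def retry(max_attempts: int = 3, delay: float = 1.0):
    """Condensed copy of the decorator above."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except RateLimitError as e:
                    if attempt == max_attempts - 1:
                        raise
                    time.sleep(delay * (2 ** attempt) + e.retry_after)
            return None
        return wrapper
    return decorator


calls = []


@retry(max_attempts=3, delay=0.01)
def flaky_llm(prompt: str) -> str:
    """Hypothetical endpoint: rate-limited on its first two calls."""
    calls.append(prompt)
    if len(calls) < 3:
        raise RateLimitError(retry_after=0)
    return f"ok after {len(calls)} attempts"


@retry(max_attempts=2, delay=0.01)
def always_limited() -> str:
    """Hypothetical endpoint that never recovers."""
    raise RateLimitError(retry_after=0)


result = flaky_llm("hi")
print(result)  # ok after 3 attempts

try:
    always_limited()
except RateLimitError as e:
    print("gave up:", e)  # the last error is re-raised once attempts run out
```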
4.3 Complete example

```python
import logging


class LLMAPIError(Exception):
    pass


class RateLimitError(LLMAPIError):
    pass


# 1. Exception chaining: RateLimitError → ConnectionError → LLMAPIError
def call_llm(prompt):
    try:
        raise ConnectionError("refused") from RateLimitError("429")
    except ConnectionError as e:
        raise LLMAPIError(f"Failed: {e}") from e


try:
    call_llm("hi")
except LLMAPIError as e:
    print(f"Error: {e}")
    print(f"Caused by: {e.__cause__}")

# 2. Logging with a hierarchy of loggers
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(name)s %(message)s')
logger = logging.getLogger("Agent")


class Agent:
    def __init__(self, name):
        self.logger = logging.getLogger(f"Agent.{name}")
        self.name = name

    def execute(self, task):
        self.logger.info("Starting: %s", task)
        return f"Done: {task}"


print(Agent("planner").execute("summarize"))


# 3. Graceful degradation
def call_with_fallback(prompt):
    errors = []
    for model in ["gpt-4", "gpt-3.5"]:
        try:
            if model == "gpt-4":
                raise RateLimitError("429")  # simulate a rate-limited primary model
            return f"Success with {model}"
        except Exception as e:
            errors.append((model, str(e)))
    return f"All failed: {errors}"


print(call_with_fallback("hi"))
```
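When chains get longer, as with the three-level `RateLimitError → ConnectionError → LLMAPIError` chain above, it helps to walk them programmatically. A possible helper, `exception_chain()` (a name chosen here for illustration), follows `__cause__`, falls back to `__context__` unless it is suppressed, and guards against cycles.

```python
from typing import List


def exception_chain(exc: BaseException) -> List[BaseException]:
    """Return [exc, its cause, the cause's cause, ...], outermost first."""
    chain = [exc]
    while True:
        nxt = exc.__cause__
        if nxt is None and not exc.__suppress_context__:
            nxt = exc.__context__
        if nxt is None or nxt in chain:  # end of chain, or a cycle
            return chain
        chain.append(nxt)
        exc = nxt


class LLMAPIError(Exception):
    pass


class RateLimitError(LLMAPIError):
    pass


def call_llm(prompt):
    try:
        raise ConnectionError("refused") from RateLimitError("429")
    except ConnectionError as e:
        raise LLMAPIError(f"Failed: {e}") from e


try:
    call_llm("hi")
except LLMAPIError as e:
    err = e

print([type(x).__name__ for x in exception_chain(err)])
print("root cause:", repr(exception_chain(err)[-1]))
```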
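5. Best Practices

5.1 Logging conventions

| Level | When to use |
| --- | --- |
| DEBUG | Debugging details, for development environments |
| INFO | Normal flow information |
| WARNING | Something unexpected that does not break functionality |
| ERROR | An error that needs attention |
| CRITICAL | A severe error that may crash the program |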
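Levels are plain integers, and a logger drops any record below its threshold. This sketch sets a logger to `WARNING` (as one might in production) and logs one message at each level from the table; the in-memory stream and the logger name are assumptions for easy inspection.

```python
import io
import logging

buf = io.StringIO()
logger = logging.getLogger("Agent.levels_table_demo")
logger.propagate = False
handler = logging.StreamHandler(buf)
handler.setFormatter(logging.Formatter("%(levelname)s"))
logger.addHandler(handler)

# Levels are integers, ordered from least to most severe
print([logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL])  # [10, 20, 30, 40, 50]

logger.setLevel(logging.WARNING)  # e.g. a production setting
for level in (logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL):
    logger.log(level, "message")

print(buf.getvalue().split())  # ['WARNING', 'ERROR', 'CRITICAL']
```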
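5.2 Exception-handling guidelines

- Don't catch every exception: catching `Exception` indiscriminately hides real bugs
- Include context when logging exceptions: `logger.error(f"Failed: {e}", exc_info=True)` records the traceback as well (inside an `except` block, `logger.exception("Failed: %s", e)` does the same)
- Exceptions should be meaningful: don't use exceptions for ordinary control flow
- Use `finally` to clean up resources:

```python
try:
    response = call_llm(prompt)
finally:
    cleanup()  # runs whether or not call_llm() raised
```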
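Several of these guidelines can be combined in one place. In this sketch, `call_llm()` is a hypothetical function that always times out; `answer()` catches only that specific `TimeoutError`, logs it with its traceback via `logger.exception()`, returns a degraded reply, and uses `finally` for cleanup that runs on every path. The in-memory log stream is an assumption for inspection.

```python
import io
import logging

buf = io.StringIO()
logger = logging.getLogger("Agent.exc_demo")
logger.propagate = False
handler = logging.StreamHandler(buf)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger.addHandler(handler)

cleaned_up = []


def call_llm(prompt: str) -> str:
    """Hypothetical call that always times out."""
    raise TimeoutError("LLM request timed out")


def answer(prompt: str) -> str:
    try:
        return call_llm(prompt)
    except TimeoutError:
        # Equivalent to logger.error(..., exc_info=True): message plus traceback
        logger.exception("LLM call failed (prompt length=%d)", len(prompt))
        return "Sorry, please try again later."  # degraded answer
    finally:
        cleaned_up.append(prompt)  # runs on success, failure and return alike


print(answer("hi"))
print(buf.getvalue())
```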
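6. Summary

In this chapter we covered:

- Exception chaining: `raise ... from` preserves the complete chain of errors
- Exception hierarchy design: building an exception system that mirrors your domain
- The logging module: the three-layer Logger/Handler/Formatter architecture
- Graceful degradation: fault tolerance through multiple models and multiple strategies

Robustness is the lifeline of an AI application. A good AI system must not only work under normal conditions; it must also keep the service available when things go wrong.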
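Next up: [Python AI Tutorial] (13) The Art of Caching: lru_cache/ttl_cache/custom caches. A deep dive into Python's caching mechanisms to build an efficient response cache for AI applications.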
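📚 Python AI Tutorial series navigation: this is part 12 of 14 in the Python AI Tutorial series.

📖 All 14 parts
(1) Closures and Decorators
(2) Context Managers
(3) Generators and Iterators
(4) Type Hints
(5) Dataclass and attrs
(6) async/await
(7) Threading and Multiprocessing
(8) Functional Programming
(9) The Descriptor Protocol
(10) Metaclasses
(11) Protocol and Structural Typing
(12) Exception Chaining and Logging ← current
(13) The Art of Caching
(14) The Composite Pattern in Practice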