Python Standard Library in Practice: File Operations and Regular Expressions

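In day-to-day development I find myself reaching for the same few core modules of the Python standard library again and again. Like a Swiss Army knife, they cover roughly 80% of the common problems I run into. Here is what I have learned from using them in practice.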

1. The System Interaction Duo: os and sys

The os module is my first choice for working with the file system. I still remember the first time I used it to batch-rename photos:

import os

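# Batch-rename photos by appending a suffix to the base name
def rename_photos(folder):
    renamed = 0
    for filename in os.listdir(folder):
        stem, ext = os.path.splitext(filename)
        # Skip non-JPEG files and files that were already processed
        if ext != '.jpg' or stem.endswith('_processed'):
            continue
        os.rename(
            os.path.join(folder, filename),
            os.path.join(folder, f"{stem}_processed{ext}")
        )
        renamed += 1
    print(f"Processed {renamed} file(s)")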

# Usage example
rename_photos('./vacation_photos')
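Because a batch rename is hard to undo, I like to preview the result before touching the disk. The following is a minimal sketch of that idea, not part of the original script: `plan_renames` and its `suffix` parameter are hypothetical names, and the function only computes the old and new names from a list of strings.

```python
import os

def plan_renames(filenames, suffix="_processed"):
    """Return (old, new) name pairs for .jpg files that do not carry the suffix yet."""
    plan = []
    for name in filenames:
        stem, ext = os.path.splitext(name)
        if ext.lower() == ".jpg" and not stem.endswith(suffix):
            plan.append((name, f"{stem}{suffix}{ext}"))
    return plan

# Preview on an in-memory list; nothing on disk is modified
for old, new in plan_renames(["a.jpg", "b_processed.jpg", "notes.txt"]):
    print(f"{old} -> {new}")
```

Passing `os.listdir(folder)` to the function would give the same preview for a real folder.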

The sys module, for its part, is especially handy when I need to read command-line arguments:

import sys

if len(sys.argv) > 1:
    print(f"Received arguments: {sys.argv[1:]}")
else:
    print("Please pass at least one argument on the command line", file=sys.stderr)
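Once a script needs more than one or two arguments, reading `sys.argv` by index gets fragile. The standard library's argparse module is one way to handle that. The sketch below is only an illustration: the `folder` argument and the `--dry-run` flag are hypothetical and do not come from the script above.

```python
import argparse

def build_parser():
    """Build a parser with one positional argument and one optional flag."""
    parser = argparse.ArgumentParser(description="Rename photos in a folder")
    parser.add_argument("folder", help="folder to process")
    parser.add_argument("--dry-run", action="store_true",
                        help="only print the planned changes")
    return parser

# Parse an explicit list instead of sys.argv so the example runs anywhere
args = build_parser().parse_args(["./vacation_photos", "--dry-run"])
print(args.folder, args.dry_run)
```

In a real script, calling `parse_args()` without arguments reads from `sys.argv[1:]`.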

2. Math and Time Handling

The math module saves me from writing a lot of low-level code myself:

import math

# Compute the area and circumference of a circle
def circle_calc(radius):
    area = math.pi * radius ** 2
    circumference = 2 * math.pi * radius
    return round(area, 2), round(circumference, 2)

print(circle_calc(5))  # Output: (78.54, 31.42)

Date and time arithmetic with datetime is part of my daily routine:

from datetime import datetime, timedelta

# Compute a due date
def get_due_date(days):
    now = datetime.now()  # naive local time, no time zone attached
    due = now + timedelta(days=days)
    return due.strftime("%Y-%m-%d %H:%M")

print(f"Your task is due on {get_due_date(7)}")
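`datetime.now()` returns a naive local time, which becomes ambiguous as soon as the result is stored or sent to another machine. Here is a minimal time-zone-aware sketch, assuming UTC is acceptable for the stored value. `get_due_date_utc` and its `now` parameter are hypothetical names added for illustration and for testability.

```python
from datetime import datetime, timedelta, timezone

def get_due_date_utc(days, now=None):
    """Return the due date as an ISO 8601 string in UTC, with minute precision."""
    if now is None:
        now = datetime.now(timezone.utc)
    return (now + timedelta(days=days)).isoformat(timespec="minutes")

# A fixed starting point makes the result reproducible
fixed = datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc)
print(get_due_date_utc(7, now=fixed))  # 2023-01-08T12:00+00:00
```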

3. Three Essential File Operations

A few ways to read a file:

# Small file: read it all at once
with open('config.json', 'r', encoding='utf-8') as f:
    content = f.read()

# Large file: read it line by line
with open('server.log', 'r', encoding='utf-8') as log:
    for line in log:
        if 'ERROR' in line:
            print(line.strip())
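The line-by-line approach is easier to reuse when it is wrapped in a generator, so the caller decides whether to print, count, or collect the matches. This is a sketch under my own assumptions: `iter_error_lines` and its `keyword` parameter are hypothetical names, and the sample log content is made up for the demonstration.

```python
import os
import tempfile

def iter_error_lines(path, keyword="ERROR"):
    """Yield stripped lines that contain keyword, reading one line at a time."""
    with open(path, "r", encoding="utf-8", errors="replace") as log:
        for line in log:
            if keyword in line:
                yield line.strip()

# Demonstrate on a throwaway file so the example does not depend on a real log
with tempfile.TemporaryDirectory() as tmp_dir:
    sample = os.path.join(tmp_dir, "server.log")
    with open(sample, "w", encoding="utf-8") as f:
        f.write("INFO start\nERROR disk full\nINFO done\nERROR timeout\n")
    errors = list(iter_error_lines(sample))

print(errors)  # ['ERROR disk full', 'ERROR timeout']
```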

Things to watch out for when writing files:

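# Write to a temporary file first
import os
import tempfile

# Create the temp file next to the target so the final replace stays on one filesystem
with tempfile.NamedTemporaryFile(mode='w', encoding='utf-8', dir='.', delete=False) as tmp:
    tmp.write("temporary content")
    tmp_path = tmp.name

# Once the content is confirmed, replace the original file
os.replace(tmp_path, 'final_data.txt')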
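The write-to-a-temp-file-then-swap pattern can be wrapped in a small helper that also cleans up after itself when the write fails. The sketch below is one possible shape, not a definitive implementation. `atomic_write_text` is a hypothetical name, and it assumes the temp file may live in the same directory as the target, which keeps `os.replace` on a single filesystem.

```python
import os
import tempfile

def atomic_write_text(path, text, encoding="utf-8"):
    """Write text to a temp file beside path, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding=encoding) as tmp:
            tmp.write(text)
        os.replace(tmp_path, path)  # readers see the old or the new file, never half of one
    except BaseException:
        # Remove the leftover temp file if anything went wrong, then re-raise
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise

# Demonstrate inside a throwaway directory
with tempfile.TemporaryDirectory() as demo_dir:
    target = os.path.join(demo_dir, "final_data.txt")
    atomic_write_text(target, "first version")
    atomic_write_text(target, "second version")
    with open(target, encoding="utf-8") as f:
        result = f.read()
    leftovers = os.listdir(demo_dir)

print(result, leftovers)
```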

A safer way to delete files:

from pathlib import Path

def safe_remove(file_path):
    # Try the delete directly; checking exists() first leaves a race window
    try:
        Path(file_path).unlink()
        print(f"Deleted {file_path}")
    except FileNotFoundError:
        print("File does not exist")

safe_remove('obsolete_data.db')

4. Regular Expressions in Practice

The re module has helped me through countless text-processing problems:

import re

# Extract the IP address from a log line
log_line = "2023-01-01 12:00:00 [ERROR] From 192.168.1.1: Connection timeout"
ip_pattern = r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'

match = re.search(ip_pattern, log_line)
if match:
    print(f"Found suspicious IP: {match.group()}")
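A pattern built from `\d{1,3}` also matches strings such as 999.1.1.1, which are not valid IPv4 addresses. One way to tighten this, sketched here under my own naming (`extract_ipv4` and `IP_CANDIDATE` are hypothetical), is to let the regex find candidates and let the standard library's ipaddress module do the range check.

```python
import ipaddress
import re

# Finds dotted-quad candidates; the numeric range is checked separately below
IP_CANDIDATE = re.compile(r'\b\d{1,3}(?:\.\d{1,3}){3}\b')

def extract_ipv4(text):
    """Return every valid IPv4 address found in text, in order of appearance."""
    found = []
    for candidate in IP_CANDIDATE.findall(text):
        try:
            found.append(str(ipaddress.IPv4Address(candidate)))
        except ValueError:
            continue  # e.g. an octet above 255
    return found

print(extract_ipv4("From 192.168.1.1 and 999.1.1.1 via 10.0.0.7"))
# ['192.168.1.1', '10.0.0.7']
```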

A more involved example, email validation:

def validate_email(email):
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    return bool(re.fullmatch(pattern, email))

print(validate_email('test@example.com'))  # True
print(validate_email('invalid.email@'))    # False
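This pattern is a pragmatic check and does not cover every address the email standards allow, so I find it worth keeping a small table of samples next to it. The sketch below restates the same pattern in precompiled form; the sample addresses and the `EMAIL_PATTERN` name are my own additions for illustration.

```python
import re

# Same pattern as above; fullmatch already anchors both ends
EMAIL_PATTERN = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')

def validate_email(email):
    return bool(EMAIL_PATTERN.fullmatch(email))

# Each sample maps an input to the result I expect from the pattern
samples = {
    "test@example.com": True,
    "first.last+tag@sub.example.org": True,
    "invalid.email@": False,
    "no-at-sign.example.com": False,
    "user@example.c": False,
}

results = {email: validate_email(email) for email in samples}
for email, expected in samples.items():
    status = "ok" if results[email] == expected else "MISMATCH"
    print(f"{status:8} {email}")
```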

5. Lessons from Experience

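Always open files with a with statement; it closes the file automatically, even when an exception is raised inside the block
Test a regular expression on a small sample before running it over large data
Prefer pathlib for path handling; it is safer than manipulating raw strings
Always take time zones into account when doing time calculations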
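To illustrate the pathlib tip, here is a short sketch of how path pieces are combined and inspected without string concatenation. The `logs/2023/server.log` layout is a made-up example.

```python
from pathlib import Path

# The / operator joins path components with the right separator for the platform
log_dir = Path("logs") / "2023"
log_file = log_dir / "server.log"

print(log_file.name)                      # server.log
print(log_file.suffix)                    # .log
print(log_file.with_suffix(".bak").name)  # server.bak
print(log_file.parent == log_dir)         # True
```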

Used together, these modules cover most everyday needs. Just last week I used them to write a log analysis script:

from datetime import datetime

def analyze_log(log_file):
    error_count = 0
    last_error = None

    with open(log_file, encoding='utf-8') as f:
        for line in f:
            if '[ERROR]' in line:
                error_count += 1
                try:
                    # Lines are expected to start with "YYYY-MM-DD HH:MM:SS"
                    last_error = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
                except ValueError:
                    pass  # still count the error, keep the previous timestamp

    return {
        'total_errors': error_count,
        'last_occurrence': last_error
    }
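A natural next step is to group entries by hour and log level, which brings re and collections.Counter together. The sketch below is an extension of my own rather than part of the script above. `count_levels_by_hour` and `LINE_RE` are hypothetical names, and the code assumes every relevant line starts with a "YYYY-MM-DD HH:MM:SS [LEVEL]" prefix like the sample log line shown earlier.

```python
import re
from collections import Counter

# Captures the date plus hour, and the level between square brackets
LINE_RE = re.compile(r'^(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2} \[(\w+)\]')

def count_levels_by_hour(lines):
    """Count log entries per (hour, level); lines in another format are skipped."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts

sample = [
    "2023-01-01 12:00:00 [ERROR] From 192.168.1.1: Connection timeout",
    "2023-01-01 12:30:10 [ERROR] From 10.0.0.7: Connection timeout",
    "2023-01-01 13:05:00 [INFO] Service recovered",
    "a line without a timestamp",
]
counts = count_levels_by_hour(sample)
for (hour, level), n in sorted(counts.items()):
    print(f"{hour}:00 {level} {n}")
```

Because the function accepts any iterable of lines, an open file object can be passed in directly.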

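The strength of the Python standard library is that you can get this much done without installing a single third-party package. Mastering these modules is like owning a basic programming toolbox: whatever the problem, a suitable tool is usually within reach.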