
Python Web Scraping for Beginners (Part 8): Crawler Engineering and an Introduction to Scrapy (A Detailed Guide)

Preface
1. Python
2. Scrapy
3. Core Components of Scrapy

Preface

1. Python

1.1 Introduction to Python

1.2 Why Python Is Well Suited to Web Scraping

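Simple and easy to use: Python's syntax is concise and easy to learn, which makes it well suited to rapid development and prototyping.
Rich libraries and frameworks: libraries and frameworks such as requests, BeautifulSoup and Scrapy greatly simplify crawler development.
Strong community support: Python has a large developer community, so solutions to common problems are easy to find.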
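To give a feel for how little code the parsing step takes, here is a minimal sketch that uses only the standard library's `html.parser`, so it runs without installing requests or BeautifulSoup. The `LinkCollector` class and the sample HTML are hypothetical, made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


sample_html = '<p><a href="/page/1">One</a> <a href="/page/2">Two</a></p>'
collector = LinkCollector()
collector.feed(sample_html)
print(collector.links)  # ['/page/1', '/page/2']
```

In a real crawler, requests would download the page and BeautifulSoup or Scrapy's selectors would usually replace a hand-written parser like this one.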

1.3 Python Fundamentals You Need to Know

1.3.1 Basic Syntax

Data types

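# Integer
num = 10
# Float
pi = 3.14
# String
greeting = "Hello, World!"
# List (ordered, mutable)
fruits = ["apple", "banana", "cherry"]
# Tuple (ordered, immutable)
coordinates = (10.0, 20.0)
# Dictionary (key-value pairs)
person = {"name": "Alice", "age": 30}
# Set (unordered, no duplicates)
unique_numbers = {1, 2, 3, 4}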
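In a crawler these types tend to take on specific roles: a queue of pages still to visit, a set of URLs already seen, and one dict per scraped record. A minimal sketch with hypothetical URLs; it uses `collections.deque` rather than a plain list because removing from the front of a deque is cheap.

```python
from collections import deque

# Pages waiting to be crawled; note the duplicate URL
to_visit = deque([
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/a",
])
visited = set()   # URLs already processed
records = []      # one dict per scraped page

while to_visit:
    url = to_visit.popleft()
    if url in visited:
        continue  # skip URLs we have already handled
    visited.add(url)
    records.append({"url": url, "status": "fetched"})

print(len(records))  # 2
```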

Control flow

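# if statement (uses num from the previous example)
if num > 0:
    print("Positive number")
else:
    print("Non-positive number")

# for loop (uses fruits from the previous example)
for fruit in fruits:
    print(fruit)

# while loop
count = 0
while count < 3:
    print(count)
    count += 1

# Exception handling: finally always runs, whether or not an error occurred
try:
    result = 10 / 0
except ZeroDivisionError:
    print("Cannot divide by zero")
finally:
    print("Execution complete")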
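Loops and `try/except` combine naturally into a retry loop, which crawlers use when a request fails for a transient reason. This sketch does no real networking: `flaky_fetch` is a hypothetical stand-in that fails twice before succeeding, and `max_retries` is an illustrative parameter.

```python
attempts = {"count": 0}


def flaky_fetch(url):
    """Hypothetical fetch: raises ConnectionError on the first two calls."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return f"<html>content of {url}</html>"


def fetch_with_retry(url, max_retries=3):
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return flaky_fetch(url)
        except ConnectionError as exc:
            last_error = exc
            print(f"Attempt {attempt} failed: {exc}")
    # Every attempt failed: re-raise the last error for the caller
    raise last_error


page = fetch_with_retry("https://example.com")
print(page)
```

Real crawlers usually also wait between attempts (for example with `time.sleep`) so they do not hammer a struggling server.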

1.3.2 Functions and Modules

Defining and calling functions

def greet(name):
    return f"Hello, {name}!"

message = greet("Bob")
print(message)
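Functions are a natural home for small, reusable crawler steps. As an illustration, a hypothetical `build_page_url` that uses a default argument and the standard library's `urllib.parse.urlencode` to generate paginated URLs (the base URL and the `page`/`per_page` query names are assumptions, not a real site's API):

```python
from urllib.parse import urlencode


def build_page_url(base_url, page, per_page=20):
    """Return base_url with page and per_page appended as a query string."""
    query = urlencode({"page": page, "per_page": per_page})
    return f"{base_url}?{query}"


urls = [build_page_url("https://example.com/list", p) for p in range(1, 4)]
for url in urls:
    print(url)
```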

Modules and packages

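# Import a standard-library module
import math
print(math.sqrt(16))  # 4.0

# Using a custom module
# Contents of my_module.py
def add(a, b):
    return a + b

# Contents of main.py (my_module.py must be in the same directory or on sys.path)
import my_module
result = my_module.add(5, 3)
print(result)  # 8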
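The standard library already ships modules that crawlers rely on. `urllib.parse.urljoin`, for instance, turns relative links found on a page into absolute URLs, and `urlparse` lets you check which host a link points to. The base URL and links below are hypothetical.

```python
from urllib.parse import urljoin, urlparse

base = "https://example.com/articles/index.html"
relative_links = ["page2.html", "/about", "https://other.org/x"]

# Resolve each link against the page it was found on
absolute = [urljoin(base, link) for link in relative_links]
print(absolute)

# Keep only links that stay on the same host as the base page
same_site = [u for u in absolute if urlparse(u).netloc == urlparse(base).netloc]
print(same_site)
```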

1.3.3 File Operations

Reading and writing files

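# Write to a file (mode "w" creates the file, or overwrites it if it exists)
with open("example.txt", "w", encoding="utf-8") as file:
    file.write("Hello, File!")

# Read the file back; the with block closes the file automatically
with open("example.txt", "r", encoding="utf-8") as file:
    content = file.read()
    print(content)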
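Scraped data is usually written out one record at a time. A minimal sketch that saves hypothetical records as JSON Lines (one JSON object per line) and reads them back; it works inside a temporary directory so nothing is left on disk, and `ensure_ascii=False` keeps non-ASCII text readable in the file.

```python
import json
import tempfile
from pathlib import Path

records = [
    {"title": "Item A", "price": 9.9},
    {"title": "Café B", "price": 12.5},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    out_file = Path(tmp_dir) / "items.jsonl"

    # One JSON object per line; utf-8 so non-ASCII characters are stored correctly
    with open(out_file, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

    with open(out_file, "r", encoding="utf-8") as f:
        loaded = [json.loads(line) for line in f]

print(loaded == records)  # True
```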

Working with file paths

from pathlib import Path

# Create a path object
path = Path("example.txt")
# File name: example.txt
print(path.name)
# File extension: .txt
print(path.suffix)
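`pathlib` is also convenient for organising crawler output. A sketch, assuming a hypothetical `pages/2024/` layout, that creates nested folders, writes a file and derives related names; it runs in a temporary directory.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical layout: <tmp>/pages/2024/
    out_dir = Path(tmp_dir) / "pages" / "2024"
    # parents=True creates missing parent folders; exist_ok=True avoids an error on reruns
    out_dir.mkdir(parents=True, exist_ok=True)

    page_file = out_dir / "page_1.html"
    page_file.write_text("<html></html>", encoding="utf-8")

    stem = page_file.stem                          # file name without extension
    txt_name = page_file.with_suffix(".txt").name  # same name, different extension
    html_files = sorted(p.name for p in out_dir.glob("*.html"))

print(stem, txt_name, html_files)  # page_1 page_1.txt ['page_1.html']
```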

1.3.4 Data Processing

String operations

text = "  Hello, World!  "
# Strip leading and trailing whitespace
stripped_text = text.strip()
print(stripped_text)

# Split the string on ", "
words = stripped_text.split(", ")
print(words)  # ['Hello', 'World!']
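Text extracted from HTML often contains stray newlines, tabs and runs of spaces. Calling `split()` with no argument splits on any run of whitespace and drops empty pieces, so joining the result with a single space normalises the text. The sample string is made up.

```python
raw = "\n   Python   Crawler\tTutorial  \n  Part 8 "
clean = " ".join(raw.split())
print(clean)  # Python Crawler Tutorial Part 8
```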

Regular expressions

import re

pattern = r"\d+"  # one or more consecutive digits
text = "The year is 2024"
matches = re.findall(pattern, text)
print(matches)  # ['2024']
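Beyond `findall`, named groups make it easy to pull structured fields out of text. A sketch on a made-up snippet, using `re.compile` and `finditer`; the `amount` and `date` group names are illustrative choices.

```python
import re

text = "Price: $19.99 (2024-05-01); Price: $5.00 (2024-06-15)"
# Named groups label each captured field
pattern = re.compile(r"\$(?P<amount>\d+\.\d{2}) \((?P<date>\d{4}-\d{2}-\d{2})\)")

items = [
    {"amount": float(m.group("amount")), "date": m.group("date")}
    for m in pattern.finditer(text)
]
print(items)
```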

1.3.5 Classes and Objects

Object-oriented programming

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def greet(self):
        return f"Hello, my name is {self.name} and I am {self.age} years old."

person = Person("Alice", 30)
print(person.greet())
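Scraped records are often modelled as small classes. The standard library's `dataclasses` module generates `__init__` and `__repr__` for you; the `Article` class and its fields below are hypothetical (Scrapy itself provides an `Item` class for this purpose).

```python
from dataclasses import dataclass, field, asdict


@dataclass
class Article:
    title: str
    url: str
    # A mutable default must use default_factory so instances don't share one list
    tags: list = field(default_factory=list)

    def summary(self):
        return f"{self.title} <{self.url}>"


article = Article("Scrapy intro", "https://example.com/scrapy")
article.tags.append("python")
print(article.summary())
print(asdict(article))  # convert to a plain dict, e.g. before saving as JSON
```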

1.3.6 Exception Handling

Catching errors

try:
    with open("nonexistent_file.txt", "r") as file:
        content = file.read()
except FileNotFoundError:
    print("File not found")
except Exception as e:
    print(f"An error occurred: {e}")
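In a crawler it can help to define your own exception type, so callers can tell parsing failures apart from other errors. A sketch with a hypothetical `ParseError` and `parse_price` helper; it also shows `raise ... from ...`, which keeps the original error as the cause, and the `else` clause, which runs only when the `try` block raised nothing.

```python
class ParseError(Exception):
    """Hypothetical error raised when a scraped field cannot be parsed."""


def parse_price(text):
    try:
        value = float(text.strip().lstrip("$"))
    except ValueError as exc:
        # Chain the original error so the traceback keeps the root cause
        raise ParseError(f"bad price: {text!r}") from exc
    else:
        return round(value, 2)


results = []
for raw in ["$3.50", "N/A"]:
    try:
        results.append(parse_price(raw))
    except ParseError as err:
        results.append(None)
        print(f"Skipped: {err}")

print(results)  # [3.5, None]
```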