Optimize: download performance and network stability in the Docker environment

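- Chromium launch flags: --disable-dev-shm-usage (use /tmp instead of Docker's small /dev/shm), --disable-gpu and --single-process, to avoid running out of memory in Docker
- Timeouts: login redirect, page navigation, load-state, export-button and table-data waits set to 30s; download wait raised from 180s to 300s
- Network latency: fixed waits around data loading raised from 2s to 3s; table-load check timeout raised from 15s to 30s
- Docker Compose: limit to 2 CPUs / 2 GB memory (reserve 1 CPU / 1 GB), use the public DNS resolvers 8.8.8.8 and 1.1.1.1, rotate json-file logs
- Dockerfile: set PYTHONHASHSEED=0, install dnsmasq, run the Chromium install with PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=0
- Add docker-debug.sh to test the download feature inside the Docker container
- Add .dockerignore to speed up Docker builds and shrink the image

Docker downloads now tolerate longer network latency and larger data volumes.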
2026-05-17 16:09:55 +08:00
parent 505e5ca895
commit 975f9e5887
6 changed files with 183 additions and 19 deletions
@@ -0,0 +1,19 @@
+.git
+.gitignore
+.env.example
+*.md
+.vscode
+.idea
+**/__pycache__
+**/*.pyc
+.pytest_cache
+.coverage
+*.egg-info
+dist
+build
+.DS_Store
+*.log
+uploads/*
+downloads/*
+.claude
+CLAUDE.md
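In `.dockerignore`, patterns are matched against paths relative to the build-context root, and `*` never crosses a `/`; only `**` spans directories. A bare `*.pyc` or `__pycache__` therefore only excludes entries at the context root. The sketch below approximates that matching in Python. It is an illustration under stated assumptions, not Docker's exact algorithm: it handles `*`, `?` and `**` only, and ignores `[...]` classes and `!` exception rules; the helper names are hypothetical.

```python
import re


def dockerignore_regex(pattern: str) -> "re.Pattern[str]":
    """Approximate a .dockerignore pattern as a regex over context-relative paths.

    Assumption: only '*', '?' and '**' are supported; '[...]' classes and
    '!' exception rules are not handled.
    """
    pattern = pattern.strip("/")
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")  # anything, including '/'
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")  # any run of characters inside one path segment
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")  # exactly one character, never '/'
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    # A pattern that matches a directory also excludes everything below it.
    return re.compile("^" + "".join(parts) + "(?:/.*)?$")


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Return True if any pattern excludes the given context-relative path."""
    return any(dockerignore_regex(p).match(path) for p in patterns)


if __name__ == "__main__":
    for path in ["app.pyc", "automation/__pycache__/secsion.cpython-311.pyc"]:
        print(path,
              is_ignored(path, ["*.pyc", "__pycache__"]),
              is_ignored(path, ["**/*.pyc", "**/__pycache__"]))
```

Under this approximation, nested bytecode caches such as `automation/__pycache__/` only stay out of the build context when the patterns start with `**/`, while `uploads/*` drops the uploaded files but keeps the `uploads` directory entry itself.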
@@ -0,0 +1,56 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+SalesShow is a monolithic Flask web application for analyzing sales data from Excel files. It supports manual Excel uploads and automated daily downloads from secsion.com via Playwright browser automation. There is no database — all data lives as Excel files on disk in `uploads/`.
+
+## Commands
+
+```bash
+# Install dependencies
+pip install -r requirements.txt
+
+# Run development server (Flask on port 5000, debug via FLASK_DEBUG env var)
+python app.py
+
+# Run with Docker (builds image, installs Playwright Chromium, port 5000)
+docker-compose up -d
+
+# Production
+gunicorn -w 4 -b 0.0.0.0:8000 app:app
+
+# CLI automation — download reports from secsion.com
+python -m automation.secsion --start 2026-04-28 --end 2026-04-28
+```
+
+No test framework or linter is configured in this project.
+
+## Architecture
+
+**Backend (Flask, single `app.py`):**
+- Routes handle file upload (`/upload`), file listing (`/files`), data loading/processing (`/load/<filename>`), deletion, and cleanup.
+- `process_sales_data()` (~lines 371-575 in `app.py`) is the core logic. It uses a state-machine approach to handle two Excel formats: "flat tables" (each row has code + product) and "hierarchical tables" (code row is a header, product rows are children). Outputs daily summaries with per-product breakdowns.
+- `find_header_row()` dynamically detects the header row by scanning the first 20 rows for keyword matches.
+
+**Automation module (`automation/`):**
+- `secsion.py` — `SecsionDownloader` uses Playwright headless Chromium to log into secsion.com, navigate to reports, set date range via TDesign date picker, optionally inject `shop_id` via route interception, and download exports.
+- `uploader.py` — copies downloaded files into `uploads/` with timestamp-prefix naming (same convention as manual uploads).
+- `scheduler.py` — APScheduler `BackgroundScheduler` with `CronTrigger` runs daily auto-download (default 01:00).
+
+**Configuration (`config.py`):**
+- Three-tier priority: Web UI settings (`data/config.json`) > environment variables (`.env` / system env) > defaults.
+- `Config` class provides static methods for reading/writing secsion credentials, shop ID, and scheduler settings.
+
+**Frontend (vanilla JS/CSS, no build step):**
+- `main.js` — all client-side interactivity: file upload (drag-and-drop), AJAX to API, data rendering (card/table view), client-side filtering, sorting, pagination (50 items/page), export.
+- `style.css` — Glassmorphism design with CSS custom properties.
+- `settings.html` — self-contained settings page with inline `<script>` (no separate JS file).
+
+## Key Design Decisions
+
+- No database — Excel files on disk are the data store.
+- No frontend build step — vanilla JS/CSS served directly via Flask static files.
+- Playwright automation runs in daemon threads with a global `download_status` dict for status tracking.
+- Passwords are masked (`******`) when returned via the API.
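The three-tier configuration priority listed under **Configuration** (Web UI settings in `data/config.json` > environment variables > defaults) can be sketched as follows. This is an illustration, not the project's `config.py`: the function names, the `DEFAULTS` keys and the environment-variable names are hypothetical, no type conversion is done, and `.env` is assumed to have been loaded into `os.environ` already.

```python
import json
import os
from pathlib import Path
from typing import Any, Mapping, Optional

# Hypothetical defaults; the real keys and values in config.py are not shown.
DEFAULTS: dict[str, Any] = {"schedule_hour": 1, "schedule_minute": 0}


def load_saved_settings(config_path: Path = Path("data/config.json")) -> dict[str, Any]:
    """Read settings saved from the Web UI; a missing or unreadable file counts as empty."""
    try:
        data = json.loads(config_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return {}
    return data if isinstance(data, dict) else {}


def resolve_setting(
    key: str,
    env_var: str,
    saved: Mapping[str, Any],
    environ: Optional[Mapping[str, str]] = None,
    defaults: Mapping[str, Any] = DEFAULTS,
) -> Any:
    """Return the value for `key`: Web UI setting > environment variable > default."""
    environ = os.environ if environ is None else environ
    # 1. Web UI setting (None or "" count as "not set")
    value = saved.get(key)
    if value not in (None, ""):
        return value
    # 2. Environment variable (.env assumed to be loaded into os.environ already)
    value = environ.get(env_var)
    if value not in (None, ""):
        return value
    # 3. Built-in default
    return defaults.get(key)
```

A typical call would be `resolve_setting("schedule_hour", "SCHEDULE_HOUR", load_saved_settings())`, where `SCHEDULE_HOUR` is again only an example name.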
@@ -9,8 +9,10 @@ WORKDIR /app
 ENV PYTHONDONTWRITEBYTECODE=1
 # Ensure Python output is not buffered
 ENV PYTHONUNBUFFERED=1
+# Disable str/bytes hash randomization (fixed hash seed); this has no effect on Chromium
+ENV PYTHONHASHSEED=0

-# Install the system dependencies Playwright needs
+# Install the system dependencies Playwright needs + network tools (dnsmasq)
 RUN apt-get update && apt-get install -y --no-install-recommends \
    wget \
    ca-certificates \
@@ -29,6 +31,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
    libxdamage1 \
    libxrandr2 \
    xdg-utils \
+    dnsmasq \
    && rm -rf /var/lib/apt/lists/*

 # Copy the dependency file
@@ -38,8 +41,8 @@ COPY requirements.txt .
 # Use the Aliyun PyPI mirror to speed up installation
 RUN pip install --no-cache-dir -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/

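-# Install the Playwright Chromium browser
-RUN playwright install --with-deps chromium
+# Install the Playwright Chromium browser (PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=0 keeps the download enabled; it does not change any timeout)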
+RUN PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=0 playwright install --with-deps chromium

 # Copy the current directory into /app in the container
 COPY . .
@@ -50,5 +53,6 @@ RUN mkdir -p uploads data downloads && chmod 777 uploads data downloads
 # Expose port 5000
 EXPOSE 5000

+# Memory and CPU limits are set in the Compose file (deploy.resources), not in this Dockerfile
 # Run app.py
 CMD ["python", "app.py"]
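For reference, `PYTHONHASHSEED=0` only disables Python's per-process hash randomization for `str` and `bytes`; it does not touch CPU instruction-set checks or Chromium. A quick way to see the effect is to hash the same string in several fresh interpreters. This sketch assumes nothing beyond a working Python; the helper name and the string `'saleshow'` are arbitrary.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(seed: str, text: str = "saleshow") -> int:
    """Compute hash(text) in a new Python process started with PYTHONHASHSEED=seed."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    fixed = {hash_in_fresh_interpreter("0") for _ in range(3)}
    randomized = {hash_in_fresh_interpreter("random") for _ in range(3)}
    # With a fixed seed every run agrees; with "random" the values usually differ.
    print("PYTHONHASHSEED=0      ->", len(fixed), "distinct value(s)")
    print("PYTHONHASHSEED=random ->", len(randomized), "distinct value(s)")
```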
@@ -42,7 +42,15 @@ class SecsionDownloader:
                logger.info(f"Starting report download: {start_date} ~ {end_date} (attempt {attempt + 1}/{retry_count})")

                async with async_playwright() as p:
-                    browser = await p.chromium.launch(headless=True)
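+                    # Docker tuning: --disable-dev-shm-usage uses /tmp instead of the small /dev/shm, --disable-gpu skips GPU setup, --single-process is a Chromium debug mode and may be unstable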
+                    browser = await p.chromium.launch(
+                        headless=True,
+                        args=[
+                            "--disable-dev-shm-usage",
+                            "--disable-gpu",
+                            "--single-process"
+                        ]
+                    )
                    context = await browser.new_context(
                        ignore_https_errors=True,
                        viewport={'width': 1280, 'height': 800}
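`--disable-dev-shm-usage` matters because Docker gives containers a 64 MB `/dev/shm` by default, which Chromium can exhaust; with the flag, Chromium creates its shared-memory files in `/tmp` instead. The sketch below picks launch flags from the actual `/dev/shm` size. The 512 MB threshold and the helper names are assumptions for illustration, and it deliberately leaves out `--single-process`, which Chromium intends for debugging rather than production use.

```python
import shutil
from typing import List, Optional

# Assumed threshold: below this, /dev/shm is treated as too small for Chromium.
MIN_SHM_BYTES = 512 * 1024 * 1024


def shm_total_bytes(path: str = "/dev/shm") -> Optional[int]:
    """Total size of the shared-memory mount, or None if it cannot be read."""
    try:
        return shutil.disk_usage(path).total
    except OSError:
        return None


def chromium_launch_args(shm_bytes: Optional[int]) -> List[str]:
    """Build Chromium flags for a container, based on the /dev/shm size."""
    args = ["--disable-gpu"]  # skip GPU initialisation; containers usually have no GPU
    if shm_bytes is None or shm_bytes < MIN_SHM_BYTES:
        # Put shared-memory files in /tmp instead of a small or missing /dev/shm.
        args.append("--disable-dev-shm-usage")
    return args


if __name__ == "__main__":
    size = shm_total_bytes()
    print("/dev/shm size:", size, "->", chromium_launch_args(size))
```

With Playwright this would plug in as `await p.chromium.launch(headless=True, args=chromium_launch_args(shm_total_bytes()))`. Raising the shared-memory size with the Compose `shm_size` option is the other common fix.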
@@ -70,14 +78,14 @@ class SecsionDownloader:
    async def _login(self, page):
        """Log in to secsion.com"""
        logger.info(f"Opening login page: {self.LOGIN_URL}")
-        await page.goto(self.LOGIN_URL)
+        await page.goto(self.LOGIN_URL, timeout=30000)

        # Select the "店铺" (shop) role
        logger.info("Selecting role: 店铺")
        try:
-            await page.get_by_text("店铺", exact=True).click()
+            await page.get_by_text("店铺", exact=True).click(timeout=10000)
        except Exception:
-            await page.click("text=店铺")
+            await page.click("text=店铺", timeout=10000)

        # Enter username and password
        logger.info(f"Entering username: {self.username}")
@@ -93,21 +101,21 @@ class SecsionDownloader:
        try:
            await page.click("button:has-text('登录')", timeout=5000)
        except Exception:
-            await page.click("button[type='submit']")
+            await page.click("button[type='submit']", timeout=5000)

-        # Wait for the redirect
+        # Wait for the redirect (takes longer in Docker)
        logger.info("Waiting for login redirect...")
-        await page.wait_for_url("**/homePage", timeout=20000)
+        await page.wait_for_url("**/homePage", timeout=30000)
        logger.info("Login successful")

    async def _export_report(self, page, start_date, end_date):
        """Open the statistics page and export the report"""
        logger.info(f"Opening statistics page: {self.STATS_URL}")
-        await page.goto(self.STATS_URL)
-        await page.wait_for_load_state("networkidle")
+        await page.goto(self.STATS_URL, timeout=30000)
+        await page.wait_for_load_state("networkidle", timeout=30000)

        export_btn = page.get_by_role("button", name="导出报表")
-        await export_btn.wait_for(state="visible", timeout=20000)
+        await export_btn.wait_for(state="visible", timeout=30000)

        logger.info(f"Setting query date range: {start_date} ~ {end_date}")

@@ -127,9 +135,9 @@ class SecsionDownloader:
        end_val = await end_input.input_value()
        logger.info(f"Date range set: start={start_val}, end={end_val}")

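-        # Wait for the data request to finish + table rendering
+        # Wait for the data request to finish + table rendering (longer wait in Docker)
        logger.info("Waiting for the data request to finish...")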
-        await asyncio.sleep(2)
+        await asyncio.sleep(3)

        # Check whether the data has loaded (wait for the loading indicator to disappear or for real rows)
        try:
@@ -143,13 +151,13 @@ class SecsionDownloader:
                    const rows = document.querySelectorAll('table tbody tr');
                    return rows.length > 0;
                }""",
-                timeout=15000
+                timeout=30000
            )
            logger.info("Data table loaded")
        except Exception as e:
            logger.warning(f"Table load check failed: {e}, continuing...")

-        await asyncio.sleep(2)
+        await asyncio.sleep(3)

        # If shop_id is configured, intercept the export request and inject shop_id
        if self.shop_id:
@@ -165,9 +173,9 @@ class SecsionDownloader:
            await page.route('**/api/bill/export', inject_shop_id)
            logger.info(f"shop_id interception set: {self.shop_id}")

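-        # Click "导出报表" (Export report) and capture the download (timeout raised to 180s for large files)
+        # Click "导出报表" (Export report) and capture the download (timeout raised to 300s for Docker)
        logger.info("Clicking Export report...")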
-        async with page.expect_download(timeout=180000) as download_info:
+        async with page.expect_download(timeout=300000) as download_info:
            await export_btn.click()
            logger.info("Waiting for the file download...")
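The two fixed `asyncio.sleep(3)` calls add six seconds to every run whether or not the data is already there, and may still be too short on a slow link. A condition-based wait is the usual alternative; Playwright's own `page.wait_for_function` already does this inside the browser. The pure-`asyncio` sketch below shows the same polling pattern for conditions checked from Python; the helper name `wait_until` and its defaults are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def wait_until(
    predicate: Callable[[], Awaitable[T]],
    timeout: float = 30.0,
    interval: float = 0.5,
) -> T:
    """Poll `predicate` until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value; raises asyncio.TimeoutError on timeout.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        result = await predicate()
        if result:
            return result
        if loop.time() >= deadline:
            raise asyncio.TimeoutError(f"condition not met within {timeout} s")
        # Yield to the event loop before checking again.
        await asyncio.sleep(interval)
```

Inside `_export_report` it could be called as `await wait_until(lambda: page.evaluate("document.querySelectorAll('table tbody tr').length > 0"), timeout=30)`, returning as soon as rows appear instead of always sleeping.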

@@ -19,3 +19,28 @@ services:
      - ./data:/app/data
      - ./downloads:/app/downloads
    restart: unless-stopped
+    # Docker resource limits
+    deploy:
+      resources:
+        limits:
+          cpus: '2'
+          memory: 2G
+        reservations:
+          cpus: '1'
+          memory: 1G
+    # Networking: default bridge network and public DNS resolvers
+    networks:
+      - default
+    dns:
+      - 8.8.8.8
+      - 1.1.1.1
+    # Log rotation: at most 3 json-file logs of 10 MB each
+    logging:
+      driver: "json-file"
+      options:
+        max-size: "10m"
+        max-file: "3"
+
+networks:
+  default:
+    driver: bridge
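Whether the `deploy.resources` limits actually reach the container depends on the Compose implementation: Compose v2 applies them, while the legacy `docker-compose` v1 ignores `deploy` unless run with `--compatibility`. One way to check from inside the container is to read the cgroup memory limit. The sketch assumes the standard cgroup v2 file `/sys/fs/cgroup/memory.max` or the v1 file `/sys/fs/cgroup/memory/memory.limit_in_bytes`; the helper names and the "unlimited" threshold are hypothetical.

```python
from pathlib import Path
from typing import Optional

CGROUP_V2_MEMORY_MAX = Path("/sys/fs/cgroup/memory.max")
CGROUP_V1_MEMORY_LIMIT = Path("/sys/fs/cgroup/memory/memory.limit_in_bytes")

# cgroup v1 reports "no limit" as a huge number close to 2**63; treat anything
# this large as unlimited (threshold chosen for illustration).
UNLIMITED_THRESHOLD = 2 ** 62


def parse_memory_limit(text: str) -> Optional[int]:
    """Parse a cgroup memory limit file; return bytes, or None when unlimited."""
    value = text.strip()
    if value == "max":  # cgroup v2 spelling of "no limit"
        return None
    limit = int(value)
    return None if limit >= UNLIMITED_THRESHOLD else limit


def container_memory_limit() -> Optional[int]:
    """Read the current cgroup's memory limit, trying v2 first, then v1."""
    for path in (CGROUP_V2_MEMORY_MAX, CGROUP_V1_MEMORY_LIMIT):
        if path.exists():
            return parse_memory_limit(path.read_text())
    return None


if __name__ == "__main__":
    limit = container_memory_limit()
    if limit is None:
        print("No memory limit detected")
    else:
        print(f"Memory limit: {limit / 1024 ** 3:.2f} GiB")
```

Running it with `docker exec saleshow-app python <script>` after `docker-compose up -d` should report roughly 2 GiB if the limit took effect.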
@@ -0,0 +1,52 @@
+#!/bin/bash
+# Docker debug script - tests the automatic download feature
+
+set -e
+
+echo "🐳 Docker auto-download debug script"
+echo "=============================="
+
+# Argument check
+if [ "$#" -lt 3 ]; then
+    echo "Usage: ./docker-debug.sh <username> <password> <start_date> [end_date]"
+    echo "Example: ./docker-debug.sh 15682076681 123456 2026-05-15 2026-05-17"
+    exit 1
+fi
+
+USERNAME=$1
+PASSWORD=$2
+START_DATE=$3
+END_DATE=${4:-$START_DATE}
+
+echo "📝 Parameters:"
+echo "  Username: $USERNAME"
+echo "  Start date: $START_DATE"
+echo "  End date: $END_DATE"
+echo ""
+
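+# Check whether the container is running (exact name match)
+if ! docker ps --format '{{.Names}}' | grep -qx saleshow-app; then
+    echo "❌ Container is not running, starting it..."
+    docker-compose up -d
+    echo "⏳ Waiting for the container to start..."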
+    sleep 5
+fi
+
+echo "📊 Container status:"
+docker ps --filter name=saleshow-app --format "table {{.ID}}\t{{.Status}}\t{{.Ports}}"
+echo ""
+
+echo "🔍 Running download test..."
+docker exec saleshow-app python -m automation.secsion \
+    --username "$USERNAME" \
+    --password "$PASSWORD" \
+    --start "$START_DATE" \
+    --end "$END_DATE"
+
+echo ""
+echo "✅ Checking download results:"
+docker exec saleshow-app ls -lh downloads/
+
+echo ""
+echo "📋 Recent logs:"
+docker logs --tail 20 saleshow-app
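`docker-debug.sh` waits a fixed `sleep 5` after `docker-compose up -d`, which may be too short on a slow host. Polling is an alternative. The Python sketch below asks `docker inspect` for the container's `State.Running` flag until it is `true` or a timeout expires. The helper names are hypothetical, and it assumes the `docker` CLI is on `PATH`. A running container does not yet mean Flask is accepting requests.

```python
import subprocess
import time
from typing import Callable


def container_running(name: str) -> bool:
    """True if `docker inspect` reports the container as running."""
    try:
        result = subprocess.run(
            ["docker", "inspect", "-f", "{{.State.Running}}", name],
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:  # docker CLI not installed
        return False
    return result.returncode == 0 and result.stdout.strip() == "true"


def wait_for_container(
    name: str,
    timeout: float = 60.0,
    interval: float = 1.0,
    is_running: Callable[[str], bool] = container_running,
) -> bool:
    """Poll until the container runs or `timeout` seconds pass; return the final state."""
    deadline = time.monotonic() + timeout
    while True:
        if is_running(name):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    ok = wait_for_container("saleshow-app", timeout=60)
    print("saleshow-app running:", ok)
```

The `is_running` parameter makes the polling loop testable without Docker, since a stub can stand in for the real `docker inspect` call.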