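fix: improve export download diagnostics and shorten the timeout

Reduce the download timeout from 300 s to 120 s. On failure, automatically save a screenshot, log the server's response, check the page for error messages, and inspect newly opened tabs, to help find the root cause when the download event never fires.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>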
2026-05-17 18:23:32 +08:00
parent 975f9e5887
commit b402612641
2 changed files with 99 additions and 21 deletions
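The commit message lists the checks that now run when no download event arrives. As a possible follow-up (not part of this commit), the `export_response` dict that `secsion.py` fills in could be reduced to a one-line hint. The `diagnose_export` name, the content-type checks, and the hint wording below are all assumptions:

```python
from typing import Any, Mapping


def diagnose_export(export_response: Mapping[str, Any]) -> str:
    """Map the captured /api/bill/export response to a likely failure cause.

    Hypothetical helper: expects the keys 'status', 'content_type', 'body'
    used by the export_response dict in secsion.py.
    """
    status = export_response.get("status")
    content_type = (export_response.get("content_type") or "").lower()

    if status is None:
        # The response listener never saw the export endpoint
        return "no /api/bill/export response captured: request blocked or never sent"
    if status >= 400:
        return f"server rejected the export request (HTTP {status})"
    if "json" in content_type:
        # A JSON reply where a file was expected may carry an error or status message
        return "server returned JSON instead of a file: inspect the logged body"
    if any(t in content_type for t in ("spreadsheet", "excel", "octet-stream")):
        # A file came back, yet Playwright reported no download
        return "server returned a file but no download event fired: check Content-Disposition and new tabs"
    return f"unexpected response (HTTP {status}, content-type {content_type or 'unknown'})"
```

Called right after the existing response logging in the `except` branch, for example `logger.warning(diagnose_export(export_response))`, it would put the most likely cause on a single log line.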
@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 ## Project Overview
-SalesShow is a monolithic Flask web application for analyzing sales data from Excel files. It supports manual Excel uploads and automated daily downloads from secsion.com via Playwright browser automation. There is no database — all data lives as Excel files on disk in `uploads/`.
+SaleShow is a monolithic Flask web application for analyzing sales data from Excel files. It supports manual Excel uploads and automated daily downloads from secsion.com via Playwright browser automation. There is no database — all data lives as Excel files on disk in `uploads/`.
 ## Commands
@@ -23,6 +23,7 @@ gunicorn -w 4 -b 0.0.0.0:8000 app:app
 # CLI automation — download reports from secsion.com
 python -m automation.secsion --start 2026-04-28 --end 2026-04-28
 python -m automation.secsion --start 2026-05-15 --end 2026-05-17 --username 15682076681 --password yourpassword
 ```
 No test framework or linter is configured in this project.
@@ -31,17 +32,19 @@ No test framework or linter is configured in this project.
 **Backend (Flask, single `app.py`):**
 - Routes handle file upload (`/upload`), file listing (`/files`), data loading/processing (`/load/<filename>`), deletion, and cleanup.
- `process_sales_data()` (~lines 371-575 in `app.py`) is the core logic. It uses a state-machine approach to handle two Excel formats: "flat tables" (each row has code + product) and "hierarchical tables" (code row is a header, product rows are children). Outputs daily summaries with per-product breakdowns.
+- `process_sales_data()` (~lines 371-575) is the core logic. It uses a state-machine approach to handle two Excel formats: "flat tables" (each row has code + product) and "hierarchical tables" (code row is a header, product rows are children). Outputs daily summaries with per-product breakdowns.
 - `find_header_row()` dynamically detects the header row by scanning first 20 rows for keyword matches.
 - Auto-download routes use a global `download_status` dict and run Playwright in daemon threads via `threading.Thread`.
 **Automation module (`automation/`):**
- `secsion.py` — `SecsionDownloader` uses Playwright headless Chromium to log into secsion.com, navigate to reports, set date range via TDesign date picker, optionally inject `shop_id` via route interception, and download exports.
+- `secsion.py` — `SecsionDownloader` uses Playwright headless Chromium to log into secsion.com, navigate to reports, set the date range via the TDesign date picker (which requires the sequence click → select day → Enter to confirm → Escape to close), optionally inject `shop_id` via route interception on `**/api/bill/export`, and download exports. Retries up to 3 times with exponential backoff.
- `uploader.py` — copies downloaded files into `uploads/` with timestamp-prefix naming (same convention as manual uploads).
+- `uploader.py` — copies downloaded files into `uploads/` with `YYYYMMDD_HHMMSS_` prefix naming (same convention as manual uploads).
- `scheduler.py` — APScheduler `BackgroundScheduler` with `CronTrigger` runs daily auto-download (default 01:00).
+- `scheduler.py` — APScheduler `BackgroundScheduler` with `CronTrigger` runs daily auto-download (default 01:00). Uses `misfire_grace_time=3600`.
 **Configuration (`config.py`):**
 - Three-tier priority: Web UI settings (`data/config.json`) > environment variables (`.env` / system env) > defaults.
 - `Config` class provides static methods for reading/writing secsion credentials, shop ID, and scheduler settings.
 - Passwords are masked (`******`) when returned via the API.
 **Frontend (vanilla JS/CSS, no build step):**
 - `main.js` — all client-side interactivity: file upload (drag-and-drop), AJAX to API, data rendering (card/table view), client-side filtering, sorting, pagination (50 items/page), export.
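The updated `secsion.py` bullet above mentions three retries with exponential backoff. The retry code itself is not part of this diff, so the following is only a sketch of the pattern, assuming three retries after the first attempt and a 2-second base delay; `retry_with_backoff` and its parameters are hypothetical:

```python
import asyncio
import logging
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


async def retry_with_backoff(
    op: Callable[[], Awaitable[T]],
    retries: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Await op(); after each failure wait base_delay * 2**n, up to `retries` retries."""
    for attempt in range(retries + 1):  # first try + `retries` retries
        try:
            return await op()
        except Exception as err:
            if attempt == retries:
                raise  # retries exhausted: surface the last error
            delay = base_delay * 2 ** attempt  # 2s, 4s, 8s with the defaults
            logger.warning("attempt %d/%d failed (%s); retrying in %.1fs",
                           attempt + 1, retries + 1, err, delay)
            await sleep(delay)
    raise ValueError("retries must be >= 0")  # only reached when the loop never runs
```

With the defaults, a coroutine function that keeps failing is attempted four times, with waits of 2 s, 4 s and 8 s in between. Passing `sleep` as a parameter keeps the helper testable without real delays.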
@@ -52,5 +55,5 @@ No test framework or linter is configured in this project.
 - No database — Excel files on disk are the data store.
 - No frontend build step — vanilla JS/CSS served directly via Flask static files.
- Playwright automation runs in daemon threads with a global `download_status` dict for status tracking.
+- Playwright automation runs in daemon threads; status tracked via module-level `download_status` dict in `app.py`.
- Passwords are masked (`******`) when returned via the API.
+- The secsion.com date picker uses TDesign's `needconfirm="true"` mode, so calling `.fill()` alone won't work: you must click the day cell and then press Enter.
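The click → select day → Enter → Escape sequence that the `secsion.py` bullet and this note describe can be sketched as a small helper. This is not the project's `_set_date`: the `pick_date` name and the `.t-date-picker__cell` day-cell selector are assumptions, and the sketch assumes the month containing the target date is already shown in the open panel (no month navigation):

```python
from datetime import date

# Assumed CSS class of a day cell in the TDesign date-picker panel (verify in DevTools).
DAY_CELL_SELECTOR = ".t-date-picker__cell"


async def pick_date(page, input_selector: str, date_str: str,
                    cell_selector: str = DAY_CELL_SELECTOR) -> None:
    """Set a TDesign date picker that runs in needconfirm mode.

    `page` is a Playwright async Page; only page.click() and
    page.keyboard.press() are used.
    """
    day = date.fromisoformat(date_str).day                  # "2026-05-07" -> 7
    await page.click(input_selector)                        # 1. open the picker panel
    await page.click(f'{cell_selector}:text-is("{day}")')   # 2. click the day cell (exact text match)
    await page.keyboard.press("Enter")                      # 3. confirm the selection
    await page.keyboard.press("Escape")                     # 4. close the panel
```

With a real Playwright page this would be `await pick_date(page, input_selector, "2026-05-15")`. In a range picker that renders two months, or one that shows days from adjacent months, the `:text-is()` selector can match several cells, so it should be scoped to the right panel.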
@@ -159,10 +159,11 @@ class SecsionDownloader:
         await asyncio.sleep(3)
-        # If shop_id is configured, intercept the export request and inject shop_id
+        # If shop_id is configured, intercept the export request and inject shop_id; also capture the server response
+        import json
+        export_response = {'status': None, 'body': None, 'content_type': None}
         if self.shop_id:
-            import json
             async def inject_shop_id(route):
                 request = route.request
                 body = json.loads(request.post_data)
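The hunk above shows only the first lines of `inject_shop_id`. Here is a minimal sketch of the body rewrite such a handler performs, assuming the request body is a JSON object and that the field is named `shop_id` (the key is not visible in this diff); `inject_shop_id_into_body` and `handle_export` are hypothetical names:

```python
import json
from typing import Any, Optional


def inject_shop_id_into_body(post_data: Optional[str], shop_id: Any,
                             key: str = "shop_id") -> str:
    """Return the export request body with `key` set to `shop_id`.

    Assumes the body is a JSON object; an empty or missing body becomes {}.
    """
    body = json.loads(post_data) if post_data else {}
    if not isinstance(body, dict):
        raise ValueError("expected the export request body to be a JSON object")
    body[key] = shop_id
    # ensure_ascii=False keeps non-ASCII field values readable in the forwarded body
    return json.dumps(body, ensure_ascii=False)


# A Playwright route handler could then forward the rewritten body
# (hypothetical wiring, inside SecsionDownloader):
#
#     async def handle_export(route):
#         new_body = inject_shop_id_into_body(route.request.post_data, self.shop_id)
#         await route.continue_(post_data=new_body)
#
#     await page.route("**/api/bill/export", handle_export)
```

Keeping the JSON rewrite in a plain function lets it be checked without a browser; only the thin handler depends on Playwright's `Route.continue_(post_data=...)`.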
@@ -173,18 +174,92 @@ class SecsionDownloader:
             await page.route('**/api/bill/export', inject_shop_id)
             logger.info(f"shop_id interception set: {self.shop_id}")
-        # Click "Export report" and capture the download (timeout raised to 300 s for Docker)
-        logger.info("Clicking Export report...")
-        async with page.expect_download(timeout=300000) as download_info:
-            await export_btn.click()
-            logger.info("Waiting for the file download...")
-        download = await download_info.value
-        filename = download.suggested_filename
-        save_path = os.path.join(self.download_dir, filename)
-        await download.save_as(save_path)
-        logger.info(f"Report saved to: {save_path}")
-        return save_path
+        # Capture the export endpoint's response (for debugging)
+        async def on_response(response):
+            if '/api/bill/export' in response.url:
+                export_response['status'] = response.status
+                export_response['content_type'] = response.headers.get('content-type', '')
+                try:
+                    body = await response.text()
+                    export_response['body'] = body[:2000] if body else ''
+                except Exception:
+                    export_response['body'] = '(binary or empty)'
+                logger.info(f"Export endpoint response: status={response.status}, content-type={export_response['content_type']}, body length={len(export_response['body'] or '')}")
+        page.on("response", on_response)
+
+        # Click "Export report" and capture the download
+        logger.info("Clicking Export report...")
+        download_timeout = 120000  # 2 minutes
+
+        try:
+            async with page.expect_download(timeout=download_timeout) as download_info:
+                await export_btn.click()
+                logger.info("Waiting for the file download...")
+            download = await download_info.value
+            filename = download.suggested_filename
+            save_path = os.path.join(self.download_dir, filename)
+            await download.save_as(save_path)
+            logger.info(f"Report saved to: {save_path}")
+            return save_path
+        except Exception as download_err:
+            # The download event did not fire; run diagnostics
+            logger.warning(f"Failed to capture the download event: {download_err}")
+            # Save a debug screenshot
+            try:
+                screenshot_path = os.path.join(self.download_dir, f"debug_export_{datetime.now().strftime('%Y%m%d_%H%M%S')}.png")
+                await page.screenshot(path=screenshot_path, full_page=True)
+                logger.info(f"Debug screenshot saved: {screenshot_path}")
+            except Exception as ss_err:
+                logger.warning(f"Failed to save screenshot: {ss_err}")
+            # Log the captured response details
+            if export_response['status']:
+                logger.info(f"Actual server response: status={export_response['status']}, content-type={export_response['content_type']}")
+                if export_response['body']:
+                    logger.info(f"Response body (first 500 chars): {export_response['body'][:500]}")
+            else:
+                logger.warning("No /api/bill/export response captured; the request may have been blocked or never sent")
+            # Check the page for error messages
+            try:
+                error_text = await page.evaluate("""() => {
+                    const msgs = document.querySelectorAll('.t-message--error, .t-notification--error, [class*="error"], .el-message--error');
+                    return Array.from(msgs).map(el => el.textContent.trim()).filter(Boolean).join(' | ');
+                }""")
+                if error_text:
+                    logger.error(f"Page error message: {error_text}")
+            except Exception:
+                pass
+            # Check for newly opened tabs (some sites download via window.open)
+            try:
+                pages = page.context.pages
+                if len(pages) > 1:
+                    logger.info(f"Detected {len(pages)} tabs; checking the new tabs...")
+                    for p in pages[1:]:
+                        url = p.url
+                        logger.info(f"New tab URL: {url}")
+                        if url.startswith('blob:') or 'download' in url.lower() or 'export' in url.lower():
+                            # Try to download from the new tab
+                            try:
+                                async with p.expect_download(timeout=30000) as dl_info:
+                                    pass
+                                download = await dl_info.value
+                                filename = download.suggested_filename
+                                save_path = os.path.join(self.download_dir, filename)
+                                await download.save_as(save_path)
+                                logger.info(f"Downloaded from the new tab: {save_path}")
+                                return save_path
+                            except Exception:
+                                pass
+            except Exception:
+                pass
+            raise
     async def _set_date(self, page, input_box, date_str):
         """