fix: 增强导出下载诊断能力,缩短超时时间

下载超时从 300s 减至 120s,失败时自动保存截图、打印服务端响应内容、
检查页面错误提示和新标签页,便于定位 download 事件未触发的根因。

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
2026-05-17 18:23:32 +08:00
parent 975f9e5887
commit b402612641
2 changed files with 99 additions and 21 deletions
+10 -7
View File
@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
## Project Overview ## Project Overview
SalesShow is a monolithic Flask web application for analyzing sales data from Excel files. It supports manual Excel uploads and automated daily downloads from secsion.com via Playwright browser automation. There is no database — all data lives as Excel files on disk in `uploads/`. SaleShow is a monolithic Flask web application for analyzing sales data from Excel files. It supports manual Excel uploads and automated daily downloads from secsion.com via Playwright browser automation. There is no database — all data lives as Excel files on disk in `uploads/`.
## Commands ## Commands
@@ -23,6 +23,7 @@ gunicorn -w 4 -b 0.0.0.0:8000 app:app
# CLI automation — download reports from secsion.com # CLI automation — download reports from secsion.com
python -m automation.secsion --start 2026-04-28 --end 2026-04-28 python -m automation.secsion --start 2026-04-28 --end 2026-04-28
python -m automation.secsion --start 2026-05-15 --end 2026-05-17 --username 15682076681 --password yourpassword
``` ```
No test framework or linter is configured in this project. No test framework or linter is configured in this project.
@@ -31,17 +32,19 @@ No test framework or linter is configured in this project.
**Backend (Flask, single `app.py`):** **Backend (Flask, single `app.py`):**
- Routes handle file upload (`/upload`), file listing (`/files`), data loading/processing (`/load/<filename>`), deletion, and cleanup. - Routes handle file upload (`/upload`), file listing (`/files`), data loading/processing (`/load/<filename>`), deletion, and cleanup.
- `process_sales_data()` (~lines 371-575 in `app.py`) is the core logic. It uses a state-machine approach to handle two Excel formats: "flat tables" (each row has code + product) and "hierarchical tables" (code row is a header, product rows are children). Outputs daily summaries with per-product breakdowns. - `process_sales_data()` (~lines 371-575) is the core logic. It uses a state-machine approach to handle two Excel formats: "flat tables" (each row has code + product) and "hierarchical tables" (code row is a header, product rows are children). Outputs daily summaries with per-product breakdowns.
- `find_header_row()` dynamically detects the header row by scanning first 20 rows for keyword matches. - `find_header_row()` dynamically detects the header row by scanning first 20 rows for keyword matches.
- Auto-download routes use a global `download_status` dict and run Playwright in daemon threads via `threading.Thread`.
**Automation module (`automation/`):** **Automation module (`automation/`):**
- `secsion.py``SecsionDownloader` uses Playwright headless Chromium to log into secsion.com, navigate to reports, set date range via TDesign date picker, optionally inject `shop_id` via route interception, and download exports. - `secsion.py``SecsionDownloader` uses Playwright headless Chromium to log into secsion.com, navigate to reports, set date range via TDesign date picker (requires click → select day → Enter confirm → Escape close sequence), optionally inject `shop_id` via route interception on `**/api/bill/export`, and download exports. Has 3-retry logic with exponential backoff.
- `uploader.py` — copies downloaded files into `uploads/` with timestamp-prefix naming (same convention as manual uploads). - `uploader.py` — copies downloaded files into `uploads/` with `YYYYMMDD_HHMMSS_` prefix naming (same convention as manual uploads).
- `scheduler.py` — APScheduler `BackgroundScheduler` with `CronTrigger` runs daily auto-download (default 01:00). - `scheduler.py` — APScheduler `BackgroundScheduler` with `CronTrigger` runs daily auto-download (default 01:00). Uses `misfire_grace_time=3600`.
**Configuration (`config.py`):** **Configuration (`config.py`):**
- Three-tier priority: Web UI settings (`data/config.json`) > environment variables (`.env` / system env) > defaults. - Three-tier priority: Web UI settings (`data/config.json`) > environment variables (`.env` / system env) > defaults.
- `Config` class provides static methods for reading/writing secsion credentials, shop ID, and scheduler settings. - `Config` class provides static methods for reading/writing secsion credentials, shop ID, and scheduler settings.
- Passwords are masked (`******`) when returned via the API.
**Frontend (vanilla JS/CSS, no build step):** **Frontend (vanilla JS/CSS, no build step):**
- `main.js` — all client-side interactivity: file upload (drag-and-drop), AJAX to API, data rendering (card/table view), client-side filtering, sorting, pagination (50 items/page), export. - `main.js` — all client-side interactivity: file upload (drag-and-drop), AJAX to API, data rendering (card/table view), client-side filtering, sorting, pagination (50 items/page), export.
@@ -52,5 +55,5 @@ No test framework or linter is configured in this project.
- No database — Excel files on disk are the data store. - No database — Excel files on disk are the data store.
- No frontend build step — vanilla JS/CSS served directly via Flask static files. - No frontend build step — vanilla JS/CSS served directly via Flask static files.
- Playwright automation runs in daemon threads with a global `download_status` dict for status tracking. - Playwright automation runs in daemon threads; status tracked via module-level `download_status` dict in `app.py`.
- Passwords are masked (`******`) when returned via the API. - The secsion.com date picker uses TDesign's `needconfirm="true"` mode — simply calling `.fill()` won't work; must click cell then press Enter.
+89 -14
View File
@@ -159,10 +159,11 @@ class SecsionDownloader:
await asyncio.sleep(3) await asyncio.sleep(3)
# 如果配置了 shop_id,拦截导出请求注入 shop_id # 如果配置了 shop_id,拦截导出请求注入 shop_id,并捕获服务端响应
if self.shop_id: import json
import json export_response = {'status': None, 'body': None, 'content_type': None}
if self.shop_id:
async def inject_shop_id(route): async def inject_shop_id(route):
request = route.request request = route.request
body = json.loads(request.post_data) body = json.loads(request.post_data)
@@ -173,18 +174,92 @@ class SecsionDownloader:
await page.route('**/api/bill/export', inject_shop_id) await page.route('**/api/bill/export', inject_shop_id)
logger.info(f"已设置 shop_id 拦截: {self.shop_id}") logger.info(f"已设置 shop_id 拦截: {self.shop_id}")
# 点击导出报表并捕获下载(Docker 中增加超时到300秒 # 捕获导出接口的响应(用于调试
logger.info("点击导出报表...") async def on_response(response):
async with page.expect_download(timeout=300000) as download_info: if '/api/bill/export' in response.url:
await export_btn.click() export_response['status'] = response.status
logger.info("等待文件下载中...") export_response['content_type'] = response.headers.get('content-type', '')
try:
body = await response.text()
export_response['body'] = body[:2000] if body else ''
except Exception:
export_response['body'] = '(binary or empty)'
logger.info(f"导出接口响应: status={response.status}, content-type={export_response['content_type']}, body长度={len(export_response['body'] or '')}")
download = await download_info.value page.on("response", on_response)
filename = download.suggested_filename
save_path = os.path.join(self.download_dir, filename) # 点击导出报表并捕获下载
await download.save_as(save_path) logger.info("点击导出报表...")
logger.info(f"报表已保存至: {save_path}") download_timeout = 120000 # 2 分钟
return save_path
try:
async with page.expect_download(timeout=download_timeout) as download_info:
await export_btn.click()
logger.info("等待文件下载中...")
download = await download_info.value
filename = download.suggested_filename
save_path = os.path.join(self.download_dir, filename)
await download.save_as(save_path)
logger.info(f"报表已保存至: {save_path}")
return save_path
except Exception as download_err:
# 下载事件未触发,进行诊断
logger.warning(f"下载事件捕获失败: {download_err}")
# 保存调试截图
try:
screenshot_path = os.path.join(self.download_dir, f"debug_export_{datetime.now().strftime('%Y%m%d_%H%M%S')}.png")
await page.screenshot(path=screenshot_path, full_page=True)
logger.info(f"调试截图已保存: {screenshot_path}")
except Exception as ss_err:
logger.warning(f"截图保存失败: {ss_err}")
# 打印捕获到的响应信息
if export_response['status']:
logger.info(f"服务端实际响应: status={export_response['status']}, content-type={export_response['content_type']}")
if export_response['body']:
logger.info(f"响应内容(前500字): {export_response['body'][:500]}")
else:
logger.warning("未捕获到 /api/bill/export 响应,可能是请求被拦截或未发出")
# 检查页面是否有错误提示
try:
error_text = await page.evaluate("""() => {
const msgs = document.querySelectorAll('.t-message--error, .t-notification--error, [class*="error"], .el-message--error');
return Array.from(msgs).map(el => el.textContent.trim()).filter(Boolean).join(' | ');
}""")
if error_text:
logger.error(f"页面错误提示: {error_text}")
except Exception:
pass
# 检查是否有新打开的标签页(某些网站通过 window.open 下载)
try:
pages = page.context.pages
if len(pages) > 1:
logger.info(f"检测到 {len(pages)} 个标签页,检查新标签页...")
for p in pages[1:]:
url = p.url
logger.info(f"新标签页 URL: {url}")
if url.startswith('blob:') or 'download' in url.lower() or 'export' in url.lower():
# 尝试从新标签页下载
try:
async with p.expect_download(timeout=30000) as dl_info:
pass
download = await dl_info.value
filename = download.suggested_filename
save_path = os.path.join(self.download_dir, filename)
await download.save_as(save_path)
logger.info(f"从新标签页下载成功: {save_path}")
return save_path
except Exception:
pass
except Exception:
pass
raise
async def _set_date(self, page, input_box, date_str): async def _set_date(self, page, input_box, date_str):
""" """