openclaw-home-pc/openclaw/skills/content-collector/SKILL.md
2026-03-24 04:00:48 +08:00

824 lines
28 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
name: content-collector
description: Automatically collect and archive content from shared links in group chats. When a user shares a link (WeChat articles, Feishu docs, web pages, etc.) in any group chat and asks to archive/collect/save it, this skill triggers to fetch the content, create a Feishu document, and update the knowledge base table. Use when: (1) User shares a link and asks to "收录/转存/保存" content, (2) Need to archive web content to Feishu docs, (3) Building a personal knowledge base from shared links, (4) Organizing learning materials from various sources.
---
# Content Collector - 链接内容自动收录技能
## Overview
This skill enables automatic collection and archiving of content from shared links into a structured knowledge base.
**Core Workflow:**
```
Detect Link → Fetch Content → Extract Images → Upload Images to Feishu → Create Feishu Doc → Update Table
```
## When to Use
### 模式1主动触发显式关键词
当用户消息包含以下**触发词**时,立即执行收录:
- "收录" / "转存" / "保存" / "存档" / "存一下" / "归档" / "备份" / "收藏"
- "存到知识库" / "加入知识库" / "转飞书"
**示例:**
- "这个链接收录一下"
- "存到知识库"
- "转存这篇教程"
### 模式2静默收录自动检测
在**群聊场景**中,自动检测以下链接并静默收录:
- 飞书文档/表格/Wikifeishu.cn
- 微信公众号文章mp.weixin.qq.com
- 技术博客/教程站点
- 知识分享类链接
**静默收录条件:**
1. 消息来自群聊(非私聊)
2. 消息包含可识别的知识类链接
3. 用户没有明确拒绝的意图
**两种模式优先级:**
```
检测到主动触发词 → 立即收录(显式模式)
未检测到触发词但检测到链接 → 静默收录(隐式模式)
```
## Supported Link Types
| Type | Example | Fetch Method |
|------|---------|--------------|
| WeChat Article | `https://mp.weixin.qq.com/s/xxx` | kimi_fetch |
| Feishu Doc | `https://xxx.feishu.cn/docx/xxx` | feishu_fetch_doc |
| Feishu Wiki | `https://xxx.feishu.cn/wiki/xxx` | feishu_fetch_doc |
| Web Page | General URLs | kimi_fetch / web_fetch |
## Supported Image Sources
| Source | Example | Priority | Notes |
|--------|---------|----------|-------|
| Markdown image | `![alt](https://xx/image.png)` | High | 直接替换为飞书 image_key |
| HTML `<img src>` | `<img src="/assets/a.png">` | High | 相对路径需转绝对路径 |
| Lazy-load image | `data-src`, `data-original` | Medium | 常见于公众号/博客懒加载 |
| `srcset` candidate | `srcset="a 1x, b 2x"` | Medium | 优先选择清晰度更高的候选图 |
| Feishu file token | `boxcn...` / `img_v3_...` | High | 需要走飞书素材下载后再上传 |
## Global Availability (全局可用配置)
**生效范围:所有用户、所有群聊**
本技能已配置为全局可用,支持以下对象:
| 对象类型 | 支持状态 | 说明 |
|---------|---------|------|
| **所有用户** | ✅ 可用 | 任何用户分享的链接均可被收录 |
| **所有群聊** | ✅ 可用 | 支持技能中心群、养虾群、学习群等所有群组 |
| **私聊消息** | ✅ 可用 | 用户私信分享链接也可触发收录 |
| **多渠道** | ✅ 可用 | 飞书、其他渠道统一支持 |
**权限说明:**
- 任何用户均可触发收录(无需管理员权限)
- 收录的文档统一存储到指定的知识库目录
- 所有用户均可查看已收录的文档
---
## Installation & Permission Check (安装与权限检查)
在正式使用本技能前,系统必须自动或引导用户完成以下权限校验,以确保流程不中断:
### 1. 飞书权限清单
| 权限项 | 验证工具 | 目的 |
|-------|---------|------|
| **OAuth 授权** | `feishu_oauth` | 获取操作飞书文档和表格的用户凭证 |
| **知识库写入权限** | `feishu_create_doc` | 确保能在指定的 Space ID 下创建节点 |
| **多维表格编辑权限** | `feishu_bitable_app_table_record` | 确保能向指定的 app_token 写入记录 |
| **图片上传权限** | `feishu_im_bot_upload` | 允许将本地图片同步至飞书素材库 |
### 2. 预检流程 (Pre-flight Check)
每次“安装”或配置更新后,执行以下检查:
1. **验证 Space ID 可访问性**:尝试在指定目录下获取节点列表。
2. **验证 Table 结构**:检查 `关键词`、`原链接`、`图片数量`、`图片处理状态` 等字段是否存在(后两者可选)。
3. **静默测试**:如果权限不足,立即通过 `feishu_oauth` 弹出授权引导,而非在执行收录时报错。
---
## Configuration
Before using, ensure these are configured in MEMORY.md:
```markdown
## Content Collector Config
- **Knowledge Base Table**: `[Your Bitable App Token]` (Bitable app_token)
- **Table URL**: [Your Bitable Table URL]
- **Default Table ID**: `[Your Table ID]` (will auto-detect if available)
- **Knowledge Base Space ID**: `[Your Space ID]` (所有文档创建在此知识库下)
- **Knowledge Base URL**: [Your Knowledge Base Homepage URL]
- **Content Categories**: 技术教程, 实战案例, 产品文档, 学习笔记
- **Global Access**: 所有用户可用,所有群聊可用
- **Image Fetch Mode**: `all` / `cover_only`(默认 `all`
- **Image Max Count**: `20`(单篇文档最多处理图片数)
- **Image Max Size MB**: `10`(单图超过阈值则跳过)
- **Image Timeout Sec**: `20`(下载超时)
- **Image Allowed Types**: `jpg,png,gif,webp`
- **Image Fallback**: `keep_original_link=true`
```
**Note**:
1. This skill updates ONLY the configured knowledge base table. Do not create or update any other tables.
2. **All created documents must be saved under the designated Knowledge Base** using wiki_node parameter.
3. **Global Access**: 所有用户、所有群聊均可使用本技能,收录的文档对全员可见。
4. 图片抓取默认开启;若用户明确要求“纯文字收录”,可跳过图片处理。
---
## 📚 知识库文档存储规则(必遵守)
所有收录的文档必须按照以下规则分类存储到知识库对应目录:
### 知识库目录结构
请参考各项目或团队定义的知识库标准目录结构进行存储。收录的文档通常存放在“素材”或“归档”类目录下。
### 文档分类映射规则
| 内容分类 | 存储目录 (wiki_node) | 命名前缀 | 示例 |
|----------|---------------------|----------|------|
| 技术教程 | `F9pFw9dxTiXmpsk5bNlco704nag` (内容文档) | 📖 | 📖 [标题] |
| 实战案例 | `F9pFw9dxTiXmpsk5bNlco704nag` (内容文档) | 🛠️ | 🛠️ [标题] |
| 产品文档 | `F9pFw9dxTiXmpsk5bNlco704nag` (内容文档) | 📄 | 📄 [标题] |
| 学习笔记 | `F9pFw9dxTiXmpsk5bNlco704nag` (内容文档) | 💡 | 💡 [标题] |
| 热点资讯 | `F9pFw9dxTiXmpsk5bNlco704nag` (内容文档) | 🔥 | 🔥 [标题] |
| 设计技能 | `F9pFw9dxTiXmpsk5bNlco704nag` (内容文档) | 🎨 | 🎨 [标题] |
| 工具推荐 | `F9pFw9dxTiXmpsk5bNlco704nag` (内容文档) | 🔧 | 🔧 [标题] |
| 训练营 | `F9pFw9dxTiXmpsk5bNlco704nag` (内容文档) | 🎓 | 🎓 [标题] |
### 文档命名规范
```
[Emoji前缀] [原标题] | 收录日期
示例:
📖 OpenClaw保姆级教程 | 2026-03-08
🛠️ 火山方舟自动化报表案例 | 2026-03-08
🔥 GPT-5.4发布解读 | 2026-03-08
```
### 文档模板
```markdown
# [Emoji] [原标题]
> 📌 **元信息**
> - 来源:[原始来源]
> - 原文链接:[原始URL]
> - 收录时间YYYY-MM-DD
> - 内容分类:[技术教程/实战案例/产品文档/学习笔记/热点资讯/设计技能/工具推荐/训练营]
> - 关键词:[关键词1, 关键词2, 关键词3]
---
## 📋 核心要点
[3-5条核心内容摘要]
---
## 📝 正文内容
[完整的转存内容]
---
## 🔗 相关链接
- 原文链接:[原始URL]
- 知识库索引:[素材池文档索引链接]
---
📚 **收录时间**YYYY-MM-DD
🏷️ **分类**[分类名]
🔖 **关键词**[关键词]
```
### 自动更新素材索引
每次收录完成后,必须:
1. **更新多维表格** - 添加新记录到素材池表格
2. **更新素材索引文档** - 在「📚 内容素材池文档索引」中添加条目
3. **更新分类统计** - 更新各分类的文档数量和占比
---
## Workflow
### Step 1: Detect and Parse Link
Extract URL from user message using regex or direct extraction.
### Step 2: Fetch Content (正文 + 原始结构)
Choose appropriate fetch method based on URL pattern:
**For WeChat articles:**
```python
raw = kimi_fetch(url="https://mp.weixin.qq.com/s/xxx")
```
**For Feishu docs:**
```python
raw = feishu_fetch_doc(doc_id="https://xxx.feishu.cn/docx/xxx")
```
**For general web pages:**
```python
raw = kimi_fetch(url="https://example.com/article")
# or
raw = web_fetch(url="https://example.com/article")
```
**Standardized Output (必须统一):**
```python
fetched = {
"title": raw.get("title", ""),
"markdown": raw.get("markdown", raw.get("content", "")),
"raw_html": raw.get("html", ""),
"source_url": original_url
}
```
### Step 3: Analyze and Categorize
**智能分类判断:**
根据内容特征自动判断分类:
| 判断依据 | 分类 |
|----------|------|
| 包含"安装/配置/部署/教程"等词 | 📖 技术教程 |
| 包含"案例/实战/项目/演示"等词 | 🛠️ 实战案例 |
| 包含"安全/公告/版本/功能"等词 | 📄 产品文档 |
| 包含"学习/成长/指南/笔记"等词 | 💡 学习笔记 |
| 包含"发布/新功能/热点"等词 | 🔥 热点资讯 |
| 包含"设计/Prompt/美学"等词 | 🎨 设计技能 |
| 包含"工具/CLI/插件"等词 | 🔧 工具推荐 |
| 包含"训练营/课程/教学"等词 | 🎓 训练营 |
### Step 4: Process Images (图片处理)
在创建飞书文档前,必须执行图片抓取与回填,目标是“最大化保留原文图片、最小化失败影响正文”。
**Image Processing Workflow v2:**
```python
import os
import re
from urllib.parse import urljoin
IMG_MD_RE = re.compile(r'!\[(.*?)\]\(([^)]+)\)')
IMG_HTML_RE = re.compile(r'<img[^>]+(?:src|data-src|data-original)=["\']([^"\']+)["\']', re.I)
IMG_SRCSET_RE = re.compile(r'<img[^>]+srcset=["\']([^"\']+)["\']', re.I)
def normalize_img_ref(ref: str, base_url: str) -> str:
ref = (ref or "").strip()
if not ref:
return ""
if ref.startswith(("http://", "https://")):
return ref
if ref.startswith("//"):
return "https:" + ref
# 飞书 token 或相对路径
if ref.startswith(("img_v3_", "boxcn", "file_", "AAM")):
return ref
return urljoin(base_url, ref)
def pick_srcset_candidate(srcset_value: str) -> str:
# 示例: "a.jpg 1x, b.jpg 2x" 或 "a.jpg 480w, b.jpg 1080w"
parts = [x.strip() for x in (srcset_value or "").split(",") if x.strip()]
if not parts:
return ""
return parts[-1].split(" ")[0].strip()
def extract_image_candidates(markdown_content: str, raw_html: str, source_url: str):
candidates = []
for alt, ref in IMG_MD_RE.findall(markdown_content or ""):
candidates.append({"alt": alt or "image", "ref": normalize_img_ref(ref, source_url), "from_md": True})
for ref in IMG_HTML_RE.findall(raw_html or ""):
normalized = normalize_img_ref(ref, source_url)
if normalized:
candidates.append({"alt": "image", "ref": normalized, "from_md": False})
for srcset_value in IMG_SRCSET_RE.findall(raw_html or ""):
candidate = normalize_img_ref(pick_srcset_candidate(srcset_value), source_url)
if candidate:
candidates.append({"alt": "image", "ref": candidate, "from_md": False})
# 去重,保持首次出现顺序
seen, ordered = set(), []
for item in candidates:
if item["ref"] and item["ref"] not in seen:
seen.add(item["ref"])
ordered.append(item)
return ordered
def fetch_and_upload_images(markdown_content, raw_html, source_url, cfg):
candidates = extract_image_candidates(markdown_content, raw_html, source_url)
max_count = int(cfg.get("image_max_count", 20))
max_bytes = int(cfg.get("image_max_size_mb", 10)) * 1024 * 1024
replace_map = {}
uploaded_extra = []
failed = []
total = 0
success = 0
for item in candidates[:max_count]:
ref = item["ref"]
total += 1
tmp_path = None
try:
# 1) 下载图片到本地临时文件
if ref.startswith(("http://", "https://")):
tmp_path = download_image_to_local(
ref,
timeout=int(cfg.get("image_timeout_sec", 20)),
max_bytes=max_bytes,
allowed_types=cfg.get("image_allowed_types", "jpg,png,gif,webp")
)
else:
# 飞书 file_token / image_key
tmp_path = download_feishu_media_to_local(
ref,
max_bytes=max_bytes,
allowed_types=cfg.get("image_allowed_types", "jpg,png,gif,webp")
)
# 2) 上传到飞书素材
upload_result = feishu_im_bot_upload(action="upload_image", file_path=tmp_path)
image_key = upload_result.get("image_key")
if not image_key:
raise RuntimeError("empty image_key")
success += 1
replace_map[ref] = image_key
if not item["from_md"]:
uploaded_extra.append((item["alt"], image_key))
except Exception as e:
failed.append({"ref": ref, "reason": str(e)[:120]})
finally:
if tmp_path and os.path.exists(tmp_path):
os.remove(tmp_path)
# 3) 回填 markdown 中已有图片
processed = markdown_content
for src, image_key in replace_map.items():
processed = processed.replace(f"({src})", f"({image_key})")
# 4) 将 HTML-only 图片追加到文末,避免丢图
if uploaded_extra:
lines = ["", "---", "", "## 🖼️ 原文配图(自动抓取)", ""]
for alt, image_key in uploaded_extra:
lines.append(f"![{alt}]({image_key})")
processed += "\n".join(lines)
# 5) 失败兜底提示(不阻断收录)
if failed:
processed += (
"\n\n> ⚠️ 部分图片处理失败,已保留正文收录。"
f" 成功 {success}/{total},失败 {len(failed)}。"
)
image_stats = {
"total": total,
"success": success,
"failed": len(failed),
"failed_refs": failed
}
return processed, image_stats
```
**Fallback Strategy:**
- 单张图片失败不影响整篇文档收录
- 原文中存在但未成功托管的图片,保留原链接并记录失败原因
- 文档末尾自动追加“原文配图/失败提示”区块,确保信息不丢失
### Step 5: Create Feishu Document (按知识库规则存储)
Convert processed markdown to Feishu document with proper organization:
```python
# 0. 先执行图片处理
processed_markdown_content, image_stats = fetch_and_upload_images(
markdown_content=fetched["markdown"],
raw_html=fetched["raw_html"],
source_url=fetched["source_url"],
cfg=config
)
# 1. 确定分类和参数
content_category = classify_content(processed_markdown_content) # 📖/🛠️/📄/💡/🔥/🎨/🔧/🎓
emoji_prefix = get_emoji_prefix(content_category) # 根据分类获取emoji
wiki_node = get_wiki_node_by_category(content_category) # 获取存储目录
# 2. 生成文档标题
doc_title = f"{emoji_prefix} {original_title} | {today_date}"
# 3. 生成文档内容(使用标准模板)
doc_content = f"""# {emoji_prefix} {original_title}
> 📌 **元信息**
> - 来源:{source_name}
> - 原文链接:{original_url}
> - 收录时间:{today_date}
> - 内容分类:{content_category}
> - 关键词:{keywords}
> - 图片处理:成功 {image_stats["success"]}/{image_stats["total"]}(失败 {image_stats["failed"]}
---
## 📋 核心要点
{extract_key_points(processed_markdown_content, 5)}
---
## 📝 正文内容
{processed_markdown_content}
---
## 🔗 相关链接
- 原文链接:{original_url}
- 知识库索引:[Your Index Document URL]
---
📅 **收录时间**{today_date}
🏷️ **分类**{content_category}
🔖 **关键词**{keywords}
"""
# 4. 创建文档到知识库对应目录
feishu_create_doc(
title=doc_title,
markdown=doc_content,
wiki_node=wiki_node # 必须指定存储目录
)
```
**存储目录映射:**
| 分类 | wiki_node | 目录名 |
|------|-----------|--------|
| 所有素材 | `F9pFw9dxTiXmpsk5bNlco704nag` | 04-内容素材 |
**IMPORTANT**:
1. All documents MUST be created under the designated Knowledge Base using wiki_node parameter.
2. Documents must follow the naming convention: `[Emoji] [Title] | [Date]`
3. Documents must use the standard template with metadata section.
### Step 6: Update Knowledge Base Table
Add record to the Bitable knowledge base (ONLY update this specific table):
```python
feishu_bitable_app_table_record(
action="create",
app_token="[Your App Token]", # Configured in MEMORY.md
table_id="[Your Table ID]", # Will use correct table ID from the base
fields={
"关键词": keywords,
"内容分类": content_category,
"文档标题": [{"text": original_title, "type": "text"}],
"来源": [{"text": source_name, "type": "text"}],
"核心要点": [{"text": key_points, "type": "text"}],
"飞书文档链接": {"link": new_doc_url, "text": "飞书文档", "type": "url"},
"原链接": {"link": original_url, "text": "原文链接", "type": "url"},
"图片数量": image_stats["total"],
"图片处理状态": f'{image_stats["success"]}/{image_stats["total"]} 成功',
"图片失败数": image_stats["failed"]
}
)
```
**Table Fields:**
| Field | Type | Description |
|-------|------|-------------|
| 关键词 | Text | Search keywords for the content |
| 内容分类 | Single Select | Category: 📖技术教程/🛠️实战案例/📄产品文档/💡学习笔记/🔥热点资讯/🎨设计技能/🔧工具推荐/🎓训练营 |
| 文档标题 | Text | Title of the archived document |
| 来源 | Text | Original source name |
| 核心要点 | Text | Key points summary (3-5 items) |
| 飞书文档链接 | URL | Link to the created Feishu document |
| 原链接 | URL | **Original source URL** - 新增字段,存储采集的原始链接 |
| 图片数量 | Number | 本次检测到的图片总数 |
| 图片处理状态 | Text | 图片托管结果,例如 `8/10 成功` |
| 图片失败数 | Number | 上传失败图片数,用于质量监控 |
**IMPORTANT**: Only update the configured knowledge base table. Never create or modify other tables.
### Step 7: Update Content Index Document
After creating the document and updating the table, MUST update the index document:
```python
# 1. 获取当前索引文档内容
index_doc = feishu_fetch_doc(doc_id="[Your Index Doc ID]")
# 2. 在对应分类表格中添加新行
new_index_entry = f"| {original_title} | {source_name} | [查看]({new_doc_url}) |\n"
# 3. 更新分类统计
update_category_stats(content_category)
# 4. 更新总计数
update_total_count()
```
**或者直接追加到索引文档的末尾:**
```python
feishu_update_doc(
doc_id="[Your Index Doc ID]",
mode="append",
markdown=f"""
| {original_title} | {source_name} | [查看]({new_doc_url}) |
"""
)
```
---
## Content Categorization Guide
| Category | Emoji | Description | Examples |
|----------|-------|-------------|----------|
| **技术教程** | 📖 | Step-by-step technical guides | Installation, configuration, API usage |
| **实战案例** | 🛠️ | Real-world implementation examples | Case studies, project demos |
| **产品文档** | 📄 | Product features, security notices | Release notes, security advisories |
| **学习笔记** | 💡 | Conceptual knowledge, methodologies | Best practices, architecture guides |
| **热点资讯** | 🔥 | Breaking news, releases | GPT-5.4, new features |
| **设计技能** | 🎨 | Design, prompts, aesthetics | AJ's prompts, design guides |
| **工具推荐** | 🔧 | Tools, CLI, plugins | gws, trae, autotools |
| **训练营** | 🎓 | Courses, bootcamps, tutorials | OpenClaw bootcamp |
**分类判断优先级:**
1. 优先根据用户指定分类
2. 其次根据标题关键词
3. 最后根据内容特征自动判断
4. 不确定时标记为"待分类",请用户确认
## Delete Record Process
When user replies "删除" or "删除 [keyword]":
```python
# 1. Search records by keyword
feishu_bitable_app_table_record(
action="list",
app_token="[Your App Token]",
table_id="[Your Table ID]",
filter={
"conjunction": "and",
"conditions": [
{"field_name": "关键词", "operator": "contains", "value": [keyword]}
]
}
)
# 2. Confirm deletion
# If multiple found → list for user to select
# If single found → ask for confirmation
# 3. Execute deletion
feishu_bitable_app_table_record(
action="delete",
app_token="[Your App Token]",
table_id="[Your Table ID]",
record_id="record_id_to_delete"
)
```
## Error Handling
### Common Issues
| Error | Cause | Solution |
|-------|-------|----------|
| Fetch timeout | Network issue or heavy content | Retry with longer timeout, or use alternative fetch method |
| Unauthenticated | OAuth token expired or not authed | Trigger `feishu_oauth` to refresh user credentials |
| Permission denied | No write access to Space/Table | Check if user/bot has 'Editor' role in Feishu |
| Content too long | Exceeds API limits | Truncate or split into multiple documents |
| Table update failed | Wrong app_token or table_id | Verify configuration in MEMORY.md |
| Field Missing | "原链接" field not in table | Add the field to Bitable manually or via API |
| Image download failed | Source anti-hotlinking / timeout | Retry with headers, then keep original link |
| Image too large | Exceeds size limit | Compress or skip and log warning |
| Invalid image type | Unsupported format or broken file | Skip image and continue document creation |
### Recovery Steps
1. If fetch fails → Try alternative method (kimi_fetch → web_fetch)
2. If image fetch/upload fails → Keep original image link and append warning block
3. If Feishu doc creation fails → Check OAuth status
4. If table update fails → Verify table structure and field names
5. Always report partial success (doc created but table not updated)
## Response Template
### 收录成功响应流式Post格式
```json
{
"msg_type": "post",
"content": {
"post": {
"zh_cn": {
"title": "✅ 收录完成",
"content": [
[
{"tag": "text", "text": "📄 "},
{"tag": "text", "text": "{emoji} {原标题} | {日期}", "style": {"bold": true}}
],
[{"tag": "text", "text": ""}],
[
{"tag": "text", "text": "💡 文档亮点:", "style": {"bold": true}}
],
[
{"tag": "text", "text": "• {亮点1}"}
],
[
{"tag": "text", "text": "• {亮点2}"}
],
[
{"tag": "text", "text": "• {亮点3}"}
],
[{"tag": "text", "text": ""}],
[
{"tag": "text", "text": "🔗 "},
{"tag": "a", "text": "查看飞书文档", "href": "{文档URL}"}
]
]
}
}
}
}
```
**简洁输出示例:**
```
✅ 收录完成
📄 📖 OpenClaw配置指南 | 2026-03-08
💡 文档亮点:
• 完整配置示例含9大模块详解
• 多Agent扩展配置方案
• 生产环境安全配置建议
🔗 查看飞书文档 → [点击打开](https://xxx.feishu.cn/docx/xxx)
```
### 静默收录响应流式Post格式
```json
{
"msg_type": "post",
"content": {
"post": {
"zh_cn": {
"title": "✅ 已自动收录",
"content": [
[
{"tag": "text", "text": "📄 "},
{"tag": "text", "text": "{emoji} {原标题}", "style": {"bold": true}}
],
[{"tag": "text", "text": ""}],
[
{"tag": "text", "text": "💡 亮点:{亮点摘要}"}
],
[{"tag": "text", "text": ""}],
[
{"tag": "a", "text": "📎 查看文档", "href": "{文档URL}"}
]
]
}
}
}
}
```
### 批量收录响应流式Post格式
```json
{
"msg_type": "post",
"content": {
"post": {
"zh_cn": {
"title": "✅ 批量收录完成({N}份)",
"content": [
[
{"tag": "text", "text": "📄 {emoji1} {标题1}", "style": {"bold": true}}
],
[
{"tag": "text", "text": " 💡 {亮点1}"}
],
[
{"tag": "a", "text": " 🔗 查看", "href": "{链接1}"}
],
[{"tag": "text", "text": ""}],
[
{"tag": "text", "text": "📄 {emoji2} {标题2}", "style": {"bold": true}}
],
[
{"tag": "text", "text": " 💡 {亮点2}"}
],
[
{"tag": "a", "text": " 🔗 查看", "href": "{链接2}"}
]
]
}
}
}
}
```
**输出原则:**
1. **必须流式Post格式** - 使用 msg_type: post
2. **只包含3个核心要素**
- 文件名称(📄 Emoji + 标题 + 日期)
- 文档亮点(💡 3-5条核心要点
- 飞书链接(🔗 点击查看)
3. **不输出其他信息** - 不显示分类、不显示表格更新、不显示统计
4. **保持简洁** - 每份文档3-5行内容
## Best Practices
1. **Always verify content was fetched correctly** before creating documents
2. **Extract key insights** from the content for the summary
3. **Use appropriate category** based on content nature
4. **Generate relevant keywords** for better searchability
5. **Keep source attribution** clear for copyright respect
6. **Handle partial failures gracefully** - document what succeeded and what failed
7. **Update index document** - Every new document must be added to the index
8. **Follow naming convention** - Use [Emoji] [Title] | [Date] format
9. **Store in correct directory** - Use wiki_node to place in right category
10. **Image-first fallback** - 图片失败不阻断正文入库,优先保证知识沉淀完整性
## 收录完成检查清单 (Checklist)
每次收录必须完成以下所有步骤:
- [ ] **执行权限预检**(验证 OAuth 及 Space/Table 写入权限)
- [ ] 获取并处理原始内容(含图片)
- [ ] 抽取并去重图片引用Markdown + HTML
- [ ] 图片托管到飞书(记录总数、成功数、失败数)
- [ ] 智能分类并确定 Emoji 前缀
- [ ] 提取核心要点3-5条
- [ ] 生成关键词
- [ ] **创建飞书文档**(使用标准模板,指定 wiki_node
- [ ] **更新多维表格**(添加完整记录,包含**原链接/图片统计**字段)
- [ ] **更新文档索引**(在素材索引中添加条目)
- [ ] 发送收录完成通知给用户
**任何一步未完成,视为收录失败!**
## Integration with Memory
After each collection, update MEMORY.md:
```markdown
### YYYY-MM-DD - Content Collection
- **新增收录**: [Title]
- **来源**: [Source]
- **分类**: [Category]
- **知识库状态**: 共[N]条记录
- **索引更新**: ✅ 已更新
```
This skill is part of the core knowledge management system. Execute with care and attention to detail.
---
## 附录:图片抓取能力 v2执行约束
### 必须满足的目标
1. **不丢图**Markdown 图片 + HTML 图片都要尝试收录。
2. **不阻断**:图片失败不能阻断正文文档创建。
3. **可观测**:表格必须记录图片处理统计(总数/成功/失败)。
4. **可回溯**:失败图片需保留原始引用,便于后续补抓。
### 推荐执行策略
1. 先用 `extract_image_candidates` 统一收集并去重。
2. 每篇文档最多处理 `image_max_count` 张,防止超时。
3. 单图限制 `image_max_size_mb`,超过阈值直接跳过并计入失败。
4. 仅允许常见格式jpg/png/gif/webp其余按失败处理。
5. 所有临时文件在 `finally` 中删除,避免磁盘残留。
### 最小可行验收标准MVP
- 输入含 10 张图的公众号文章,最终文档中至少 8 张能正常显示。
- 即使图片全部失败,也必须产出正文文档并回写表格记录。
- 收录响应中必须返回飞书文档链接,且不暴露内部异常堆栈。
---
*图片处理方案 v2.0 - 2026-03-17*