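---
name: content-collector
description: Automatically collect and archive content from shared links in group chats. When a user shares a link (WeChat articles, Feishu docs, web pages, etc.) in any group chat and asks to archive/collect/save it, this skill triggers to fetch the content, create a Feishu document, and update the knowledge base table. Use when: (1) User shares a link and asks to "收录/转存/保存" content, (2) Need to archive web content to Feishu docs, (3) Building a personal knowledge base from shared links, (4) Organizing learning materials from various sources.
---

# Content Collector - Automatic Link Content Archiving Skill

## Overview

This skill enables automatic collection and archiving of content from shared links into a structured knowledge base.

**Core Workflow:**

```
Detect Link → Fetch Content → Extract Images → Upload Images to Feishu → Create Feishu Doc → Update Table
```

## When to Use

### Mode 1: Active Trigger (Explicit Keywords)

When a user message contains any of the following **trigger words**, archive immediately:

- "收录" / "转存" / "保存" / "存档" / "存一下" / "归档" / "备份" / "收藏"
- "存到知识库" / "加入知识库" / "转飞书"

These are matched literally. They roughly mean collect, transfer, save, archive, back up, bookmark, "save to the knowledge base", "add to the knowledge base", and "convert to Feishu".

**Examples:**

- "这个链接收录一下" ("archive this link")
- "存到知识库" ("save it to the knowledge base")
- "转存这篇教程" ("save a copy of this tutorial")

### Mode 2: Silent Archiving (Automatic Detection)

In **group chats**, automatically detect the following kinds of links and archive them silently:

- Feishu docs / sheets / Wiki (feishu.cn)
- WeChat Official Account articles (mp.weixin.qq.com)
- Technical blogs and tutorial sites
- Knowledge-sharing links

**Conditions for silent archiving:**

1. The message comes from a group chat (not a private chat)
2. The message contains a recognizable knowledge-type link
3. 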
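The user has not explicitly declined archiving

**Priority between the two modes:**

```
Trigger word detected → archive immediately (explicit mode)
No trigger word, but a link detected → archive silently (implicit mode)
```

## Supported Link Types

| Type | Example | Fetch Method |
|------|---------|--------------|
| WeChat Article | `https://mp.weixin.qq.com/s/xxx` | kimi_fetch |
| Feishu Doc | `https://xxx.feishu.cn/docx/xxx` | feishu_fetch_doc |
| Feishu Wiki | `https://xxx.feishu.cn/wiki/xxx` | feishu_fetch_doc |
| Web Page | General URLs | kimi_fetch / web_fetch |

## Supported Image Sources

| Source | Example | Priority | Notes |
|--------|---------|----------|-------|
| Markdown image | `![alt](https://xx/image.png)` | High | Replace directly with the Feishu image_key |
| HTML `<img>` | `<img src="...">` | High | Relative paths must be converted to absolute URLs |
| Lazy-load image | `data-src`, `data-original` | Medium | Common in lazy-loaded Official Account articles and blogs |
| `srcset` candidate | `srcset="a 1x, b 2x"` | Medium | Prefer the higher-resolution candidate |
| Feishu file token | `boxcn...` / `img_v3_...` | High | Download through Feishu media first, then re-upload |

## Global Availability

**Scope: all users, all group chats**

This skill is configured to be globally available and supports the following:

| Target | Status | Notes |
|---------|---------|------|
| **All users** | ✅ Available | Links shared by any user can be archived |
| **All group chats** | ✅ Available | Covers every group, such as 技能中心群, 养虾群, and study groups |
| **Private messages** | ✅ Available | Links shared in a private message can also trigger archiving |
| **Multi-channel** | ✅ Available | Feishu and other channels are supported uniformly |

**Permissions:**

- Any user can trigger archiving (no admin rights required)
- Archived documents are stored together in the designated knowledge base directory
- All users can view archived documents

---

## Installation & Permission Check

Before the skill is used, the system must complete the following permission checks, either automatically or by guiding the user, so that the workflow is not interrupted:

### 1. Feishu Permission List

| Permission | Verification Tool | Purpose |
|-------|---------|------|
| **OAuth authorization** | `feishu_oauth` | Obtain user credentials for operating on Feishu docs and tables |
| **Knowledge base write access** | `feishu_create_doc` | Ensure nodes can be created under the specified Space ID |
| **Bitable edit access** | `feishu_bitable_app_table_record` | Ensure records can be written to the specified app_token |
| **Image upload access** | `feishu_im_bot_upload` | Allow local images to be uploaded to the Feishu media library |

### 2. Pre-flight Check

After every "installation" or configuration update, run the following checks:

1. **Verify that the Space ID is accessible**: try to list the nodes under the specified directory.
2. **Verify the table structure**: check that fields such as `关键词`, `原链接`, `图片数量`, and `图片处理状态` exist (the last two are optional).
3. 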
**Silent test**: if permissions are insufficient, prompt for authorization through `feishu_oauth` immediately instead of failing later when archiving runs.

---

## Configuration

Before using, ensure these are configured in MEMORY.md:

```markdown
## Content Collector Config
- **Knowledge Base Table**: `[Your Bitable App Token]` (Bitable app_token)
- **Table URL**: [Your Bitable Table URL]
- **Default Table ID**: `[Your Table ID]` (will auto-detect if available)
- **Knowledge Base Space ID**: `[Your Space ID]` (all documents are created under this knowledge base)
- **Knowledge Base URL**: [Your Knowledge Base Homepage URL]
- **Content Categories**: 技术教程, 实战案例, 产品文档, 学习笔记
- **Global Access**: available to all users and all group chats
- **Image Fetch Mode**: `all` / `cover_only` (default `all`)
- **Image Max Count**: `20` (maximum number of images processed per document)
- **Image Max Size MB**: `10` (images above this size are skipped)
- **Image Timeout Sec**: `20` (download timeout)
- **Image Allowed Types**: `jpg,png,gif,webp`
- **Image Fallback**: `keep_original_link=true`
```

**Note**:

1. This skill updates ONLY the configured knowledge base table. Do not create or update any other tables.
2. **All created documents must be saved under the designated Knowledge Base** using the wiki_node parameter.
3. **Global Access**: all users and all group chats can use this skill, and archived documents are visible to everyone.
4. 
Image fetching is enabled by default; if the user explicitly asks for "纯文字收录" (text-only archiving), image processing can be skipped.

---

## 📚 Knowledge Base Document Storage Rules (Mandatory)

All archived documents must be categorized and stored in the matching knowledge base directory according to the rules below:

### Knowledge Base Directory Structure

Follow the standard knowledge base directory structure defined by each project or team. Archived documents usually go under a "素材" (materials) or "归档" (archive) directory.

### Category Mapping Rules

| Category | Storage Directory (wiki_node) | Name Prefix | Example |
|----------|---------------------|----------|------|
| 技术教程 | `F9pFw9dxTiXmpsk5bNlco704nag` (内容文档) | 📖 | 📖 [Title] |
| 实战案例 | `F9pFw9dxTiXmpsk5bNlco704nag` (内容文档) | 🛠️ | 🛠️ [Title] |
| 产品文档 | `F9pFw9dxTiXmpsk5bNlco704nag` (内容文档) | 📄 | 📄 [Title] |
| 学习笔记 | `F9pFw9dxTiXmpsk5bNlco704nag` (内容文档) | 💡 | 💡 [Title] |
| 热点资讯 | `F9pFw9dxTiXmpsk5bNlco704nag` (内容文档) | 🔥 | 🔥 [Title] |
| 设计技能 | `F9pFw9dxTiXmpsk5bNlco704nag` (内容文档) | 🎨 | 🎨 [Title] |
| 工具推荐 | `F9pFw9dxTiXmpsk5bNlco704nag` (内容文档) | 🔧 | 🔧 [Title] |
| 训练营 | `F9pFw9dxTiXmpsk5bNlco704nag` (内容文档) | 🎓 | 🎓 [Title] |

### Document Naming Convention

```
[Emoji prefix] [Original title] | Archive date

Examples:
📖 OpenClaw保姆级教程 | 2026-03-08
🛠️ 火山方舟自动化报表案例 | 2026-03-08
🔥 GPT-5.4发布解读 | 2026-03-08
```

### Document Template

```markdown
# [Emoji] [原标题]

> 📌 **元信息**
> - 来源:[原始来源]
> - 原文链接:[原始URL]
> - 收录时间:YYYY-MM-DD
> - 内容分类:[技术教程/实战案例/产品文档/学习笔记/热点资讯/设计技能/工具推荐/训练营]
> - 关键词:[关键词1, 关键词2, 关键词3]

---

## 📋 核心要点

[3-5条核心内容摘要]

---

## 📝 正文内容

[完整的转存内容]

---

## 🔗 相关链接

- 原文链接:[原始URL]
- 知识库索引:[素材池文档索引链接]

---

📅 **收录时间**:YYYY-MM-DD
🏷️ **分类**:[分类名]
🔖 **关键词**:[关键词]
```

### Automatic Index Update

After each archive completes, you must:

1. **Update the Bitable** - add a new record to the material pool table
2. **Update the material index document** - add an entry to "📚 内容素材池文档索引"
3. **Update category statistics** - update each category's document count and share

---

## Workflow

### Step 1: Detect and Parse Link

Extract the URL from the user message using a regex or direct extraction.
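
As a minimal sketch of this step, the snippet below pulls URLs out of a message with a regular expression and maps each one to a fetch method from the Supported Link Types table. The names `extract_urls` and `pick_fetch_method`, and the exact pattern, are illustrative assumptions and not part of the skill's toolset.

```python
import re
from urllib.parse import urlparse

# Hypothetical pattern: an http(s) URL runs until whitespace, quotes,
# angle brackets, or common CJK/ASCII closing punctuation.
URL_RE = re.compile(r"https?://[^\s<>\"'，。）)]+")


def extract_urls(message: str) -> list[str]:
    """Return URLs in first-seen order, without duplicates."""
    seen, urls = set(), []
    for url in URL_RE.findall(message or ""):
        url = url.rstrip(".,;!?")  # drop trailing sentence punctuation
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def pick_fetch_method(url: str) -> str:
    """Map a URL to a fetch tool name from the Supported Link Types table."""
    host = (urlparse(url).hostname or "").lower()
    if host == "mp.weixin.qq.com":
        return "kimi_fetch"
    if host == "feishu.cn" or host.endswith(".feishu.cn"):
        return "feishu_fetch_doc"
    # General pages: kimi_fetch first, web_fetch is the listed alternative.
    return "kimi_fetch"
```

Matching on the parsed hostname rather than on a substring of the whole URL avoids misrouting links that merely mention `feishu.cn` in their path or query string.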
### Step 2: Fetch Content (Body Text + Original Structure)

Choose the appropriate fetch method based on the URL pattern:

**For WeChat articles:**

```python
raw = kimi_fetch(url="https://mp.weixin.qq.com/s/xxx")
```

**For Feishu docs:**

```python
raw = feishu_fetch_doc(doc_id="https://xxx.feishu.cn/docx/xxx")
```

**For general web pages:**

```python
raw = kimi_fetch(url="https://example.com/article")
# or
raw = web_fetch(url="https://example.com/article")
```

**Standardized Output (required):**

```python
fetched = {
    "title": raw.get("title", ""),
    "markdown": raw.get("markdown", raw.get("content", "")),
    "raw_html": raw.get("html", ""),
    "source_url": original_url
}
```

### Step 3: Analyze and Categorize

**Automatic classification:**

Infer the category from content features:

| Signal | Category |
|----------|------|
| Contains words such as "安装/配置/部署/教程" | 📖 技术教程 |
| Contains words such as "案例/实战/项目/演示" | 🛠️ 实战案例 |
| Contains words such as "安全/公告/版本/功能" | 📄 产品文档 |
| Contains words such as "学习/成长/指南/笔记" | 💡 学习笔记 |
| Contains words such as "发布/新功能/热点" | 🔥 热点资讯 |
| Contains words such as "设计/Prompt/美学" | 🎨 设计技能 |
| Contains words such as "工具/CLI/插件" | 🔧 工具推荐 |
| Contains words such as "训练营/课程/教学" | 🎓 训练营 |

### Step 4: Process Images

Before creating the Feishu document, images must be fetched and substituted back into the content. The goal is to keep as many of the original images as possible while minimizing the impact of image failures on the body text.

**Image Processing Workflow v2:**

```python
import os
import re
from urllib.parse import urljoin

IMG_MD_RE = re.compile(r'!\[(.*?)\]\(([^)]+)\)')
IMG_HTML_RE = re.compile(r'<img[^>]+(?:src|data-src|data-original)=["\']([^"\']+)["\']', re.I)
IMG_SRCSET_RE = re.compile(r'<img[^>]+srcset=["\']([^"\']+)["\']', re.I)


def normalize_img_ref(ref: str, base_url: str) -> str:
    ref = (ref or "").strip()
    if not ref:
        return ""
    if ref.startswith(("http://", "https://")):
        return ref
    if ref.startswith("//"):
        return "https:" + ref
    # Feishu tokens are kept as-is; anything else is treated as a relative path
    if ref.startswith(("img_v3_", "boxcn", "file_", "AAM")):
        return ref
    return urljoin(base_url, ref)


def pick_srcset_candidate(srcset_value: str) -> str:
    # e.g. "a.jpg 1x, b.jpg 2x" or "a.jpg 480w, b.jpg 1080w"
    # Takes the last entry, which assumes candidates are listed in ascending resolution
    parts = [x.strip() for x in (srcset_value or "").split(",") if x.strip()]
    if not parts:
        return ""
    return parts[-1].split(" ")[0].strip()


def extract_image_candidates(markdown_content: str, raw_html: str,
source_url: str):
    candidates = []

    # Markdown images: ![alt](ref)
    for alt, ref in IMG_MD_RE.findall(markdown_content or ""):
        candidates.append({
            "alt": alt or "image",
            "ref": normalize_img_ref(ref, source_url),
            "raw": ref,  # exact text inside (...), needed to find it again when substituting
            "from_md": True,
        })

    # HTML <img> tags: src / data-src / data-original
    for ref in IMG_HTML_RE.findall(raw_html or ""):
        normalized = normalize_img_ref(ref, source_url)
        if normalized:
            candidates.append({"alt": "image", "ref": normalized, "from_md": False})

    # HTML srcset attributes
    for srcset_value in IMG_SRCSET_RE.findall(raw_html or ""):
        candidate = normalize_img_ref(pick_srcset_candidate(srcset_value), source_url)
        if candidate:
            candidates.append({"alt": "image", "ref": candidate, "from_md": False})

    # Deduplicate while preserving first-seen order
    seen, ordered = set(), []
    for item in candidates:
        if item["ref"] and item["ref"] not in seen:
            seen.add(item["ref"])
            ordered.append(item)
    return ordered


def fetch_and_upload_images(markdown_content, raw_html, source_url, cfg):
    candidates = extract_image_candidates(markdown_content, raw_html, source_url)
    max_count = int(cfg.get("image_max_count", 20))
    max_bytes = int(cfg.get("image_max_size_mb", 10)) * 1024 * 1024

    replace_map = {}
    uploaded_extra = []
    failed = []
    total = 0
    success = 0

    for item in candidates[:max_count]:
        ref = item["ref"]
        total += 1
        tmp_path = None
        try:
            # 1) Download the image to a local temp file
            if ref.startswith(("http://", "https://")):
                tmp_path = download_image_to_local(
                    ref,
                    timeout=int(cfg.get("image_timeout_sec", 20)),
                    max_bytes=max_bytes,
                    allowed_types=cfg.get("image_allowed_types", "jpg,png,gif,webp")
                )
            else:
                # Feishu file_token / image_key
                tmp_path = download_feishu_media_to_local(
                    ref,
                    max_bytes=max_bytes,
                    allowed_types=cfg.get("image_allowed_types", "jpg,png,gif,webp")
                )

            # 2) Upload to Feishu media
            upload_result = feishu_im_bot_upload(action="upload_image", file_path=tmp_path)
            image_key = upload_result.get("image_key")
            if not image_key:
                raise RuntimeError("empty image_key")

            success += 1
            # Key by the reference as written in the markdown, so relative
            # or protocol-relative links are still matched on substitution
            replace_map[item.get("raw", ref)] = image_key
            if not item["from_md"]:
                uploaded_extra.append((item["alt"], image_key))
        except Exception as e:
            failed.append({"ref": ref, "reason": str(e)[:120]})
        finally:
            # Always remove the temp file (condition continues on the next line)
            if tmp_path \
and os.path.exists(tmp_path):
                os.remove(tmp_path)

    # 3) Substitute image keys for images already present in the markdown
    processed = markdown_content
    for src, image_key in replace_map.items():
        processed = processed.replace(f"({src})", f"({image_key})")

    # 4) Append HTML-only images at the end so none are lost
    if uploaded_extra:
        lines = ["", "---", "", "## 🖼️ 原文配图(自动抓取)", ""]
        for alt, image_key in uploaded_extra:
            lines.append(f"![{alt}]({image_key})")
        # Extra newline so "---" is a rule, not a heading underline for the last paragraph
        processed += "\n" + "\n".join(lines)

    # 5) Fallback notice on failure (does not block archiving)
    if failed:
        processed += (
            "\n\n> ⚠️ 部分图片处理失败,已保留正文收录。"
            f" 成功 {success}/{total},失败 {len(failed)}。"
        )

    image_stats = {
        "total": total,
        "success": success,
        "failed": len(failed),
        "failed_refs": failed
    }
    return processed, image_stats
```

**Fallback Strategy:**

- A single image failure does not affect archiving of the whole document
- Images that exist in the original but were not hosted successfully keep their original link, and the failure reason is recorded
- A "原文配图" (original images) block and/or a failure notice is appended to the end of the document so that no information is lost

### Step 5: Create Feishu Document (Stored per Knowledge Base Rules)

Convert the processed markdown into a Feishu document with proper organization:

```python
# 0. Run image processing first
processed_markdown_content, image_stats = fetch_and_upload_images(
    markdown_content=fetched["markdown"],
    raw_html=fetched["raw_html"],
    source_url=fetched["source_url"],
    cfg=config
)

# 1. Determine the category and parameters
content_category = classify_content(processed_markdown_content)  # 📖/🛠️/📄/💡/🔥/🎨/🔧/🎓
emoji_prefix = get_emoji_prefix(content_category)  # emoji for the category
wiki_node = get_wiki_node_by_category(content_category)  # storage directory

# 2. Build the document title
doc_title = f"{emoji_prefix} {original_title} | {today_date}"

# 3. Build the document body (standard template)
doc_content = f"""# {emoji_prefix} {original_title}

> 📌 **元信息**
> - 来源:{source_name}
> - 原文链接:{original_url}
> - 收录时间:{today_date}
> - 内容分类:{content_category}
> - 关键词:{keywords}
> - 图片处理:成功 {image_stats["success"]}/{image_stats["total"]}(失败 {image_stats["failed"]})

---

## 📋 核心要点

{extract_key_points(processed_markdown_content, 5)}

---

## 📝 正文内容

{processed_markdown_content}

---

## 🔗 相关链接

- 原文链接:{original_url}
- 知识库索引:[Your Index Document URL]

---

📅 **收录时间**:{today_date}
🏷️ **分类**:{content_category}
🔖 **关键词**:{keywords}
"""

# 4. 
# Create the document in the matching knowledge base directory
feishu_create_doc(
    title=doc_title,
    markdown=doc_content,
    wiki_node=wiki_node  # the storage directory must be specified
)
```

**Storage directory mapping:**

| Category | wiki_node | Directory Name |
|------|-----------|--------|
| All materials | `F9pFw9dxTiXmpsk5bNlco704nag` | 04-内容素材 |

**IMPORTANT**:

1. All documents MUST be created under the designated Knowledge Base using the wiki_node parameter.
2. Documents must follow the naming convention: `[Emoji] [Title] | [Date]`
3. Documents must use the standard template with the metadata section.

### Step 6: Update Knowledge Base Table

Add a record to the Bitable knowledge base (ONLY update this specific table):

```python
feishu_bitable_app_table_record(
    action="create",
    app_token="[Your App Token]",  # Configured in MEMORY.md
    table_id="[Your Table ID]",    # Will use correct table ID from the base
    fields={
        "关键词": keywords,
        "内容分类": content_category,
        "文档标题": [{"text": original_title, "type": "text"}],
        "来源": [{"text": source_name, "type": "text"}],
        "核心要点": [{"text": key_points, "type": "text"}],
        "飞书文档链接": {"link": new_doc_url, "text": "飞书文档", "type": "url"},
        "原链接": {"link": original_url, "text": "原文链接", "type": "url"},
        "图片数量": image_stats["total"],
        "图片处理状态": f'{image_stats["success"]}/{image_stats["total"]} 成功',
        "图片失败数": image_stats["failed"]
    }
)
```

**Table Fields:**

| Field | Type | Description |
|-------|------|-------------|
| 关键词 | Text | Search keywords for the content |
| 内容分类 | Single Select | Category: 📖技术教程/🛠️实战案例/📄产品文档/💡学习笔记/🔥热点资讯/🎨设计技能/🔧工具推荐/🎓训练营 |
| 文档标题 | Text | Title of the archived document |
| 来源 | Text | Original source name |
| 核心要点 | Text | Key points summary (3-5 items) |
| 飞书文档链接 | URL | Link to the created Feishu document |
| 原链接 | URL | **Original source URL** - newly added field that stores the original link that was collected |
| 图片数量 | Number | Total number of images attempted in this run |
| 图片处理状态 | Text | Image hosting result, e.g. `8/10 成功` |
| 图片失败数 | Number | Number of images that failed, used for quality monitoring |

**IMPORTANT**: Only update the configured knowledge base table. Never create or modify other tables.
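
Building the `fields` payload in one place makes it easier to keep the record consistent with the table definition above. The helper below is only a sketch: `build_record_fields` is a hypothetical name, and the value shapes simply mirror the Step 6 example without having been verified against the Bitable API.

```python
def build_record_fields(original_title, source_name, key_points, keywords,
                        content_category, new_doc_url, original_url, image_stats):
    """Assemble the `fields` dict for the knowledge base record from plain values."""
    # Missing statistics default to 0 so a text-only archive still yields a valid record.
    total = int(image_stats.get("total", 0))
    success = int(image_stats.get("success", 0))
    failed = int(image_stats.get("failed", 0))

    def text(value):
        # Text fields use the list-of-segments shape from the Step 6 example.
        return [{"text": str(value), "type": "text"}]

    def link(url, label):
        # URL fields carry both the target and a display label.
        return {"link": url, "text": label, "type": "url"}

    return {
        "关键词": keywords,
        "内容分类": content_category,
        "文档标题": text(original_title),
        "来源": text(source_name),
        "核心要点": text(key_points),
        "飞书文档链接": link(new_doc_url, "飞书文档"),
        "原链接": link(original_url, "原文链接"),
        "图片数量": total,
        "图片处理状态": f"{success}/{total} 成功",
        "图片失败数": failed,
    }
```

The returned dict can be passed as `fields=` to `feishu_bitable_app_table_record(action="create", ...)`.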
### Step 7: Update Content Index Document

After creating the document and updating the table, you MUST update the index document:

```python
# 1. Fetch the current index document
index_doc = feishu_fetch_doc(doc_id="[Your Index Doc ID]")

# 2. Add a new row to the table for the matching category
new_index_entry = f"| {original_title} | {source_name} | [查看]({new_doc_url}) |\n"

# 3. Update category statistics
update_category_stats(content_category)

# 4. Update the total count
update_total_count()
```

**Or simply append to the end of the index document:**

```python
feishu_update_doc(
    doc_id="[Your Index Doc ID]",
    mode="append",
    markdown=f"""
| {original_title} | {source_name} | [查看]({new_doc_url}) |
"""
)
```

---

## Content Categorization Guide

| Category | Emoji | Description | Examples |
|----------|-------|-------------|----------|
| **技术教程** | 📖 | Step-by-step technical guides | Installation, configuration, API usage |
| **实战案例** | 🛠️ | Real-world implementation examples | Case studies, project demos |
| **产品文档** | 📄 | Product features, security notices | Release notes, security advisories |
| **学习笔记** | 💡 | Conceptual knowledge, methodologies | Best practices, architecture guides |
| **热点资讯** | 🔥 | Breaking news, releases | GPT-5.4, new features |
| **设计技能** | 🎨 | Design, prompts, aesthetics | AJ's prompts, design guides |
| **工具推荐** | 🔧 | Tools, CLI, plugins | gws, trae, autotools |
| **训练营** | 🎓 | Courses, bootcamps, tutorials | OpenClaw bootcamp |

**Classification priority:**

1. A category specified by the user takes precedence
2. Next, keywords in the title
3. Finally, automatic inference from content features
4. When uncertain, mark the item as "待分类" (uncategorized) and ask the user to confirm

## Delete Record Process

When the user replies "删除" or "删除 [keyword]":

```python
# 1. Search records by keyword
feishu_bitable_app_table_record(
    action="list",
    app_token="[Your App Token]",
    table_id="[Your Table ID]",
    filter={
        "conjunction": "and",
        "conditions": [
            {"field_name": "关键词", "operator": "contains", "value": [keyword]}
        ]
    }
)

# 2. Confirm deletion
# If multiple found → list for user to select
# If single found → ask for confirmation

# 3. 
# Execute deletion
feishu_bitable_app_table_record(
    action="delete",
    app_token="[Your App Token]",
    table_id="[Your Table ID]",
    record_id="record_id_to_delete"
)
```

## Error Handling

### Common Issues

| Error | Cause | Solution |
|-------|-------|----------|
| Fetch timeout | Network issue or heavy content | Retry with longer timeout, or use alternative fetch method |
| Unauthenticated | OAuth token expired or not authed | Trigger `feishu_oauth` to refresh user credentials |
| Permission denied | No write access to Space/Table | Check if user/bot has 'Editor' role in Feishu |
| Content too long | Exceeds API limits | Truncate or split into multiple documents |
| Table update failed | Wrong app_token or table_id | Verify configuration in MEMORY.md |
| Field Missing | "原链接" field not in table | Add the field to Bitable manually or via API |
| Image download failed | Source anti-hotlinking / timeout | Retry with headers, then keep original link |
| Image too large | Exceeds size limit | Compress or skip and log warning |
| Invalid image type | Unsupported format or broken file | Skip image and continue document creation |

### Recovery Steps

1. If fetch fails → Try alternative method (kimi_fetch → web_fetch)
2. If image fetch/upload fails → Keep original image link and append warning block
3. If Feishu doc creation fails → Check OAuth status
4. If table update fails → Verify table structure and field names
5. 
Always report partial success (doc created but table not updated)

## Response Template

### Archive Success Response (Streaming Post Format)

```json
{
  "msg_type": "post",
  "content": {
    "post": {
      "zh_cn": {
        "title": "✅ 收录完成",
        "content": [
          [
            {"tag": "text", "text": "📄 "},
            {"tag": "text", "text": "{emoji} {原标题} | {日期}", "style": {"bold": true}}
          ],
          [{"tag": "text", "text": ""}],
          [{"tag": "text", "text": "💡 文档亮点:", "style": {"bold": true}}],
          [{"tag": "text", "text": "• {亮点1}"}],
          [{"tag": "text", "text": "• {亮点2}"}],
          [{"tag": "text", "text": "• {亮点3}"}],
          [{"tag": "text", "text": ""}],
          [
            {"tag": "text", "text": "🔗 "},
            {"tag": "a", "text": "查看飞书文档", "href": "{文档URL}"}
          ]
        ]
      }
    }
  }
}
```

**Concise output example:**

```
✅ 收录完成

📄 📖 OpenClaw配置指南 | 2026-03-08

💡 文档亮点:
• 完整配置示例,含9大模块详解
• 多Agent扩展配置方案
• 生产环境安全配置建议

🔗 查看飞书文档 → [点击打开](https://xxx.feishu.cn/docx/xxx)
```

### Silent Archive Response (Streaming Post Format)

```json
{
  "msg_type": "post",
  "content": {
    "post": {
      "zh_cn": {
        "title": "✅ 已自动收录",
        "content": [
          [
            {"tag": "text", "text": "📄 "},
            {"tag": "text", "text": "{emoji} {原标题}", "style": {"bold": true}}
          ],
          [{"tag": "text", "text": ""}],
          [{"tag": "text", "text": "💡 亮点:{亮点摘要}"}],
          [{"tag": "text", "text": ""}],
          [{"tag": "a", "text": "📎 查看文档", "href": "{文档URL}"}]
        ]
      }
    }
  }
}
```

### Batch Archive Response (Streaming Post Format)

```json
{
  "msg_type": "post",
  "content": {
    "post": {
      "zh_cn": {
        "title": "✅ 批量收录完成({N}份)",
        "content": [
          [{"tag": "text", "text": "📄 {emoji1} {标题1}", "style": {"bold": true}}],
          [{"tag": "text", "text": " 💡 {亮点1}"}],
          [{"tag": "a", "text": " 🔗 查看", "href": "{链接1}"}],
          [{"tag": "text", "text": ""}],
          [{"tag": "text", "text": "📄 {emoji2} {标题2}", "style": {"bold": true}}],
          [{"tag": "text", "text": " 💡 {亮点2}"}],
          [{"tag": "a", "text": " 🔗 查看", "href": "{链接2}"}]
        ]
      }
    }
  }
}
```

**Output principles:**

1. **The streaming Post format is required** - use msg_type: post
2. **Include only 3 core elements:**
   - File name (📄 emoji + title + date)
   - Document highlights (💡 3-5 key points)
   - Feishu link (🔗 click to view)
3. **Output nothing else** - no category, no table update details, no statistics
4. **Keep it concise** - 3-5 lines per document

## Best Practices

1. 
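**Always verify content was fetched correctly** before creating documents
2. **Extract key insights** from the content for the summary
3. **Use appropriate category** based on content nature
4. **Generate relevant keywords** for better searchability
5. **Keep source attribution** clear for copyright respect
6. **Handle partial failures gracefully** - document what succeeded and what failed
7. **Update index document** - Every new document must be added to the index
8. **Follow naming convention** - Use [Emoji] [Title] | [Date] format
9. **Store in correct directory** - Use wiki_node to place in right category
10. **Image fallback** - image failures must not block the body text from being stored; preserving the knowledge itself comes first

## Archive Completion Checklist

Every archive must complete all of the following steps:

- [ ] **Run the permission pre-flight check** (verify OAuth and Space/Table write access)
- [ ] Fetch and process the original content (including images)
- [ ] Extract and deduplicate image references (Markdown + HTML)
- [ ] Host images on Feishu (record total, succeeded, and failed counts)
- [ ] Classify the content and determine the emoji prefix
- [ ] Extract key points (3-5 items)
- [ ] Generate keywords
- [ ] **Create the Feishu document** (standard template, wiki_node specified)
- [ ] **Update the Bitable** (add a complete record, including the **原链接 and image statistics** fields)
- [ ] **Update the document index** (add an entry to the material index)
- [ ] Send the archive completion notice to the user

**If any step is not completed, the archive counts as failed!**

## Integration with Memory

After each collection, update MEMORY.md:

```markdown
### YYYY-MM-DD - Content Collection
- **新增收录**: [Title]
- **来源**: [Source]
- **分类**: [Category]
- **知识库状态**: 共[N]条记录
- **索引更新**: ✅ 已更新
```

This skill is part of the core knowledge management system. Execute with care and attention to detail.

---

## Appendix: Image Fetching Capability v2 (Execution Constraints)

### Required Goals

1. **No lost images**: attempt to archive both Markdown images and HTML images.
2. **Non-blocking**: image failures must not block creation of the body document.
3. **Observable**: the table must record image processing statistics (total / succeeded / failed).
4. **Traceable**: failed images must keep their original reference so they can be re-fetched later.

### Recommended Execution Strategy

1. Collect and deduplicate all references first with `extract_image_candidates`.
2. Process at most `image_max_count` images per document to avoid timeouts.
3. Limit each image to `image_max_size_mb`; anything larger is skipped and counted as a failure.
4. Allow only common formats (jpg/png/gif/webp); treat everything else as a failure.
5. Delete all temp files in `finally` so nothing is left on disk.

### Minimum Viable Acceptance Criteria (MVP)

- Given an Official Account article with 10 images, at least 8 display correctly in the final document.
- Even if every image fails, the body document must still be produced and the table record written.
- The archive response must include the Feishu document link and must not expose internal exception stack traces.

---

*Image processing plan v2.0 - 2026-03-17*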
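
The size and format limits in the appendix can be expressed as a small check that runs before an image is uploaded. This is only a sketch under stated assumptions: `check_image_constraints` is a hypothetical helper (the skill's own `download_image_to_local` is not shown in this document), and it assumes the caller already knows the response's Content-Type and byte size.

```python
def check_image_constraints(content_type: str, size_bytes: int, cfg: dict) -> tuple[bool, str]:
    """Return (ok, reason); reason is empty when the image passes both limits."""
    allowed_raw = str(cfg.get("image_allowed_types", "jpg,png,gif,webp"))
    allowed = {t.strip().lower() for t in allowed_raw.split(",") if t.strip()}
    max_bytes = int(cfg.get("image_max_size_mb", 10)) * 1024 * 1024

    # "image/jpeg; charset=binary" -> "jpeg"; non-image types stay unchanged and get rejected
    subtype = (content_type or "").split(";")[0].strip().lower()
    if subtype.startswith("image/"):
        subtype = subtype[len("image/"):]
    if subtype == "jpeg":
        subtype = "jpg"  # config lists the extension form

    if subtype not in allowed:
        return False, f"unsupported type: {content_type}"
    if size_bytes > max_bytes:
        return False, f"too large: {size_bytes} bytes > {max_bytes}"
    return True, ""
```

A rejected image would be added to the failed list with the returned reason, which keeps the failure traceable without interrupting the rest of the archive.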