Product Specification – LLMS.txt Generator Tool
===============================================

Project Title: LLMS.txt Generator – AI-Friendly Website Summary Generator

Overview:
---------
The `llms.txt` file is a plain text document placed at the root of a website (e.g., https://example.com/llms.txt) to provide structured, AI-readable summaries of the site's content. It is intended to guide Large Language Models (LLMs) such as ChatGPT, Claude, and Gemini by offering curated, human-written or tool-generated descriptions, documentation links, and page priorities. This helps LLMs understand and represent a website's purpose, content, and features more accurately in user prompts and model-generated responses.

Just as `robots.txt` instructs search engines, `llms.txt` informs LLMs how to summarize and reason about your site. It is especially useful for SaaS companies, APIs, open-source tools, and projects with technical documentation. By presenting high-value documentation and summaries, `llms.txt` enhances your visibility and accuracy within AI systems.

### Example llms.txt:

```markdown
# MyAI Tools
> A toolkit for transformer-based inference and fine-tuning. Open-source utilities to simplify LLM development and deployment.

## Core Documentation
- Getting Started
- API Reference: Full API index
- Examples: Use case gallery

## Optional
- Blog: Updates from the team
- Community: Forums and links
```

### llms.txt Structure:
- Starts with a title using Markdown `#`.
- A summary (1–2 lines) preceded by `>`.
- A `## Core Documentation` section lists key URLs and descriptions.
- A `## Optional` section includes secondary links such as blog, contact, FAQ, and changelog.

Purpose: To provide a simple tool that generates this file either manually or automatically, by scraping the homepage content, extracting the title, meta description, and links, and formatting them into `llms.txt`.

================================================================================

Product Components:
-------------------
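Every component that follows exists to produce or deliver the format shown in the example above. As a reference point, here is a minimal JavaScript sketch of that formatting step; the function name `buildLlmsTxt`, its parameter shape, and the `[text](url): description` link style are illustrative assumptions, not part of this spec:

```javascript
// Sketch of a generator for the llms.txt format described above.
// Hypothetical helper: names and input shape are illustrative, not from the spec.
function buildLlmsTxt({ title, summary, coreLinks = [], optionalLinks = [] }) {
  // Render a link entry; the description part is optional.
  const formatLink = ({ text, url, desc }) =>
    desc ? `- [${text}](${url}): ${desc}` : `- [${text}](${url})`;

  const lines = [`# ${title}`, `> ${summary}`, "", "## Core Documentation"];
  lines.push(...coreLinks.map(formatLink));
  if (optionalLinks.length) {
    lines.push("", "## Optional", ...optionalLinks.map(formatLink));
  }
  return lines.join("\n") + "\n";
}
```

In manual mode the frontend could call this directly with the form values; in auto mode it would feed in the JSON returned by `scrape.php`.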
1. **Frontend (HTML/CSS/JavaScript)**
-------------------------------------
- Single-page application (SPA) with three sections:
  - Home (What is llms.txt?)
  - Auto Generate (via PHP scraping)
  - Manual Generate (user enters title, summary, and links)
- Output area displays the formatted `llms.txt`
- Button to download the result as a `.txt` file
- Navigation handled by JavaScript tab switching
- Responsive, clean UI with a blue color palette

2. **Backend (PHP)**
--------------------
- PHP script (`scrape.php`) accepts a URL via `GET`
- Uses `file_get_contents()` and `DOMDocument` to extract:
  - `<title>`
  - `<meta name="description">`
  - Top 5 anchor links (`<a>`) with `href` attributes
- Returns JSON to the frontend with `title`, `description`, and the list of links

3. **Security & Rate Limiting**
-------------------------------
- Validate input URLs using `filter_var()` with `FILTER_VALIDATE_URL`
- Reject URLs whose scheme is not HTTP or HTTPS
- Call `libxml_use_internal_errors(true)` to suppress parser warnings on malformed HTML
- Optional: limit requests per IP using a PHP session/cookie or a Redis-based rate limiter (e.g., at most 10 requests per 5 minutes)
- Sanitize link text before output to avoid injection in the preview

4. **Download Functionality**
-----------------------------
- A client-side `Blob` is used to create the `.txt` file for download
- Filename: `llms.txt`
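The Security & Rate Limiting section specifies its checks in PHP (`filter_var()`, scheme check, sanitization). A client-side equivalent for the preview is sketched below in JavaScript; the helper names are assumptions, and these checks complement rather than replace the server-side ones:

```javascript
// Client-side mirror of the checks from the Security section.
// Hypothetical helpers; the spec implements the same rules in PHP.

// Accept only absolute HTTP/HTTPS URLs (mirrors filter_var + scheme check).
function isAllowedUrl(input) {
  try {
    const url = new URL(input);
    return url.protocol === "http:" || url.protocol === "https:";
  } catch {
    return false; // not a parseable absolute URL
  }
}

// Escape scraped link text before injecting it into the preview,
// so hostile markup in a page title cannot run in the tool's UI.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}
```

Running the URL check in the browser only gives faster feedback; `scrape.php` must still validate with `filter_var()` before fetching, since the frontend check can be bypassed.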
5. **UX Flow**
--------------
- User opens the site → reads about `llms.txt`
- Switches to the Auto or Manual tab
- Enters a URL or fills in the form
- Clicks "Generate"
- Formatted output appears below
- A button offers "Download llms.txt"

================================================================================

Deployment Requirements:
------------------------
- PHP 7.4+
- Any static file server for `index.html`
- No database required
- Optional HTTPS for production hosting
- Works on shared hosting, localhost, or a VPS

================================================================================

Future Enhancements (Optional):
-------------------------------
- Add custom tags or priorities to `llms.txt`
- Allow uploading sitemaps to prepopulate link suggestions
- Integrate with GitHub repository metadata
- Support multi-page crawling (depth: 1)

================================================================================
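As a closing illustration, the final "Download llms.txt" step of the UX flow (component 4: client-side `Blob`, filename `llms.txt`) might look like the sketch below. The function names are assumptions; `Blob`, `URL.createObjectURL`, and the anchor `download` attribute are standard browser APIs:

```javascript
// Sketch of the client-side download from component 4.
// Hypothetical function names; only the Blob + filename behavior is specified.

function makeLlmsTxtBlob(content) {
  return new Blob([content], { type: "text/plain" });
}

// Browser-only: create a temporary link and click it to start the download.
function downloadLlmsTxt(content) {
  const blob = makeLlmsTxtBlob(content);
  const url = URL.createObjectURL(blob);
  const a = document.createElement("a");
  a.href = url;
  a.download = "llms.txt"; // filename required by the spec
  document.body.appendChild(a); // some browsers need the link in the DOM
  a.click();
  a.remove();
  URL.revokeObjectURL(url); // release the object URL after the download starts
}
```

Because the file is assembled entirely in the browser, no server round-trip is needed for the download itself, which fits the "no database required" deployment model.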