Overview
Document Chat System is a full-stack, production-ready application that combines intelligent document management with AI-powered conversations. Upload documents in multiple formats; the system processes and indexes them automatically, and you can then hold natural language conversations about their content using AI models.
Key Features
🆓 100% Free & Open Source - MIT licensed. Deploy your own instance, modify as needed, or monetize as a SaaS.
🤖 Multi-Provider AI - Supports OpenRouter (100+ models), OpenAI, Anthropic, and ImageRouter. Uses gpt-4o-mini by default for cost-effective responses.
📄 Full Document Support - PDFs, DOCX, TXT, images with OCR, and more. Automatic text extraction and intelligent processing.
🔍 Semantic Search - Vector search with Pinecone or pgvector finds relevant content beyond simple keyword matching.
👥 Multi-Tenant Ready - Built-in organization isolation with complete data separation between users/organizations.
💳 Optional SaaS Billing - Integrated Stripe billing system with customizable pricing plans for monetization.
⚡ Background Processing - Inngest handles document processing, vectorization, and AI analysis asynchronously.
🎨 Modern UI - Beautiful, responsive interface with dark mode built using shadcn/ui and Tailwind CSS.
🔐 Enterprise Security - AES-256 encryption, Row-Level Security (RLS), and Clerk authentication.
🐳 Production Ready - Dockerfile included, deploy to Vercel/Railway/Render in minutes with one-click setup.
Features
📁 Document Management
Multi-Format Support: PDF, DOCX, TXT, MD, and images, with more formats planned
Intelligent Processing: Automatic text extraction, OCR, metadata analysis
Folder Organization: Hierarchical folder structure with drag-and-drop
Batch Operations: Upload and process multiple files simultaneously
Real-Time Progress: Live updates on document processing status
Version Control: Track document versions and changes
File Sharing: Secure document sharing with permission controls
🤖 AI-Powered Chat
Multiple AI Providers:
OpenRouter: Access to 100+ models (GPT-4, Claude, Llama, Mistral, etc.)
OpenAI: Direct integration with GPT-4 Turbo and GPT-3.5
ImageRouter: Visual AI for image analysis and OCR
Document Context: AI understands and references your uploaded documents
Source Citations: Responses include references to source documents
Streaming Responses: Real-time token streaming for faster interactions
🔍 Advanced Search
Vector Search: Semantic search powered by Pinecone or PostgreSQL pgvector
Hybrid Search: Combines semantic similarity with keyword matching
Full-Text Search: Fast text search across all documents
Filters: Filter by date, type, folder, tags, and more
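To illustrate what combining semantic similarity with keyword matching can look like, here is a minimal sketch of a hybrid ranking function. The `Chunk` type, the function names, and the `alpha` weighting are assumptions made for this example and are not taken from the project's code. In a real deployment the similarity part would typically be computed by Pinecone or pgvector rather than in application code.

```typescript
// Hypothetical shape of an indexed document chunk (illustrative only).
interface Chunk {
  id: string;
  text: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Fraction of query terms that appear as whole words in the text.
function keywordScore(query: string, text: string): number {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (terms.length === 0) return 0;
  const words = new Set(text.toLowerCase().split(/\W+/));
  const hits = terms.filter((t) => words.has(t)).length;
  return hits / terms.length;
}

// Weighted blend: alpha favors semantic similarity, (1 - alpha) favors keywords.
function hybridSearch(
  query: string,
  queryEmbedding: number[],
  chunks: Chunk[],
  alpha: number = 0.7
): { id: string; score: number }[] {
  return chunks
    .map((c) => ({
      id: c.id,
      score:
        alpha * cosineSimilarity(queryEmbedding, c.embedding) +
        (1 - alpha) * keywordScore(query, c.text),
    }))
    .sort((x, y) => y.score - x.score);
}
```

A linear blend is only one option; other fusion strategies, such as rank-based merging, may suit some datasets better.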
👥 Multi-Tenant Architecture
Organization Isolation: Complete data separation between organizations
Per-Org Resource Limits: Customizable limits per organization
Activity Tracking: Audit logs for compliance and security
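As a rough illustration of organization isolation and per-organization limits, the sketch below scopes every read to a single organization and checks a document quota before an upload. All names here (`DocumentRecord`, `scopeToOrganization`, `OrgLimits`, `canUpload`) are hypothetical. The project states that it relies on Row-Level Security, which enforces this kind of rule in the database itself, so treat this as a conceptual model rather than the actual implementation.

```typescript
// Hypothetical record shape; every row carries the owning organization's id.
interface DocumentRecord {
  id: string;
  organizationId: string;
  title: string;
}

// Hypothetical per-organization limits.
interface OrgLimits {
  maxDocuments: number;
}

// Return only the rows that belong to the given organization.
function scopeToOrganization<T extends { organizationId: string }>(
  rows: T[],
  organizationId: string
): T[] {
  return rows.filter((row) => row.organizationId === organizationId);
}

// An upload is allowed only while the organization is below its document limit.
function canUpload(
  rows: DocumentRecord[],
  organizationId: string,
  limits: OrgLimits
): boolean {
  return scopeToOrganization(rows, organizationId).length < limits.maxDocuments;
}
```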
⚡ Background Processing
Inngest Integration: Event-driven serverless functions
Document Processing Queue: Scalable batch processing
Automatic Retries: Built-in error handling and retries
Real-Time Notifications: Progress updates via webhooks
Job Monitoring: Track job status and logs
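The general idea behind automatic retries can be sketched as a retry loop with exponential backoff. This is a generic illustration only: it does not use or reproduce the Inngest API, and the function names and default values are assumptions chosen for the example.

```typescript
// Delay before the next try after a failed attempt (1-based): base, 2x base, 4x base, ...
function backoffDelayMs(attempt: number, baseDelayMs: number): number {
  return baseDelayMs * Math.pow(2, attempt - 1);
}

// Run a task, retrying on failure until maxAttempts is reached.
async function withRetries<T>(
  task: () => Promise<T>,
  maxAttempts: number = 3,
  baseDelayMs: number = 100
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      // Wait before the next attempt, but not after the final failure.
      if (attempt < maxAttempts) {
        const delay = backoffDelayMs(attempt, baseDelayMs);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

A caller would wrap a single processing step, for example `withRetries(() => extractText(file))`, where `extractText` is a placeholder for whatever step may fail transiently.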
License
This project is licensed under the MIT License - see the LICENSE file for details.
What this means:
✅ Commercial use allowed
✅ Modification allowed
✅ Distribution allowed
✅ Private use allowed
❌ No liability
❌ No warranty