A new open standard for intelligent, self-describing, context-aware files that carry their own meaning.
Nearly every mainstream file format created since the 1970s shares a fundamental flaw: files are dumb containers. A PDF does not know it is a contract. A JPG does not know it is a medical scan. An MP3 does not know it is a song about grief.
Files rely entirely on external applications to give them meaning. Remove the application, and the file becomes meaningless bytes. This dependency creates fragility, lock-in, and a permanent loss of context over time.
.sema (from Semantic) is a new file format that embeds intelligence, context, and a self-rendering interface directly inside the file itself. No external application required. No API calls. No internet dependency.
A .sema file opens in any browser (through its embedded view.html) and understands itself.
A .sema file is a ZIP archive with a defined internal structure. ZIP was chosen for maximum compatibility, compression, and toolchain availability across all platforms.
All text files (JSON, HTML) must use DEFLATE compression at level 6. Binary content files that are already compressed (JPEG, MP4, etc.) may instead use the STORE method (no compression). Total uncompressed size limit for v1.0: 500 MB.
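The packaging rules above can be sketched with Python's standard `zipfile` module. This is a minimal illustration, not a reference builder: the helper name `build_sema`, the `content/original.<ext>` path (inferred from the manifest's checksum note), the set of extensions treated as already compressed, and reading "500 MB" as 500 MiB are all assumptions.

```python
import io
import json
import os
import zipfile

MAX_UNCOMPRESSED = 500 * 1024 * 1024  # v1.0 limit, interpreted here as 500 MiB
# Assumption: extensions treated as "already compressed" and therefore STOREd.
PRECOMPRESSED = {".jpg", ".jpeg", ".png", ".mp3", ".mp4", ".zip"}


def build_sema(manifest: dict, brain: dict, view_html: str,
               original_name: str, original_bytes: bytes) -> bytes:
    """Hypothetical helper: return the bytes of a minimal .sema ZIP archive."""
    text_members = {
        "sema.json": json.dumps(manifest, ensure_ascii=False, indent=2).encode("utf-8"),
        "brain.json": json.dumps(brain, ensure_ascii=False, indent=2).encode("utf-8"),
        "view.html": view_html.encode("utf-8"),
    }
    ext = os.path.splitext(original_name)[1].lower()
    content_path = "content/original" + ext  # assumed path, per the checksum note

    total = sum(len(b) for b in text_members.values()) + len(original_bytes)
    if total > MAX_UNCOMPRESSED:
        raise ValueError(f"uncompressed size {total} exceeds the 500 MB limit")

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # Text files (JSON, HTML): DEFLATE at compression level 6.
        for name, data in text_members.items():
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED, compresslevel=6)
        # Original content: STORE if already compressed, else DEFLATE level 6.
        # (The spec leaves other binaries open; deflating them is our choice.)
        if ext in PRECOMPRESSED:
            zf.writestr(content_path, original_bytes, compress_type=zipfile.ZIP_STORED)
        else:
            zf.writestr(content_path, original_bytes,
                        compress_type=zipfile.ZIP_DEFLATED, compresslevel=6)
    return buf.getvalue()
```

With this sketch a `.pdf` original would be deflated, while a `.jpg` original is stored byte-for-byte, so no CPU is spent recompressing data that will not shrink.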
Every .sema file is built from four conceptual layers. The first three are mandatory.
- `sema.json`: the manifest. Describes what the file is, who made it, when, and why.
- `brain.json`: the brain. Keywords, entities, summary, and relationships, generated at creation time, not at open time.
- `view.html`: the self-rendering interface. Opens in any browser. No installation. No internet.

The manifest is the identity card of the file. It must be valid JSON and must include all required fields.
```jsonc
// sema.json: full specification
// (Comments are annotations only; an actual sema.json must be plain JSON.)
{
  // ── REQUIRED FIELDS ─────────────────────────────
  "sema_version": "1.0.0",                 // spec version used
  "id": "sema_[uuid-v4]",                  // globally unique identifier
  "created_at": "2025-01-01T00:00:00Z",    // ISO 8601
  "content_type": "document/recipe",       // category/subcategory
  "mime_type": "application/pdf",          // original file MIME type
  "filename": "harira-recipe.pdf",         // original filename
  "lang": "ar",                            // ISO 639-1 language code

  // ── AUTHOR ───────────────────────────────────────
  "author": {
    "name": "Larbi",
    "org": "TREN Studio",
    "contact": "optional"
  },

  // ── CONTENT DESCRIPTION ──────────────────────────
  "title": "وصفة الحريرة المغربية",          // "Moroccan harira recipe"
  "description": "short human-readable description",
  "tags": ["recipe", "moroccan", "soup"],

  // ── TECHNICAL ────────────────────────────────────
  "checksum": {
    "algo": "sha256",
    "value": "[hash of content/original.*]"
  },
  "content_size_bytes": 204800,

  // ── OPTIONAL FIELDS ──────────────────────────────
  "expires_at": null,                      // null = never
  "geo": { "country": "MA", "region": "optional" },
  "relations": [
    { "type": "derived_from", "id": "sema_[other-id]" }
  ],
  "custom": {}                             // app-specific extra data
}
```
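A reader can check a manifest against the required fields and the checksum. The sketch below assumes the original lives at `content/original.*`, that `checksum.value` is a lowercase hex SHA-256 digest, and that `id` is `sema_` followed by a UUID v4; the spec does not pin down these encodings, and `validate_manifest` is a hypothetical name. Note that Python's `json` module rejects `//` comments, so a manifest written in the annotated style above would fail to load.

```python
import hashlib
import io
import json
import uuid
import zipfile

REQUIRED = ("sema_version", "id", "created_at", "content_type",
            "mime_type", "filename", "lang")


def _is_sema_id(value) -> bool:
    """True if value is 'sema_' followed by a UUID whose version field is 4."""
    if not isinstance(value, str) or not value.startswith("sema_"):
        return False
    try:
        return uuid.UUID(value[len("sema_"):]).version == 4
    except ValueError:
        return False


def validate_manifest(archive_bytes: bytes) -> list:
    """Hypothetical validator: return a list of problems (empty if none found)."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        manifest = json.loads(zf.read("sema.json"))  # must be plain JSON
        for field in REQUIRED:
            if field not in manifest:
                problems.append(f"missing required field: {field}")
        if "id" in manifest and not _is_sema_id(manifest["id"]):
            problems.append("id is not sema_<uuid-v4>")

        # Recompute the hash of the preserved original (assumed location).
        originals = [n for n in zf.namelist() if n.startswith("content/original.")]
        checksum = manifest.get("checksum") or {}
        if originals and checksum.get("algo") == "sha256":
            digest = hashlib.sha256(zf.read(originals[0])).hexdigest()
            if digest != checksum.get("value"):
                problems.append("checksum mismatch")
    return problems
```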
The brain is the soul of .sema. It is computed once at creation time and stored permanently inside the file. Once written, it requires no API, no internet, and no heavy processing at open time.
This is the key innovation: intelligence is pre-baked, not live-computed.
```jsonc
// brain.json: semantic understanding layer
{
  "brain_version": "1.0",
  "generated_at": "2025-01-01T00:00:00Z",
  "generator": "sema-builder-cli/1.0",

  // ── SEMANTIC CORE ────────────────────────────────
  "summary": "2-3 sentence plain-language summary",
  "keywords": ["keyword1", "keyword2"],    // top 10-20
  "entities": {
    "people": [],
    "places": ["Morocco", "Marrakech"],
    "concepts": ["nutrition", "traditional food"],
    "dates": []
  },
  "topics": ["food", "culture", "health"],

  // ── CONTENT-TYPE SPECIFIC ────────────────────────
  "content_data": {
    // Varies by content_type. Examples:
    //   recipe  → { ingredients, steps, cook_time, calories }
    //   invoice → { total, currency, items, due_date }
    //   image   → { objects, colors, scene, faces_count }
    //   doc     → { word_count, reading_time, headings }
  },

  // ── SEARCHABILITY ────────────────────────────────
  "search_text": "full extracted plain text for local search",
  "questions": [                           // pre-answered Q&A pairs for instant response
    { "q": "What is this file about?", "a": "..." },
    { "q": "Who created this?", "a": "..." }
  ],

  // ── ACCESSIBILITY ────────────────────────────────
  "alt_text": "description for screen readers",
  "translations": {
    "summary_ar": "ملخص بالعربية",            // "summary in Arabic"
    "summary_fr": "résumé en français"       // "summary in French"
  }
}
```
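Beyond the `generator` field, the spec does not say how a builder fills in `brain.json`. As a toy stand-in for real summarization and entity extraction, the sketch below bakes a few fields from plain text at creation time using word frequencies. The name `bake_brain`, the stopword list, the "first two sentences" summary, and the generator string are all hypothetical.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Tiny illustrative stopword list; a real builder would need one per language.
STOPWORDS = {"the", "and", "for", "with", "this", "that", "are", "was", "from"}


def bake_brain(text: str, top_n: int = 15) -> dict:
    """Toy creation-time brain: frequency keywords plus a naive summary."""
    words = re.findall(r"[^\W\d_]+", text.lower())  # runs of Unicode letters
    counts = Counter(w for w in words if len(w) > 2 and w not in STOPWORDS)
    keywords = [w for w, _ in counts.most_common(top_n)]

    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    summary = " ".join(sentences[:2])  # stand-in for a real 2-3 sentence summary

    return {
        "brain_version": "1.0",
        "generated_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "generator": "toy-brain-sketch/0.1",  # hypothetical generator name
        "summary": summary,
        "keywords": keywords,
        "search_text": text,
        "questions": [{"q": "What is this file about?", "a": summary}],
    }
```

Whatever model a real builder uses, the cost is paid here once; a viewer that opens the file later only parses the stored JSON.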
The questions array is the interaction layer. When a user asks the file a question, the viewer first searches this pre-computed array for a close match and, if one is found, answers instantly from local data. Otherwise it falls back to a keyword search within search_text. No API calls, no network round trip.
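The lookup order described above can be sketched as follows. The spec does not define what counts as a "close match"; this sketch assumes `difflib.SequenceMatcher` similarity with an arbitrary 0.75 threshold, and a naive fallback that returns the `search_text` sentences containing any query term longer than two characters. `answer` is a hypothetical name.

```python
import re
from difflib import SequenceMatcher
from typing import Optional


def answer(brain: dict, query: str, threshold: float = 0.75) -> Optional[str]:
    """Answer from pre-baked Q&A first, then fall back to keyword search."""
    q = query.strip().lower()

    # 1. Closest pre-computed question (the threshold is an assumption).
    best, best_score = None, 0.0
    for pair in brain.get("questions", []):
        score = SequenceMatcher(None, q, pair["q"].lower()).ratio()
        if score > best_score:
            best, best_score = pair, score
    if best is not None and best_score >= threshold:
        return best["a"]

    # 2. Fallback: sentences of search_text that contain any query term.
    terms = [t for t in re.findall(r"\w+", q) if len(t) > 2]
    sentences = re.split(r"(?<=[.!?])\s+", brain.get("search_text", ""))
    hits = [s for s in sentences if any(t in s.lower() for t in terms)]
    return " ".join(hits) if hits else None
```

Both steps run on data already inside the file, which is what lets the viewer answer without a network call.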
view.html is a complete, self-contained web application embedded inside the file. It is served locally by the .sema viewer or extracted and opened directly by any browser.
Every view.html must include:
The view.html must be completely self-contained: all CSS and JS inlined, no external CDN dependencies, no fetch() calls to external URLs. Maximum file size: 2 MB uncompressed. Must render correctly in Chrome, Firefox, Safari, and Edge without plugins.
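Parts of these constraints can be checked mechanically. The sketch below uses the standard `html.parser` module to flag `src` attributes and `<link href>` values that point to absolute or protocol-relative URLs, plus a regular expression for `fetch()` called with a literal external URL. It is a heuristic, not a complete audit: URLs built at runtime, CSS `@import` or `url(...)`, and `XMLHttpRequest` would slip through, ordinary `<a href>` links are deliberately allowed, and the UTF-8 decoding, the 2 MiB reading of "2 MB", and the `check_view_html` name are assumptions. Cross-browser rendering still needs testing in real browsers.

```python
import re
from html.parser import HTMLParser

MAX_VIEW_BYTES = 2 * 1024 * 1024  # 2 MB uncompressed, interpreted here as 2 MiB
EXTERNAL_URL = re.compile(r"^\s*(https?:)?//", re.IGNORECASE)
EXTERNAL_FETCH = re.compile(r"""fetch\(\s*['"`]\s*(https?:)?//""", re.IGNORECASE)


class _ExternalRefFinder(HTMLParser):
    """Collects src/href attributes that would load resources from the network."""

    def __init__(self):
        super().__init__()
        self.external = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            loads_resource = name == "src" or (tag == "link" and name == "href")
            if loads_resource and value and EXTERNAL_URL.match(value):
                self.external.append(f"<{tag} {name}={value!r}>")


def check_view_html(html_bytes: bytes) -> list:
    """Hypothetical linter: return a list of self-containment problems."""
    problems = []
    if len(html_bytes) > MAX_VIEW_BYTES:
        problems.append("view.html exceeds 2 MB uncompressed")
    html = html_bytes.decode("utf-8", errors="replace")
    finder = _ExternalRefFinder()
    finder.feed(html)
    finder.close()
    problems += [f"external reference: {ref}" for ref in finder.external]
    if EXTERNAL_FETCH.search(html):
        problems.append("fetch() to an external URL")
    return problems
```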
The content_type field uses a category/subcategory format. The following are the v1.0 registered types:
| Content Type | Description | Key brain.json fields |
|---|---|---|
| document/generic | Any general document | word_count, reading_time, headings |
| document/recipe | Food recipe | ingredients, steps, calories, cook_time |
| document/invoice | Financial invoice | total, currency, items, due_date |
| document/contract | Legal contract | parties, clauses, dates, obligations |
| image/photo | Photograph | objects, scene, colors, location |
| image/medical | Medical imaging | modality, body_part, notes |
| image/diagram | Technical diagram | type, components, relationships |
| data/spreadsheet | Tabular data | rows, columns, summary_stats |
| data/dataset | Research dataset | variables, records, methodology |
| media/audio | Audio file | duration, transcript, speakers |
| media/video | Video file | duration, scenes, transcript |
| custom/[name] | App-specific type | defined by application |
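Because `content_type` is a plain `category/subcategory` string, a viewer can classify it with a lookup table built from the registry above. A minimal sketch: the spec does not say how unregistered types should be handled or which characters a `custom/[name]` may contain, so returning the labels `"registered"`, `"custom"`, or `"unregistered"` (and the name `classify_content_type`) is an assumption.

```python
# v1.0 registered types, transcribed from the table above.
REGISTERED = {
    "document": {"generic", "recipe", "invoice", "contract"},
    "image": {"photo", "medical", "diagram"},
    "data": {"spreadsheet", "dataset"},
    "media": {"audio", "video"},
}


def classify_content_type(value: str) -> str:
    """Return 'registered', 'custom', or 'unregistered'; raise on bad syntax."""
    category, sep, sub = value.partition("/")
    if not category or not sep or not sub or "/" in sub:
        raise ValueError(f"expected category/subcategory, got {value!r}")
    if category == "custom":
        return "custom"  # custom/[name]: fields defined by the application
    if sub in REGISTERED.get(category, set()):
        return "registered"
    return "unregistered"
```

A viewer could use the result to decide which `content_data` fields (for example `ingredients` and `steps` for `document/recipe`) it expects to find in `brain.json`.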
| Capability | | JPG/PNG | HTML | .sema |
|---|---|---|---|---|
| Self-rendering | Partial | ✗ | ✓ | ✓ |
| No app needed | ✗ | ✗ | ✓ | ✓ |
| Semantic metadata | Limited | ✗ | ✗ | ✓ Rich |
| Queryable | ✗ | ✗ | ✗ | ✓ |
| Original preserved | ✓ | ✓ | ✗ | ✓ |
| No internet required | ✓ | ✓ | Partial | ✓ Always |
| Multilingual built-in | ✗ | ✗ | Manual | ✓ |
| Human + machine readable | Partial | ✗ | Partial | ✓ Both |