Official Specification

.sema
Semantic File Format

A new open standard for intelligent, self-describing, context-aware files that carry their own meaning.

Version: 1.0.0-draft
Status: ACTIVE DRAFT
Authors: Larbi + Claude
License: MIT OPEN
Extension: .sema

§ 01

Philosophy & Problem Statement

Every file format created since 1970 shares a fundamental flaw: files are dumb containers. A PDF does not know it is a contract. A JPG does not know it is a medical scan. An MP3 does not know it is a song about grief.

Files rely entirely on external applications to give them meaning. Remove the application, and the file becomes meaningless bytes. This dependency creates fragility, lock-in, and a permanent loss of context over time.

"The file should carry its own soul. It should know what it is, what it means, and how to present itself — to anyone, anywhere, forever."

.sema (from Semantic) is a new file format that embeds intelligence, context, and a self-rendering interface directly inside the file itself. No external application required. No API calls. No internet dependency.

A .sema file opens in any browser and understands itself.

§ 02

File Structure

A .sema file is a ZIP archive with a defined internal structure. ZIP was chosen for maximum compatibility, compression, and toolchain availability across all platforms.

archive.sema                 ← renamed .zip
├── sema.json                ← REQUIRED: manifest + metadata
├── view.html                ← REQUIRED: self-rendering interface
├── brain.json               ← REQUIRED: semantic understanding
├── content/                 ← REQUIRED: original file(s)
│   └── original.*           ← any format
├── assets/                  ← optional: images, fonts
│   └── thumb.webp           ← thumbnail preview
└── layers/                  ← optional: extracted data layers
    ├── text.txt             ← plain text extraction
    └── structured.json      ← parsed structured data
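
A minimal sketch of how a tool might check this layout, using Python's standard zipfile module. The function name check_structure and the wording of the problem messages are illustrative assumptions, not part of the spec.

```python
import io
import zipfile

# Members marked REQUIRED in the tree above.
REQUIRED_FILES = ("sema.json", "view.html", "brain.json")


def check_structure(archive: zipfile.ZipFile) -> list[str]:
    """Return a list of problems; an empty list means the layout looks valid."""
    names = archive.namelist()
    problems = [f"missing {name}" for name in REQUIRED_FILES if name not in names]
    # content/ must hold the original file, named original.<ext> as in the tree.
    if not any(name.startswith("content/original.") for name in names):
        problems.append("missing content/original.*")
    return problems


# Usage: build a tiny archive in memory and check it.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for name in ("sema.json", "view.html", "brain.json", "content/original.pdf"):
        archive.writestr(name, b"placeholder")
with zipfile.ZipFile(buffer) as archive:
    print(check_structure(archive))  # prints []
```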

// Compression Rules

All text files (JSON, HTML) must use DEFLATE compression level 6. Binary content files may use STORE (level 0) if already compressed (JPEG, MP4, etc.). Total uncompressed size limit for v1.0: 500 MB.
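
The compression rules can be sketched with Python's standard zipfile module. The PRECOMPRESSED extension set, the helper names, and reading "500 MB" as 500 × 1024 × 1024 bytes are assumptions made for illustration; the spec itself only names JPEG and MP4 as examples of already-compressed content.

```python
import io
import os
import zipfile

# Extensions treated as already compressed (illustrative assumption).
PRECOMPRESSED = {".jpg", ".jpeg", ".png", ".webp", ".mp3", ".mp4"}
MAX_UNCOMPRESSED_BYTES = 500 * 1024 * 1024  # assumes "500 MB" means 500 MiB


def write_member(archive: zipfile.ZipFile, name: str, data: bytes) -> None:
    """Add one member: STORE for pre-compressed media, DEFLATE level 6 otherwise."""
    suffix = os.path.splitext(name)[1].lower()
    if suffix in PRECOMPRESSED:
        archive.writestr(name, data, compress_type=zipfile.ZIP_STORED)
    else:
        archive.writestr(
            name, data, compress_type=zipfile.ZIP_DEFLATED, compresslevel=6
        )


def build_sema(members: dict[str, bytes]) -> bytes:
    """Pack members into an in-memory .sema archive, enforcing the size limit."""
    if sum(len(data) for data in members.values()) > MAX_UNCOMPRESSED_BYTES:
        raise ValueError("uncompressed size exceeds the v1.0 limit")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in members.items():
            write_member(archive, name, data)
    return buffer.getvalue()
```

The bytes returned by build_sema can be written to disk under any name ending in .sema.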

§ 03

The Four Layers

Every .sema file is built from four conceptual layers. All four are mandatory, matching the REQUIRED entries in the file structure.

1. Content: The original file in its native format. Untouched, unmodified. The source of truth.
2. Manifest: Structured metadata in sema.json. Describes what the file is, who made it, when, and why.
3. Brain: Pre-computed semantic understanding in brain.json. Keywords, entities, summary, and relationships, all generated at creation time, not at open time.
4. View: A complete self-contained HTML+JS interface in view.html. Opens in any browser. No installation. No internet.

§ 04

sema.json — The Manifest

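The manifest is the identity card of the file. It must be valid JSON and must include all required fields. The // comments in the listing are annotations for this document only: JSON has no comment syntax, so a real sema.json must omit them.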

// sema.json — Full specification
{
  // ── REQUIRED FIELDS ─────────────────────────────

  "sema_version": "1.0.0",          // spec version used
  "id": "sema_[uuid-v4]",           // globally unique identifier
  "created_at": "2025-01-01T00:00:00Z", // ISO 8601
  "content_type": "document/recipe",  // category/subcategory
  "mime_type": "application/pdf",    // original file MIME
  "filename": "harira-recipe.pdf",  // original filename
  "lang": "ar",                     // ISO 639-1 language code

  // ── AUTHOR ───────────────────────────────────────
  "author": {
    "name": "Larbi",
    "org": "TREN Studio",
    "contact": "optional"
  },

  // ── CONTENT DESCRIPTION ──────────────────────────
  "title": "وصفة الحريرة المغربية",
  "description": "short human-readable description",
  "tags": ["recipe", "moroccan", "soup"],

  // ── TECHNICAL ────────────────────────────────────
  "checksum": {
    "algo": "sha256",
    "value": "[hash of content/original.*]"
  },
  "content_size_bytes": 204800,

  // ── OPTIONAL FIELDS ──────────────────────────────
  "expires_at": null,              // null = never
  "geo": {
    "country": "MA",
    "region": "optional"
  },
  "relations": [
    { "type": "derived_from", "id": "sema_[other-id]" }
  ],
  "custom": {}                      // app-specific extra data
}
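
As a rough illustration of the manifest, the sketch below builds the required fields and the checksum block for a piece of content, then verifies them. The function names are hypothetical. Two details are assumptions the spec does not settle: the checksum value is written as a lowercase hex digest, and the id is "sema_" followed by the hyphenated form of a version 4 UUID.

```python
import hashlib
import uuid
from datetime import datetime, timezone

# Fields listed under REQUIRED FIELDS in the manifest listing.
REQUIRED_FIELDS = (
    "sema_version", "id", "created_at", "content_type",
    "mime_type", "filename", "lang",
)


def make_manifest(content: bytes, filename: str, mime_type: str,
                  content_type: str, lang: str) -> dict:
    """Build a minimal manifest with a fresh id and a SHA-256 checksum."""
    return {
        "sema_version": "1.0.0",
        "id": f"sema_{uuid.uuid4()}",
        "created_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "content_type": content_type,
        "mime_type": mime_type,
        "filename": filename,
        "lang": lang,
        "checksum": {
            "algo": "sha256",
            "value": hashlib.sha256(content).hexdigest(),  # hex encoding assumed
        },
        "content_size_bytes": len(content),
    }


def missing_fields(manifest: dict) -> list[str]:
    """Return the required fields that are absent from the manifest."""
    return [field for field in REQUIRED_FIELDS if field not in manifest]


def checksum_matches(manifest: dict, content: bytes) -> bool:
    """Recompute the SHA-256 of the content and compare it to the manifest."""
    return hashlib.sha256(content).hexdigest() == manifest["checksum"]["value"]
```

A reader would typically call missing_fields first, then checksum_matches against the bytes of content/original.* before trusting the file.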

§ 05

brain.json — Semantic Intelligence

The brain is the soul of .sema. It is computed once at creation time and stored permanently inside the file. It requires no API, no internet, and no processing at open time.

This is the key innovation: intelligence is pre-baked, not live-computed.

// brain.json — Semantic understanding layer
{
  "brain_version": "1.0",
  "generated_at": "2025-01-01T00:00:00Z",
  "generator": "sema-builder-cli/1.0",

  // ── SEMANTIC CORE ────────────────────────────────
  "summary": "2-3 sentence plain-language summary",
  "keywords": ["keyword1", "keyword2"],    // top 10-20
  "entities": {
    "people": [],
    "places": ["Morocco", "Marrakech"],
    "concepts": ["nutrition", "traditional food"],
    "dates": []
  },
  "topics": ["food", "culture", "health"],

  // ── CONTENT-TYPE SPECIFIC ────────────────────────
  "content_data": {
    // Varies by content_type. Examples:
    // recipe → { ingredients, steps, time, calories }
    // invoice → { total, currency, items, due_date }
    // image  → { objects, colors, scene, faces_count }
    // doc    → { word_count, reading_time, headings }
  },

  // ── SEARCHABILITY ────────────────────────────────
  "search_text": "full extracted plain text for local search",
  "questions": [
    // Pre-answered Q&A pairs for instant response
    { "q": "What is this file about?", "a": "..." },
    { "q": "Who created this?", "a": "..." }
  ],

  // ── ACCESSIBILITY ────────────────────────────────
  "alt_text": "description for screen readers",
  "translations": {
    "summary_ar": "ملخص بالعربية",
    "summary_fr": "résumé en français"
  }
}
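
The translations block in the listing uses keys of the form summary_<lang>. A small sketch of how a viewer might pick the summary for a reader's language is shown here; the function name localized_summary and the fallback to the default summary are assumptions, not behavior the spec defines.

```python
def localized_summary(brain: dict, lang: str) -> str:
    """Return the summary in `lang` if a translation exists, else the default."""
    translations = brain.get("translations", {})
    return translations.get(f"summary_{lang}", brain["summary"])


brain = {
    "summary": "A traditional Moroccan soup recipe.",
    "translations": {"summary_fr": "Une recette de soupe marocaine."},
}
print(localized_summary(brain, "fr"))  # prints the French summary
print(localized_summary(brain, "de"))  # no German entry, prints the default
```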

// The Q&A Engine

The questions array is the interaction layer. When a user asks the file a question, the viewer first searches this pre-computed array for a close match. If it finds one, it answers instantly from local data. Otherwise it falls back to keyword search within search_text. No API calls and no network latency.
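
The lookup described above could be sketched as follows. The spec does not say how "close match" is measured, so this example assumes fuzzy string matching from Python's standard difflib with a cutoff of 0.6, and a deliberately naive fallback that returns the first sentence of search_text sharing a word with the question. A real viewer would implement this in the JavaScript inside view.html; Python is used here only to show the logic.

```python
import difflib
from typing import Optional


def answer(brain: dict, question: str, cutoff: float = 0.6) -> Optional[str]:
    """Answer from the pre-computed pairs, else fall back to keyword search."""
    pairs = {pair["q"].lower(): pair["a"] for pair in brain.get("questions", [])}
    match = difflib.get_close_matches(
        question.lower(), list(pairs), n=1, cutoff=cutoff
    )
    if match:
        return pairs[match[0]]

    # Fallback: first sentence of search_text that shares a word with the question.
    # Words of three letters or fewer are skipped as a crude stop-word filter.
    words = {word for word in question.lower().split() if len(word) > 3}
    for sentence in brain.get("search_text", "").split("."):
        if words & set(sentence.lower().split()):
            return sentence.strip()
    return None


brain = {
    "questions": [{"q": "What is this file about?", "a": "A harira recipe."}],
    "search_text": "Soak the chickpeas overnight. Simmer for one hour.",
}
print(answer(brain, "what is this file about"))  # prints "A harira recipe."
print(answer(brain, "simmer duration"))          # prints "Simmer for one hour"
```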

§ 06

view.html — The Self-Rendering Interface

view.html is a complete, self-contained web application embedded inside the file. It is served locally by the .sema viewer or extracted and opened directly by any browser.
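
The "extracted and opened directly" path can be sketched in a few lines. The helper name extract_view and the use of a temporary directory are assumptions for illustration; the spec does not prescribe how a viewer unpacks the archive.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_view(sema_path: str) -> Path:
    """Unpack a .sema archive into a temporary folder and return its view.html."""
    target = Path(tempfile.mkdtemp(prefix="sema_"))
    with zipfile.ZipFile(sema_path) as archive:
        archive.extractall(target)
    return target / "view.html"
```

The returned path can then be handed to a browser, for example with webbrowser.open(extract_view("harira-recipe.sema").as_uri()), where the file name is a placeholder.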

// Mandatory UI Elements

Every view.html must include:

// Technical Constraints

The view.html must be completely self-contained: all CSS and JS inlined, no external CDN dependencies, no fetch() calls to external URLs. Maximum file size: 2 MB uncompressed. Must render correctly in Chrome, Firefox, Safari, and Edge without plugins.
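
A builder could run a rough pre-flight check of these constraints. The sketch below is a heuristic under stated assumptions: it reads "2 MB" as 2 × 1024 × 1024 bytes, and it only flags remote URLs in the src or href attributes of resource-loading tags. It does not detect fetch() calls or URLs assembled inside scripts, and ordinary <a> hyperlinks are ignored on purpose because they are not dependencies.

```python
from html.parser import HTMLParser

MAX_VIEW_BYTES = 2 * 1024 * 1024  # assumes "2 MB" means 2 MiB
# Tags whose src/href cause the browser to load a resource.
LOADING_TAGS = {"script", "link", "img", "iframe", "audio", "video", "source"}


class ExternalRefFinder(HTMLParser):
    """Collect src/href values on loading tags that point at a remote host."""

    def __init__(self) -> None:
        super().__init__()
        self.external: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in LOADING_TAGS:
            return
        for name, value in attrs:
            if name in ("src", "href") and value and value.lower().startswith(
                ("http://", "https://", "//")
            ):
                self.external.append(value)


def check_view(html: str) -> list[str]:
    """Return a list of constraint violations found in a view.html string."""
    problems = []
    if len(html.encode("utf-8")) > MAX_VIEW_BYTES:
        problems.append("view.html exceeds 2 MB")
    finder = ExternalRefFinder()
    finder.feed(html)
    problems += [f"external reference: {url}" for url in finder.external]
    return problems
```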

§ 07

Content Type Registry

The content_type field uses a category/subcategory format. The following are the v1.0 registered types:

Content Type        Description            Key brain.json fields
document/generic    Any general document   word_count, reading_time, headings
document/recipe     Food recipe            ingredients, steps, calories, cook_time
document/invoice    Financial invoice      total, currency, items, due_date
document/contract   Legal contract         parties, clauses, dates, obligations
image/photo         Photograph             objects, scene, colors, location
image/medical       Medical imaging        modality, body_part, notes
image/diagram       Technical diagram      type, components, relationships
data/spreadsheet    Tabular data           rows, columns, summary_stats
data/dataset        Research dataset       variables, records, methodology
media/audio         Audio file             duration, transcript, speakers
media/video         Video file             duration, scenes, transcript
custom/[name]       App-specific type      defined by application
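
A validator for the content_type field might look like the sketch below. The registered set is copied from the table above. The characters allowed in a custom name (lowercase letters, digits, underscore, hyphen) are an assumption, since the spec only says custom/[name].

```python
import re

REGISTERED_TYPES = {
    "document/generic", "document/recipe", "document/invoice", "document/contract",
    "image/photo", "image/medical", "image/diagram",
    "data/spreadsheet", "data/dataset",
    "media/audio", "media/video",
}
# Allowed characters for custom names are assumed; the spec does not define them.
CUSTOM_TYPE = re.compile(r"custom/[a-z0-9_-]+")


def is_valid_content_type(value: str) -> bool:
    """Accept a v1.0 registered type or a custom/<name> type."""
    return value in REGISTERED_TYPES or CUSTOM_TYPE.fullmatch(value) is not None


print(is_valid_content_type("document/recipe"))  # prints True
print(is_valid_content_type("custom/foodjot"))   # prints True
print(is_valid_content_type("document/poem"))    # prints False
```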

§ 08

Format Comparison

Capability PDF JPG/PNG HTML .sema
Self-rendering Partial
No app needed
Semantic metadata Limited ✓ Rich
Queryable
Original preserved
No internet required Partial ✓ Always
Multilingual built-in Manual
Human + machine readable Partial Partial ✓ Both

§ 09

Use Cases

🍽️ Recipe Files (FoodJot)
Each recipe becomes a .sema file. Ask it about calories, substitutions, or steps. Embed in any site. Works forever.

🏥 Medical Records
Patient sends a scan .sema to a specialist across the world. Doctor opens it in a browser. No DICOM viewer needed.

📜 Legal Contracts
Ask the contract: "What is the penalty clause?" Get an instant answer from the embedded brain.

🎓 Academic Papers
Research papers that explain themselves. Ask: "What is the main finding?" and get the answer instantly, offline.

📦 Product Catalogs
An Amazon product review as .sema. Includes specs, pros/cons, affiliate links, all self-contained.

🗺️ Cultural Preservation
Archive Moroccan heritage documents with full semantic context in Arabic, French, and Amazigh, forever.

§ 10

Development Roadmap

v1.0 (NOW): Foundation
Core spec, sema.json schema, brain.json schema, basic view.html, CLI builder tool, browser viewer.

v1.1: Builder Tools
Web-based .sema creator, drag & drop interface, Python and JS SDKs, WordPress plugin for FoodJot.

v1.2: Ecosystem
VSCode extension, .sema registry (index of public files), search engine for .sema files, API for builders.

v2.0: Intelligence Upgrade
Optional embedded micro-model (Transformers.js), versioned files, digital signatures, encrypted .sema.