.sema Format Specification v1.0

§ 01

Philosophy & Problem Statement

Almost every file format in common use shares a fundamental flaw: files are dumb containers. A PDF does not know it is a contract. A JPG does not know it is a medical scan. An MP3 does not know it is a song about grief.

Files rely entirely on external applications to give them meaning. Remove the application, and the file becomes meaningless bytes. This dependency creates fragility, lock-in, and a permanent loss of context over time.

"The file should carry its own soul. It should know what it is, what it means, and how to present itself — to anyone, anywhere, forever."

.sema (from Semantic) is a new file format that embeds intelligence, context, and a self-rendering interface directly inside the file itself. No external application required. No API calls. No internet dependency.

A .sema file opens in any browser and understands itself.

§ 02

File Structure

A .sema file is a ZIP archive with a defined internal structure. ZIP was chosen for maximum compatibility, compression, and toolchain availability across all platforms.

archive.sema                 ← renamed .zip
├── sema.json                ← REQUIRED: manifest + metadata
├── view.html                ← REQUIRED: self-rendering interface
├── brain.json               ← REQUIRED: semantic understanding
├── content/                 ← REQUIRED: original file(s)
│   └── original.*           ← any format
├── assets/                  ← optional: images, fonts
│   └── thumb.webp           ← thumbnail preview
└── layers/                  ← optional: extracted data layers
    ├── text.txt             ← plain text extraction
    └── structured.json      ← parsed structured data
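
As an illustration of the layout above, the following minimal Python sketch lists which required entries are absent from an archive. The function name missing_entries and the constants are hypothetical and not part of the specification; the check only looks at entry names, not at their contents.

```python
import zipfile

# Required entries taken from the layout above; names here are illustrative.
REQUIRED_FILES = ("sema.json", "view.html", "brain.json")
CONTENT_PREFIX = "content/original."


def missing_entries(path):
    """Return the required entries that are absent from the archive."""
    with zipfile.ZipFile(path) as zf:
        names = set(zf.namelist())
    missing = [name for name in REQUIRED_FILES if name not in names]
    # content/ must hold at least one original.* file
    if not any(name.startswith(CONTENT_PREFIX) for name in names):
        missing.append("content/original.*")
    return missing
```

An empty list means the archive has the four required parts; optional folders such as assets/ and layers/ are ignored by this check.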

// Compression Rules

All text files (JSON, HTML) must be compressed with DEFLATE at level 6. Binary content files that are already compressed (JPEG, MP4, etc.) may use STORE (no compression) instead. The total uncompressed size of a v1.0 archive must not exceed 500 MB.
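
The compression rules could be applied at build time roughly as follows. This is a sketch using Python's standard zipfile module, not a reference builder: the function name pack_sema, the list of extensions treated as already compressed, and the reading of 500 MB as 500 × 1024 × 1024 bytes are all assumptions.

```python
import os
import zipfile

# Extensions treated as "already compressed" are an assumption for this sketch.
PRECOMPRESSED = {".jpg", ".jpeg", ".png", ".webp", ".mp3", ".mp4"}
MAX_UNCOMPRESSED_BYTES = 500 * 1024 * 1024  # assumes 1 MB = 1024 * 1024 bytes


def pack_sema(out_path, entries):
    """Write entries ({archive_name: bytes}) into a .sema archive."""
    total = sum(len(data) for data in entries.values())
    if total > MAX_UNCOMPRESSED_BYTES:
        raise ValueError(f"uncompressed size {total} exceeds the v1.0 limit")
    with zipfile.ZipFile(out_path, "w") as zf:
        for name, data in entries.items():
            ext = os.path.splitext(name)[1].lower()
            if ext in PRECOMPRESSED:
                # STORE: no point recompressing JPEG, MP4, etc.
                zf.writestr(name, data, compress_type=zipfile.ZIP_STORED)
            else:
                # DEFLATE level 6 for JSON, HTML, and everything else
                zf.writestr(name, data,
                            compress_type=zipfile.ZIP_DEFLATED,
                            compresslevel=6)
```

The sketch deflates every entry it does not recognise as pre-compressed, which is stricter than the rule requires for binary content but keeps the logic short.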

§ 03

The Four Layers

Every .sema file is built from four conceptual layers. All four correspond to entries marked REQUIRED in the file structure (§ 02).

Content

The original file in its native format. Untouched, unmodified. The source of truth.

Manifest

Structured metadata in sema.json. Describes what the file is, who made it, when, and why.

Brain

Pre-computed semantic understanding in brain.json. Keywords, entities, summary, relationships — generated at creation time, not at open time.

View

A complete self-contained HTML+JS interface in view.html. Opens in any browser. No installation. No internet.

§ 04

sema.json — The Manifest

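The manifest is the identity card of the file. It must be valid JSON and must include all required fields. The // comments in the listing below are annotations for this specification only; JSON has no comment syntax, so a real manifest must omit them.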

// sema.json — Full specification
{
  // ── REQUIRED FIELDS ─────────────────────────────

  "sema_version": "1.0.0",          // spec version used
  "id": "sema_[uuid-v4]",           // globally unique identifier
  "created_at": "2025-01-01T00:00:00Z", // ISO 8601
  "content_type": "document/recipe",  // category/subcategory
  "mime_type": "application/pdf",    // original file MIME
  "filename": "harira-recipe.pdf",  // original filename
  "lang": "ar",                     // ISO 639-1 language code

  // ── AUTHOR ───────────────────────────────────────
  "author": {
    "name": "Larbi",
    "org": "TREN Studio",
    "contact": "optional"
  },

  // ── CONTENT DESCRIPTION ──────────────────────────
  "title": "وصفة الحريرة المغربية",
  "description": "short human-readable description",
  "tags": ["recipe", "moroccan", "soup"],

  // ── TECHNICAL ────────────────────────────────────
  "checksum": {
    "algo": "sha256",
    "value": "[hash of content/original.*]"
  },
  "content_size_bytes": 204800,

  // ── OPTIONAL FIELDS ──────────────────────────────
  "expires_at": null,              // null = never
  "geo": {
    "country": "MA",
    "region": "optional"
  },
  "relations": [
    { "type": "derived_from", "id": "sema_[other-id]" }
  ],
  "custom": {}                      // app-specific extra data
}
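
A reader or builder could sanity-check a manifest along these lines. This Python sketch rests on several assumptions that the specification does not state outright: that the seven keys under the REQUIRED FIELDS banner are the required set, that the id is "sema_" followed by a canonical lowercase UUID v4, and that the checksum value is a lowercase hex SHA-256 digest of the original content bytes. The name manifest_problems is hypothetical.

```python
import hashlib
import re

# The seven keys listed under "REQUIRED FIELDS" in the listing above.
REQUIRED_KEYS = ("sema_version", "id", "created_at", "content_type",
                 "mime_type", "filename", "lang")

# Assumed id shape: "sema_" followed by a canonical lowercase UUID v4.
ID_PATTERN = re.compile(
    r"^sema_[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$"
)


def manifest_problems(manifest, content_bytes):
    """Return a list of human-readable problems; empty means no issue found."""
    problems = [f"missing field: {key}"
                for key in REQUIRED_KEYS if key not in manifest]
    if "id" in manifest and not ID_PATTERN.match(str(manifest["id"])):
        problems.append("id is not sema_<uuid-v4>")
    checksum = manifest.get("checksum")
    if checksum and checksum.get("algo") == "sha256":
        # Assumed encoding: lowercase hex digest of content/original.*
        digest = hashlib.sha256(content_bytes).hexdigest()
        if digest != checksum.get("value"):
            problems.append("checksum mismatch")
    return problems
```

Here manifest is the dict produced by json.load on sema.json and content_bytes is the raw content/original.* entry read from the archive.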

§ 05

brain.json — Semantic Intelligence

The brain is the soul of .sema. It is computed once at creation time and stored permanently inside the file. It requires no API, no internet, and no model inference at open time; the viewer only reads and searches the stored data.

This is the key innovation: intelligence is pre-baked, not live-computed.

// brain.json — Semantic understanding layer
{
  "brain_version": "1.0",
  "generated_at": "2025-01-01T00:00:00Z",
  "generator": "sema-builder-cli/1.0",

  // ── SEMANTIC CORE ────────────────────────────────
  "summary": "2-3 sentence plain-language summary",
  "keywords": ["keyword1", "keyword2"],    // top 10-20
  "entities": {
    "people": [],
    "places": ["Morocco", "Marrakech"],
    "concepts": ["nutrition", "traditional food"],
    "dates": []
  },
  "topics": ["food", "culture", "health"],

  // ── CONTENT-TYPE SPECIFIC ────────────────────────
  "content_data": {
    // Varies by content_type. Examples:
    // recipe → { ingredients, steps, cook_time, calories }
    // invoice → { total, currency, items, due_date }
    // image  → { objects, colors, scene, faces_count }
    // doc    → { word_count, reading_time, headings }
  },

  // ── SEARCHABILITY ────────────────────────────────
  "search_text": "full extracted plain text for local search",
  "questions": [
    // Pre-answered Q&A pairs for instant response
    { "q": "What is this file about?", "a": "..." },
    { "q": "Who created this?", "a": "..." }
  ],

  // ── ACCESSIBILITY ────────────────────────────────
  "alt_text": "description for screen readers",
  "translations": {
    "summary_ar": "ملخص بالعربية",
    "summary_fr": "résumé en français"
  }
}

// The Q&A Engine

The questions array is the interaction layer. When a user asks the file a question, the viewer first searches this pre-computed array for a close match. If one is found, it answers immediately from local data. If not, it falls back to a keyword search within search_text. No API calls and no network round trips are involved.
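
The two-step lookup described above might look like the following. The specification does not define what counts as a "close match" or how the keyword fallback ranks results, so this Python sketch makes its own choices: difflib similarity with a 0.6 cutoff for matching, and the sentence of search_text that shares the most words with the question as the fallback. A real viewer would implement the same idea in JavaScript inside view.html; the function name answer is hypothetical.

```python
import difflib
import re


def _words(text):
    """Lowercased word set used for the keyword fallback."""
    return set(re.findall(r"\w+", text.lower()))


def answer(question, brain, cutoff=0.6):
    """Answer from the pre-computed pairs, else fall back to search_text."""
    pairs = {p["q"].lower(): p["a"] for p in brain.get("questions", [])}
    close = difflib.get_close_matches(question.lower(), list(pairs),
                                      n=1, cutoff=cutoff)
    if close:
        return pairs[close[0]]
    # Fallback: pick the sentence sharing the most words with the question.
    wanted = _words(question)
    sentences = re.split(r"(?<=[.!?])\s+", brain.get("search_text", ""))
    best = max(sentences, key=lambda s: len(wanted & _words(s)))
    return best if wanted & _words(best) else None
```

Returning None when nothing overlaps lets the viewer show a "no answer found" message instead of an unrelated sentence.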

§ 06

view.html — The Self-Rendering Interface

view.html is a complete, self-contained web application embedded inside the file. It is either served locally by a .sema viewer or extracted from the archive and opened directly in a browser.

// Mandatory UI Elements

Every view.html must include:

Header bar — title, author, creation date, file type badge
Content area — renders the original content appropriately
Ask bar — text input to query the file (uses brain.json)
Summary panel — auto-shows the brain.json summary
Download button — allows extraction of original content
.sema badge — identifies the format with spec version

// Technical Constraints

view.html must be completely self-contained: all CSS and JS inlined, no external CDN dependencies, and no fetch() calls to external URLs. Maximum file size: 2 MB uncompressed. It must render correctly in Chrome, Firefox, Safari, and Edge without plugins.
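
Part of these constraints can be checked mechanically at build time. The Python sketch below, built on the standard html.parser module, flags a view.html that is over the size limit or that loads a resource from a remote URL. It is deliberately partial: it does not inspect inline JavaScript for fetch() calls, and the set of tags it examines, the 2 MB = 2 × 1024 × 1024 reading, and the name view_problems are assumptions.

```python
from html.parser import HTMLParser

MAX_VIEW_BYTES = 2 * 1024 * 1024  # assumes 1 MB = 1024 * 1024 bytes


class _ExternalRefFinder(HTMLParser):
    """Collects src/href values on resource-loading tags that point off-file."""

    # Plain <a href> links are navigation, not dependencies, so they are skipped.
    RESOURCE_TAGS = {"script", "link", "img", "iframe", "audio", "video", "source"}

    def __init__(self):
        super().__init__()
        self.external = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.RESOURCE_TAGS:
            return
        for name, value in attrs:
            if name in ("src", "href") and value and \
                    value.lower().startswith(("http://", "https://", "//")):
                self.external.append(value)


def view_problems(html_text):
    """Return a list of constraint violations found in view.html."""
    problems = []
    if len(html_text.encode("utf-8")) > MAX_VIEW_BYTES:
        problems.append("view.html exceeds 2 MB")
    finder = _ExternalRefFinder()
    finder.feed(html_text)
    problems += [f"external resource: {url}" for url in finder.external]
    return problems
```

Cross-browser rendering still has to be verified by hand or with a browser test suite; no static check covers it.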

§ 07

Content Type Registry

The content_type field uses a category/subcategory format. The following are the v1.0 registered types:

Content Type	Description	Key brain.json fields
document/generic	Any general document	word_count, reading_time, headings
document/recipe	Food recipe	ingredients, steps, calories, cook_time
document/invoice	Financial invoice	total, currency, items, due_date
document/contract	Legal contract	parties, clauses, dates, obligations
image/photo	Photograph	objects, scene, colors, location
image/medical	Medical imaging	modality, body_part, notes
image/diagram	Technical diagram	type, components, relationships
data/spreadsheet	Tabular data	rows, columns, summary_stats
data/dataset	Research dataset	variables, records, methodology
media/audio	Audio file	duration, transcript, speakers
media/video	Video file	duration, scenes, transcript
custom/[name]	App-specific type	defined by application
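
A validator for the category/subcategory format could be as small as the sketch below. The table contents come from the registry above; the function name is_registered and the decision to accept any non-empty name under custom/ are assumptions.

```python
# v1.0 registry, transcribed from the table above.
REGISTERED_TYPES = {
    "document": {"generic", "recipe", "invoice", "contract"},
    "image": {"photo", "medical", "diagram"},
    "data": {"spreadsheet", "dataset"},
    "media": {"audio", "video"},
}


def is_registered(content_type):
    """True if content_type is a v1.0 registered type or a custom/[name] type."""
    category, sep, sub = content_type.partition("/")
    if not sep or not sub:
        return False  # must have the form category/subcategory
    if category == "custom":
        return True  # custom/[name] is defined by the application
    return sub in REGISTERED_TYPES.get(category, set())
```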

§ 08

Format Comparison

Capability	PDF	JPG/PNG	HTML	.sema
Self-rendering	Partial	✗	✓	✓
No app needed	✗	✗	✓	✓
Semantic metadata	Limited	✗	✗	✓ Rich
Queryable	✗	✗	✗	✓
Original preserved	✓	✓	✗	✓
No internet required	✓	✓	Partial	✓ Always
Multilingual built-in	✗	✗	Manual	✓
Human + machine readable	Partial	✗	Partial	✓ Both

§ 09

Use Cases

🍽️

Recipe Files (FoodJot)

Each recipe becomes a .sema file. Ask it about calories, substitutions, or steps. Embed in any site. Works forever.

🏥

Medical Records

A patient sends a scan as a .sema file to a specialist across the world. The doctor opens it in a browser. No DICOM viewer needed.

📜

Legal Contracts

Ask the contract: "What is the penalty clause?" Get an instant answer from the embedded brain.

🎓

Academic Papers

Research papers that explain themselves. Ask: "What is the main finding?" — answered instantly, offline.

📦

Product Catalogs

An Amazon product review as .sema. Includes specs, pros/cons, affiliate links — all self-contained.

🗺️

Cultural Preservation

Archive Moroccan heritage documents with full semantic context in Arabic, French, and Amazigh — forever.
