
# Video Perceptual Hashing Algorithm

This repository contains an implementation of a video perceptual hashing algorithm for scene-based video fingerprinting, along with a detailed article about the algorithm's inner workings.

## Use Cases

- **Comparing uploaded videos to establish originality** - Detect if a user has uploaded a modified version of existing content
- **Detecting rebroadcasts of an original source in real time** - Monitor live streams for unauthorized content
- **Detecting/counting the number of scenes from other videos in a database** - Identify composite videos that stitch together scenes from multiple sources
- **Finding which scenes specifically are copied in one video from another** - Pinpoint exact copied segments between videos
- **Content deduplication** - Identify near-duplicate videos in large media libraries
- **Copyright infringement detection** - Find unauthorized copies even after modifications

## Algorithm Details

### Core Concepts

The algorithm operates at the scene level rather than on individual frames. A scene is defined as a continuous sequence of frames where the visual content remains relatively stable. Each scene generates a single perceptual hash.
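
To make "relatively stable" concrete, here is a minimal sketch of one way to score how much two frames differ, using the color-histogram idea mentioned in the pipeline. The function names (`histogram`, `histDiff`, `solid`), the 16-bin quantization, and the L1 metric are assumptions made for illustration; the repository's actual metric and thresholds may differ.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
)

// histogram returns a normalized RGB histogram with 16 bins per channel.
func histogram(img image.Image) [48]float64 {
	var h [48]float64
	b := img.Bounds()
	n := float64(b.Dx() * b.Dy())
	if n == 0 {
		return h
	}
	for y := b.Min.Y; y < b.Max.Y; y++ {
		for x := b.Min.X; x < b.Max.X; x++ {
			r, g, bl, _ := img.At(x, y).RGBA() // 16-bit channel values
			h[r>>12]++
			h[16+(g>>12)]++
			h[32+(bl>>12)]++
		}
	}
	for i := range h {
		h[i] /= n
	}
	return h
}

// histDiff returns the L1 distance between two frame histograms, scaled to [0, 1].
func histDiff(a, b image.Image) float64 {
	ha, hb := histogram(a), histogram(b)
	var sum float64
	for i := range ha {
		d := ha[i] - hb[i]
		if d < 0 {
			d = -d
		}
		sum += d
	}
	return sum / 6 // each of the 3 channels contributes at most 2
}

// solid builds a w x h frame filled with one opaque color (illustration helper).
func solid(r, g, b uint8, w, h int) *image.RGBA {
	img := image.NewRGBA(image.Rect(0, 0, w, h))
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			img.SetRGBA(x, y, color.RGBA{R: r, G: g, B: b, A: 255})
		}
	}
	return img
}

func main() {
	dark := solid(10, 10, 10, 8, 8)
	alsoDark := solid(12, 12, 12, 8, 8)
	bright := solid(250, 250, 250, 8, 8)
	fmt.Println(histDiff(dark, alsoDark)) // same bins, so 0
	fmt.Println(histDiff(dark, bright))   // disjoint bins, so 1
}
```

A scene boundary would then be declared wherever this score jumps well above its recent values; how "well above" is defined is exactly what an adaptive threshold decides.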

### Processing Pipeline

1. **Scene Detection**
   - Video is sampled at keyframes (scene change boundaries)
   - Uses adaptive thresholding based on color histogram differences
   - Minimum scene duration: configurable (default 10 frames)
   - Scene boundaries are detected when consecutive frame differences exceed dynamic thresholds

2. **Key Frame Extraction**
   - For each detected scene, a representative frame is selected (typically the middle frame or first stable frame)
   - Frame is resized to a standard dimension (e.g., 256x256) for consistent processing

3. **Perceptual Hash Generation**
   - Image is converted to grayscale
   - Discrete Cosine Transform (DCT) is applied
   - Top-left 8x8 DCT coefficients (excluding DC) are extracted
   - Comparison of coefficients against median creates 64-bit hash
   - Result: 64-bit perceptual hash (pHash)

4. **Hash Storage**
   - Scene start timestamp, end timestamp, and 64-bit hash are stored
   - Multiple hash types can be combined (difference hash, average hash, perceptual hash)
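
The hash-generation step can be sketched in plain Go as follows. This is an illustration under stated assumptions, not the repository's code: the 32x32 working size, nearest-neighbor sampling, the unnormalized DCT-II, and the names `grayGrid`, `lowFreqDCT`, `pHash`, and `gradient` are all hypothetical. It also assumes one reading of "excluding DC": the DC term is left out of the median but still contributes a bit, which is how 64 coefficients produce a 64-bit hash.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
	"math"
	"sort"
)

const size = 32 // working resolution (assumption)

// grayGrid samples img down to size x size luminance values (nearest neighbor).
func grayGrid(img image.Image) [size][size]float64 {
	var g [size][size]float64
	b := img.Bounds()
	for y := 0; y < size; y++ {
		for x := 0; x < size; x++ {
			sx := b.Min.X + x*b.Dx()/size
			sy := b.Min.Y + y*b.Dy()/size
			g[y][x] = float64(color.GrayModel.Convert(img.At(sx, sy)).(color.Gray).Y)
		}
	}
	return g
}

// lowFreqDCT computes only the top-left 8x8 block of an unnormalized 2D DCT-II.
func lowFreqDCT(g [size][size]float64) [64]float64 {
	var out [64]float64
	for v := 0; v < 8; v++ {
		for u := 0; u < 8; u++ {
			var sum float64
			for y := 0; y < size; y++ {
				cy := math.Cos(float64(2*y+1) * float64(v) * math.Pi / (2 * size))
				for x := 0; x < size; x++ {
					cx := math.Cos(float64(2*x+1) * float64(u) * math.Pi / (2 * size))
					sum += g[y][x] * cx * cy
				}
			}
			out[v*8+u] = sum
		}
	}
	return out
}

// pHash sets one bit per coefficient that lies above the median of the 63 AC terms.
func pHash(img image.Image) uint64 {
	c := lowFreqDCT(grayGrid(img))
	rest := make([]float64, 63)
	copy(rest, c[1:]) // skip the DC term at index 0
	sort.Float64s(rest)
	median := rest[31] // middle of 63 sorted values
	var h uint64
	for i, v := range c {
		if v > median {
			h |= 1 << uint(63-i)
		}
	}
	return h
}

// gradient builds a diagonal grayscale ramp of n x n cells, each cell scale pixels wide.
func gradient(n, scale int) *image.Gray {
	img := image.NewGray(image.Rect(0, 0, n*scale, n*scale))
	for y := 0; y < n*scale; y++ {
		for x := 0; x < n*scale; x++ {
			v := (x/scale + y/scale) * 255 / (2 * n)
			img.SetGray(x, y, color.Gray{Y: uint8(v)})
		}
	}
	return img
}

func main() {
	small := gradient(64, 1) // 64x64
	big := gradient(64, 2)   // same picture at 128x128
	fmt.Printf("%016x\n%016x\n", pHash(small), pHash(big))
}
```

Because both test images reduce to the same 32x32 grid, the two printed hashes are identical, which is the resolution independence a perceptual hash is meant to provide.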

### Hash Distance Calculation

Hamming distance is used to compare hashes:
- Distance = number of differing bits between two 64-bit hashes
- Lower distance = more visually similar
- Typical matching threshold: ≤15 bits difference
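
As a small illustration, the comparison can be written with the standard library's `math/bits` package. The names `hammingDistance`, `isMatch`, and `matchThreshold` are hypothetical; the threshold of 15 is the typical value quoted above and would normally be tuned per dataset.

```go
package main

import (
	"fmt"
	"math/bits"
)

// matchThreshold is the typical cutoff quoted above (an assumption, tune as needed).
const matchThreshold = 15

// hammingDistance counts the bits that differ between two 64-bit hashes.
func hammingDistance(a, b uint64) int {
	return bits.OnesCount64(a ^ b)
}

// isMatch reports whether two hashes are within the matching threshold.
func isMatch(a, b uint64) bool {
	return hammingDistance(a, b) <= matchThreshold
}

func main() {
	a := uint64(0xF0F0F0F0F0F0F0F0)
	b := a ^ 0x7 // flip the three lowest bits

	fmt.Println(hammingDistance(a, b), isMatch(a, b))   // 3 true
	fmt.Println(hammingDistance(a, ^a), isMatch(a, ^a)) // 64 false
}
```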

## Usage

### Installation

```bash
go get m8sh.su/x/vphash
```

### Example

```go
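package main

import (
    "fmt"
    "image"
    "log"
    "os"
    "time"

    "m8sh.su/d/gopeg"
    "m8sh.su/x/vphash"
)

func main() {
    f, err := os.Open("video.mp4")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    // The second return value is assumed to be an error.
    decoder, err := gopeg.NewDecoder(f)
    if err != nil {
        log.Fatal(err)
    }
    defer decoder.Close()

    adapter := &decoderAdapter{decoder}

    // Scan with a 10-second minimum scene length and a 20-second maximum gap.
    // Both arguments are time.Duration values, so a bare 10 would mean 10ns.
    results := vphash.Scan(adapter, 10*time.Second, 20*time.Second)

    for scene := range results {
        fmt.Printf("Scene: %s to %s, Hash: %d\n",
            scene.Start, scene.End, scene.Hash.GetHash())
    }
}

// decoderAdapter wraps a gopeg decoder so it satisfies vphash.FrameAdapter.
type decoderAdapter struct {
    d *gopeg.Decoder
}

// Next returns the next frame, its timestamp, and whether more frames exist.
func (a *decoderAdapter) Next() (image.Image, time.Duration, bool) {
    // Implementation for frame extraction
    // See usage example for complete implementation
    return nil, 0, false // placeholder so the example compiles; yields no frames
}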
```

### API Reference

#### Core Functions

- `Scan(adapter FrameAdapter, minSceneLen, maxGap time.Duration) <-chan Entry`
  - Scans video and returns channel of scene entries
  - `adapter`: Frame provider implementation
  - `minSceneLen`: Minimum duration for a valid scene, as a `time.Duration` (e.g. `10 * time.Second`)
  - `maxGap`: Maximum gap between frames to consider continuous, as a `time.Duration`

#### Types

- `type Entry struct`
  - `Start time.Duration` - Scene start timestamp
  - `End time.Duration` - Scene end timestamp  
  - `Hash *goimagehash.ImageHash` - Perceptual hash of scene

- `type FrameAdapter interface`
  - `Next() (image.Image, time.Duration, bool)` - Returns next frame, timestamp, and whether more frames exist
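
For testing without a video decoder, the interface can be satisfied by an in-memory frame source. The sketch below redeclares `FrameAdapter` locally so it compiles on its own; `sliceAdapter` and `newBlankAdapter` are hypothetical names, and it assumes the boolean is `false` once the frames are exhausted.

```go
package main

import (
	"fmt"
	"image"
	"time"
)

// FrameAdapter mirrors the interface described above, redeclared for a standalone sketch.
type FrameAdapter interface {
	Next() (image.Image, time.Duration, bool)
}

// sliceAdapter serves pre-decoded frames spaced at a fixed interval.
type sliceAdapter struct {
	frames   []image.Image
	interval time.Duration
	pos      int
}

// Compile-time check that sliceAdapter satisfies the interface.
var _ FrameAdapter = (*sliceAdapter)(nil)

// Next returns the next frame and its timestamp, or false when no frames remain.
func (s *sliceAdapter) Next() (image.Image, time.Duration, bool) {
	if s.pos >= len(s.frames) {
		return nil, 0, false
	}
	img := s.frames[s.pos]
	ts := time.Duration(s.pos) * s.interval
	s.pos++
	return img, ts, true
}

// newBlankAdapter builds an adapter holding n blank 1x1 frames, intervalMs apart.
func newBlankAdapter(n, intervalMs int) *sliceAdapter {
	frames := make([]image.Image, n)
	for i := range frames {
		frames[i] = image.NewGray(image.Rect(0, 0, 1, 1))
	}
	return &sliceAdapter{
		frames:   frames,
		interval: time.Duration(intervalMs) * time.Millisecond,
	}
}

func main() {
	var src FrameAdapter = newBlankAdapter(3, 40) // 25 fps spacing
	for {
		img, ts, ok := src.Next()
		if !ok {
			break
		}
		fmt.Println(ts, img.Bounds())
	}
}
```

An adapter like this could be passed to `Scan` in place of a decoder-backed one, which keeps unit tests independent of any video file.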