# Video Perceptual Hashing Algorithm
This repository contains an implementation of a video perceptual hashing algorithm for scene-based video fingerprinting, along with a detailed article about the algorithm's inner workings.
## Use Cases
- **Comparing uploaded videos to establish originality** - Detect if a user has uploaded a modified version of existing content
- **Detecting rebroadcasts of an original source in real time** - Monitor live streams for unauthorized content
- **Detecting/counting the number of scenes from other videos in a database** - Identify composite videos that stitch together scenes from multiple sources
- **Finding which scenes specifically are copied in one video from another** - Pinpoint exact copied segments between videos
- **Content deduplication** - Identify near-duplicate videos in large media libraries
- **Copyright infringement detection** - Find unauthorized copies even after modifications
## Algorithm Details
### Core Concepts
The algorithm operates at the scene level rather than on individual frames. A scene is defined as a continuous sequence of frames where the visual content remains relatively stable. Each scene generates a single perceptual hash.
### Processing Pipeline
1. **Scene Detection**
   - Video is sampled at keyframes (scene change boundaries)
   - Uses adaptive thresholding based on color histogram differences
   - Minimum scene duration: configurable (default 10 frames)
   - Scene boundaries are detected when consecutive frame differences exceed dynamic thresholds
2. **Key Frame Extraction**
   - For each detected scene, a representative frame is selected (typically the middle frame or first stable frame)
   - Frame is resized to a standard dimension (e.g., 256x256) for consistent processing
3. **Perceptual Hash Generation**
   - Image is converted to grayscale
   - Discrete Cosine Transform (DCT) is applied
   - Top-left 8x8 DCT coefficients (excluding DC) are extracted
   - Comparison of coefficients against median creates 64-bit hash
   - Result: 64-bit perceptual hash (pHash)
4. **Hash Storage**
   - Scene start timestamp, end timestamp, and 64-bit hash are stored
   - Multiple hash types can be combined (difference hash, average hash, perceptual hash)
### Hash Distance Calculation
Hamming distance is used to compare hashes:
- Distance = number of differing bits between two 64-bit hashes
- Lower distance = more visually similar
- Typical matching threshold: ≤15 bits difference
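
As a minimal sketch of this comparison, the Hamming distance between two 64-bit hashes is the population count of their XOR. The helper names `hammingDistance` and `isMatch` are hypothetical; the threshold of 15 bits is the typical value quoted above.

```go
package main

import (
	"fmt"
	"math/bits"
)

// hammingDistance returns the number of bit positions in which a and b differ.
func hammingDistance(a, b uint64) int {
	return bits.OnesCount64(a ^ b)
}

// isMatch reports whether two hashes differ in at most maxBits positions.
func isMatch(a, b uint64, maxBits int) bool {
	return hammingDistance(a, b) <= maxBits
}

func main() {
	a := uint64(0xF0F0F0F0F0F0F0F0)
	b := uint64(0xF0F0F0F0F0F0F0FF) // differs from a only in the low 4 bits

	fmt.Println(hammingDistance(a, b)) // prints 4
	fmt.Println(isMatch(a, b, 15))     // prints true
}
```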
## Usage
### Installation
```bash
go get m8sh.su/x/vphash
```
### Example
```go
package main

import (
	"fmt"
	"image"
	"log"
	"os"
	"time"

	"m8sh.su/d/gopeg"
	"m8sh.su/x/vphash"
)

func main() {
	f, err := os.Open("video.mp4")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	decoder, err := gopeg.NewDecoder(f)
	if err != nil {
		log.Fatal(err)
	}
	defer decoder.Close()

	adapter := &decoderAdapter{decoder}

	// Scan with a 10-second minimum scene length and a 20-second maximum gap.
	// Both arguments are time.Duration values, so a bare 10 would mean 10ns.
	results := vphash.Scan(adapter, 10*time.Second, 20*time.Second)
	for scene := range results {
		fmt.Printf("Scene: %s to %s, Hash: %d\n",
			scene.Start, scene.End, scene.Hash.GetHash())
	}
}

// decoderAdapter wraps a gopeg decoder so that it satisfies vphash.FrameAdapter.
type decoderAdapter struct {
	d *gopeg.Decoder
}

func (a *decoderAdapter) Next() (image.Image, time.Duration, bool) {
	// Implementation for frame extraction
	// See usage example for complete implementation
	return nil, 0, false // placeholder so the snippet compiles
}
```
### API Reference
#### Core Functions
- `Scan(adapter FrameAdapter, minSceneLen, maxGap time.Duration) <-chan Entry`
  - Scans the video and returns a channel of scene entries
  - `adapter`: Frame provider implementation
  - `minSceneLen`: Minimum duration for a valid scene, as a `time.Duration` (e.g. `10 * time.Second`)
  - `maxGap`: Maximum gap between frames to consider continuous, as a `time.Duration`
#### Types
- `type Entry struct`
  - `Start time.Duration` - Scene start timestamp
  - `End time.Duration` - Scene end timestamp
  - `Hash *goimagehash.ImageHash` - Perceptual hash of scene
- `type FrameAdapter interface`
  - `Next() (image.Image, time.Duration, bool)` - Returns next frame, timestamp, and whether more frames exist