Video Perceptual Hashing Algorithm
This repository contains an implementation of a video perceptual hashing algorithm for scene-based video fingerprinting, along with a detailed article about the algorithm's inner workings.
Use Cases
- Comparing uploaded videos to establish originality: detect whether a user has uploaded a modified version of existing content
- Detecting rebroadcasts of an original source in real time: monitor live streams for unauthorized content
- Detecting/counting the number of scenes from other videos in a database: identify composite videos that stitch together scenes from multiple sources
- Finding which scenes specifically are copied in one video from another: pinpoint exact copied segments between videos
- Content deduplication: identify near-duplicate videos in large media libraries
- Copyright infringement detection: find unauthorized copies even after modifications
Algorithm Details
Core Concepts
The algorithm operates at the scene level rather than on individual frames. A scene is defined as a continuous sequence of frames where the visual content remains relatively stable. Each scene generates a single perceptual hash.
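To make the idea of a scene as a run of visually stable frames concrete, here is a minimal sketch that splits a sequence of per-frame histograms into scenes. It is not the library's implementation: the names `histDiff` and `sceneStarts`, the L1 distance, and the global "mean plus k standard deviations" threshold are all assumptions chosen for illustration, and the minimum scene length is counted in frames.

```go
package main

import (
	"fmt"
	"math"
)

// histDiff returns the L1 distance between two normalized histograms
// of equal length.
func histDiff(a, b []float64) float64 {
	var d float64
	for i := range a {
		d += math.Abs(a[i] - b[i])
	}
	return d
}

// sceneStarts returns the frame indices at which a new scene begins.
// A boundary is declared when the difference to the previous frame exceeds
// mean + k*stddev of all consecutive differences (a simple adaptive
// threshold, assumed here) and the current scene already spans at least
// minLen frames.
func sceneStarts(hists [][]float64, k float64, minLen int) []int {
	if len(hists) == 0 {
		return nil
	}
	starts := []int{0}
	if len(hists) < 2 {
		return starts
	}

	// Differences between consecutive frames.
	diffs := make([]float64, len(hists)-1)
	var sum float64
	for i := 1; i < len(hists); i++ {
		diffs[i-1] = histDiff(hists[i-1], hists[i])
		sum += diffs[i-1]
	}

	// Threshold derived from the statistics of this particular video.
	mean := sum / float64(len(diffs))
	var variance float64
	for _, d := range diffs {
		variance += (d - mean) * (d - mean)
	}
	threshold := mean + k*math.Sqrt(variance/float64(len(diffs)))

	last := 0
	for i, d := range diffs {
		frame := i + 1 // diffs[i] compares frame i with frame i+1
		if d > threshold && frame-last >= minLen {
			starts = append(starts, frame)
			last = frame
		}
	}
	return starts
}

func main() {
	dark := []float64{1, 0}   // all pixels in the dark bin
	bright := []float64{0, 1} // all pixels in the bright bin
	hists := [][]float64{dark, dark, dark, bright, bright, bright}
	fmt.Println(sceneStarts(hists, 1.0, 2)) // prints [0 3]
}
```

A production detector would more likely compute the threshold over a sliding window so that it adapts within a long video; the global statistic is used here only to keep the sketch short.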
Processing Pipeline
- Scene Detection
  - Video is sampled at keyframes (scene change boundaries)
  - Uses adaptive thresholding based on color histogram differences
  - Minimum scene duration: configurable (default 10 frames)
  - Scene boundaries are detected when consecutive frame differences exceed dynamic thresholds
- Key Frame Extraction
  - For each detected scene, a representative frame is selected (typically the middle frame or first stable frame)
  - Frame is resized to a standard dimension (e.g., 256x256) for consistent processing
- Perceptual Hash Generation
  - Image is converted to grayscale
  - Discrete Cosine Transform (DCT) is applied
  - Top-left 8x8 DCT coefficients (excluding DC) are extracted
  - Each coefficient is compared against the median of the extracted coefficients, producing one bit per coefficient
  - Result: 64-bit perceptual hash (pHash)
- Hash Storage
  - Scene start timestamp, end timestamp, and 64-bit hash are stored
  - Multiple hash types can be combined (difference hash, average hash, perceptual hash)
Hash Distance Calculation
Hamming distance is used to compare hashes:
- Distance = number of differing bits between two 64-bit hashes
- Lower distance = more visually similar
- Typical matching threshold: ≤15 bits difference
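As a small sketch of the comparison described above, the Hamming distance between two 64-bit hashes is the population count of their XOR. The helper names `hammingDistance` and `isMatch` are hypothetical, and the threshold of 15 is simply the typical value quoted above, not a constant taken from the library.

```go
package main

import (
	"fmt"
	"math/bits"
)

// hammingDistance counts the bits that differ between two 64-bit hashes.
func hammingDistance(a, b uint64) int {
	return bits.OnesCount64(a ^ b)
}

// isMatch reports whether two hashes are within the given bit threshold.
func isMatch(a, b uint64, threshold int) bool {
	return hammingDistance(a, b) <= threshold
}

func main() {
	original := uint64(0xFFFF0000FFFF0000)
	modified := uint64(0xFFFF0000FFFF00FF) // lowest 8 bits flipped

	fmt.Println(hammingDistance(original, modified)) // prints 8
	fmt.Println(isMatch(original, modified, 15))     // prints true
}
```

To find which scenes of one video appear in another, the same check can be applied to every pair of stored scene hashes, keeping the pairs that fall within the threshold.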
Usage
Installation
go get m8sh.su/x/vphash
Usage
package main

import (
	"fmt"
	"image"
	"log"
	"os"
	"time"

	"m8sh.su/d/gopeg"
	"m8sh.su/x/vphash"
)

func main() {
	f, err := os.Open("video.mp4")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	decoder, err := gopeg.NewDecoder(f)
	if err != nil {
		log.Fatal(err)
	}
	defer decoder.Close()

	adapter := &decoderAdapter{decoder}

	// Scan with 10 seconds minimum scene length, 20 seconds maximum gap
	results := vphash.Scan(adapter, 10, 20)
	for scene := range results {
		fmt.Printf("Scene: %s to %s, Hash: %d\n",
			scene.Start, scene.End, scene.Hash.GetHash())
	}
}

// decoderAdapter wraps a gopeg decoder so it can be passed to vphash.Scan.
type decoderAdapter struct {
	d *gopeg.Decoder
}

func (a *decoderAdapter) Next() (image.Image, time.Duration, bool) {
	// Implementation for frame extraction
	// See usage example for complete implementation
	return nil, 0, false // placeholder so the snippet compiles
}
API Reference
Core Functions
Scan(adapter FrameAdapter, minSceneLen, maxGap time.Duration) <-chan Entry - Scans video and returns channel of scene entries
- adapter: Frame provider implementation
- minSceneLen: Minimum duration for a valid scene (seconds)
- maxGap: Maximum gap between frames to consider continuous
Types
- type Entry struct
  - Start time.Duration - Scene start timestamp
  - End time.Duration - Scene end timestamp
  - Hash *goimagehash.ImageHash - Perceptual hash of scene
- type FrameAdapter interface
  - Next() (image.Image, time.Duration, bool) - Returns next frame, timestamp, and whether more frames exist