Video Perceptual Hashing Algorithm

This repository contains an implementation of a video perceptual hashing algorithm for scene-based video fingerprinting, along with a detailed article about the algorithm's inner workings.

Use Cases

- Comparing uploaded videos to establish originality: detect whether a user has uploaded a modified version of existing content
- Detecting rebroadcasts of an original source in real time: monitor live streams for unauthorized content
- Detecting/counting the number of scenes from other videos in a database: identify composite videos that stitch together scenes from multiple sources
- Finding which scenes specifically are copied in one video from another: pinpoint exact copied segments between videos
- Content deduplication: identify near-duplicate videos in large media libraries
- Copyright infringement detection: find unauthorized copies even after modifications

Algorithm Details

Core Concepts

The algorithm operates at the scene level rather than on individual frames. A scene is defined as a continuous sequence of frames where the visual content remains relatively stable. Each scene generates a single perceptual hash.

Processing Pipeline

Scene Detection
- Video is sampled at keyframes (scene change boundaries)
- Uses adaptive thresholding based on color histogram differences
- Minimum scene duration: configurable (default 10 frames)
- Scene boundaries are detected when consecutive frame differences exceed dynamic thresholds
Key Frame Extraction
- For each detected scene, a representative frame is selected (typically the middle frame or first stable frame)
- Frame is resized to a standard dimension (e.g., 256x256) for consistent processing
Perceptual Hash Generation
- Image is converted to grayscale
- Discrete Cosine Transform (DCT) is applied
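- The top-left 8x8 block of DCT coefficients (the lowest frequencies) is extracted
- Each of the 64 coefficients is compared against the median to set one bit; the DC term is left out of the median calculation, since it only reflects average brightness
- Result: 64-bit perceptual hash (pHash)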
Hash Storage
- Scene start timestamp, end timestamp, and 64-bit hash are stored
- Multiple hash types can be combined (difference hash, average hash, perceptual hash)
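
The hash generation step above can be sketched in plain Go. This is a minimal illustration under stated assumptions, not the repository's implementation: the names sketchPHash, grayGrid, lowFreqDCT and gradient are hypothetical, the 32x32 sample size is chosen only to keep the example small, nearest-neighbor sampling stands in for a proper resize, and the DC term is excluded from the median while still contributing one bit. Its output is not expected to match the hashes produced by the library.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
	"math"
	"sort"
)

const (
	sampleSize = 32 // assumed side length of the downsampled grayscale grid
	hashSize   = 8  // side length of the low-frequency block (8x8 = 64 bits)
)

// grayGrid downsamples img to a sampleSize x sampleSize grid of luminance
// values using nearest-neighbor sampling.
func grayGrid(img image.Image) [][]float64 {
	b := img.Bounds()
	grid := make([][]float64, sampleSize)
	for y := range grid {
		grid[y] = make([]float64, sampleSize)
		for x := range grid[y] {
			sx := b.Min.X + x*b.Dx()/sampleSize
			sy := b.Min.Y + y*b.Dy()/sampleSize
			g := color.GrayModel.Convert(img.At(sx, sy)).(color.Gray)
			grid[y][x] = float64(g.Y)
		}
	}
	return grid
}

// lowFreqDCT returns the hashSize x hashSize lowest-frequency coefficients of
// an unnormalized 2D DCT-II in row-major order (index 0 is the DC term).
func lowFreqDCT(grid [][]float64) []float64 {
	n := float64(len(grid))
	out := make([]float64, 0, hashSize*hashSize)
	for v := 0; v < hashSize; v++ {
		for u := 0; u < hashSize; u++ {
			var sum float64
			for y, row := range grid {
				cy := math.Cos(math.Pi * float64(v) * (float64(y) + 0.5) / n)
				for x, p := range row {
					sum += p * cy * math.Cos(math.Pi*float64(u)*(float64(x)+0.5)/n)
				}
			}
			out = append(out, sum)
		}
	}
	return out
}

// sketchPHash sets one bit per coefficient that exceeds the median of the
// 63 AC coefficients (assumption: DC is excluded from the median only).
func sketchPHash(img image.Image) uint64 {
	coeffs := lowFreqDCT(grayGrid(img))
	sorted := append([]float64(nil), coeffs[1:]...)
	sort.Float64s(sorted)
	median := sorted[len(sorted)/2]

	var hash uint64
	for i, c := range coeffs {
		if c > median {
			hash |= uint64(1) << uint(63-i)
		}
	}
	return hash
}

// gradient builds a size x size grayscale test image that brightens from
// left to right.
func gradient(size int) image.Image {
	img := image.NewGray(image.Rect(0, 0, size, size))
	for y := 0; y < size; y++ {
		for x := 0; x < size; x++ {
			img.SetGray(x, y, color.Gray{Y: uint8(255 * x / size)})
		}
	}
	return img
}

func main() {
	fmt.Printf("%016x\n", sketchPHash(gradient(64)))
}
```

The direct summation in lowFreqDCT only computes the 64 coefficients the hash needs, which keeps the sketch short; a production implementation would more likely use a separable or fast DCT.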

Hash Distance Calculation

Hamming distance is used to compare hashes:

- Distance = number of differing bits between two 64-bit hashes
- Lower distance = more visually similar
- Typical matching threshold: ≤15 bits difference
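
As a small illustration of the comparison described above, the Hamming distance between two 64-bit hashes is the population count of their XOR. The helper names hammingDistance and isMatch and the constant matchThreshold are hypothetical; the threshold of 15 simply mirrors the typical figure quoted above and would need tuning for real data. When working with goimagehash.ImageHash values directly, that library's own Distance method is the more natural choice.

```go
package main

import (
	"fmt"
	"math/bits"
)

// matchThreshold mirrors the "≤15 bits" figure from the text (an assumption
// to tune per data set, not a value taken from the library).
const matchThreshold = 15

// hammingDistance counts the bits that differ between two 64-bit hashes.
func hammingDistance(a, b uint64) int {
	return bits.OnesCount64(a ^ b)
}

// isMatch reports whether two hashes are within the matching threshold.
func isMatch(a, b uint64) bool {
	return hammingDistance(a, b) <= matchThreshold
}

func main() {
	a := uint64(0xF0F0F0F0F0F0F0F0)
	b := a ^ 0xFF // flip the lowest 8 bits
	fmt.Println(hammingDistance(a, b), isMatch(a, b)) // prints "8 true"
}
```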

Usage

Installation

go get m8sh.su/x/vphash

Example

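package main

import (
    "fmt"
    "image"
    "log"
    "os"
    "time"

    "m8sh.su/d/gopeg"
    "m8sh.su/x/vphash"
)

func main() {
    f, err := os.Open("video.mp4")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    // Assumes the second return value of NewDecoder is an error.
    decoder, err := gopeg.NewDecoder(f)
    if err != nil {
        log.Fatal(err)
    }
    defer decoder.Close()

    adapter := &decoderAdapter{decoder}

    // Scan with a 10-second minimum scene length and a 20-second maximum gap.
    // Scan takes time.Duration values, so the unit must be spelled out.
    results := vphash.Scan(adapter, 10*time.Second, 20*time.Second)

    for scene := range results {
        fmt.Printf("Scene: %s to %s, Hash: %d\n",
            scene.Start, scene.End, scene.Hash.GetHash())
    }
}

// decoderAdapter wraps a gopeg decoder so that it satisfies vphash.FrameAdapter.
type decoderAdapter struct {
    d *gopeg.Decoder
}

// Next returns the next frame, its timestamp, and whether more frames exist.
func (a *decoderAdapter) Next() (image.Image, time.Duration, bool) {
    // Implementation for frame extraction
    // See usage example for complete implementation
    return nil, 0, false // placeholder: reports no frames until implemented
}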

API Reference

Core Functions

Scan(adapter FrameAdapter, minSceneLen, maxGap time.Duration) <-chan Entry
- Scans video and returns channel of scene entries
- adapter: Frame provider implementation
- minSceneLen: Minimum duration for a valid scene, as a time.Duration (e.g., 10*time.Second)
- maxGap: Maximum gap between frames to consider continuous, as a time.Duration (e.g., 20*time.Second)

Types

type Entry struct
- Start time.Duration - Scene start timestamp
- End time.Duration - Scene end timestamp
- Hash *goimagehash.ImageHash - Perceptual hash of scene
type FrameAdapter interface
- Next() (image.Image, time.Duration, bool) - Returns next frame, timestamp, and whether more frames exist
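
To make the FrameAdapter contract more concrete, here is a minimal sketch of an adapter that serves frames already held in memory at a fixed frame rate. The sliceAdapter type and newSliceAdapter helper are hypothetical and not part of the package, and the interface is restated locally only so the example compiles on its own. It also assumes that the boolean is true whenever a frame was returned and false once the source is exhausted; the reference above does not spell this out.

```go
package main

import (
	"fmt"
	"image"
	"time"
)

// FrameAdapter restates the interface from the reference above so that this
// sketch is self-contained.
type FrameAdapter interface {
	Next() (image.Image, time.Duration, bool)
}

// sliceAdapter is a hypothetical adapter over pre-decoded frames spaced at a
// fixed interval.
type sliceAdapter struct {
	frames   []image.Image
	interval time.Duration
	pos      int
}

// Compile-time check that sliceAdapter satisfies FrameAdapter.
var _ FrameAdapter = (*sliceAdapter)(nil)

// newSliceAdapter builds an adapter holding n blank 1x1 frames at the given
// frame rate (fps must be positive).
func newSliceAdapter(n, fps int) *sliceAdapter {
	frames := make([]image.Image, n)
	for i := range frames {
		frames[i] = image.NewGray(image.Rect(0, 0, 1, 1))
	}
	return &sliceAdapter{frames: frames, interval: time.Second / time.Duration(fps)}
}

// Next returns the next frame and its timestamp. The bool is false once all
// frames have been consumed (an assumed reading of the interface contract).
func (s *sliceAdapter) Next() (image.Image, time.Duration, bool) {
	if s.pos >= len(s.frames) {
		return nil, 0, false
	}
	img := s.frames[s.pos]
	ts := time.Duration(s.pos) * s.interval
	s.pos++
	return img, ts, true
}

func main() {
	a := newSliceAdapter(3, 25)
	for img, ts, ok := a.Next(); ok; img, ts, ok = a.Next() {
		fmt.Println(ts, img.Bounds())
	}
}
```

An adapter like this is mainly useful for tests, since it lets the scanning logic be exercised without a real decoder.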
