Video Perceptual Hashing Algorithm

This repository contains an implementation of a video perceptual hashing algorithm for scene-based video fingerprinting, along with a detailed article about the algorithm's inner workings.

Use Cases

  • Comparing uploaded videos to establish originality - Detect if a user has uploaded a modified version of existing content
  • Detecting rebroadcasts of an original source in real time - Monitor live streams for unauthorized content
  • Detecting/counting the number of scenes from other videos in a database - Identify composite videos that stitch together scenes from multiple sources
  • Finding which scenes specifically are copied in one video from another - Pinpoint exact copied segments between videos
  • Content deduplication - Identify near-duplicate videos in large media libraries
  • Copyright infringement detection - Find unauthorized copies even after modifications

Algorithm Details

Core Concepts

The algorithm operates at the scene level rather than on individual frames. A scene is defined as a continuous sequence of frames where the visual content remains relatively stable. Each scene generates a single perceptual hash.

Processing Pipeline

  1. Scene Detection

    • Video is sampled at keyframes (scene change boundaries)
    • Uses adaptive thresholding based on color histogram differences
    • Minimum scene duration: configurable (default 10 frames)
    • Scene boundaries are detected when consecutive frame differences exceed dynamic thresholds
  2. Key Frame Extraction

    • For each detected scene, a representative frame is selected (typically the middle frame or first stable frame)
    • Frame is resized to a standard dimension (e.g., 256x256) for consistent processing
  3. Perceptual Hash Generation

    • Image is converted to grayscale
    • Discrete Cosine Transform (DCT) is applied
    • Top-left 8x8 DCT coefficients (excluding DC) are extracted
    • Comparison of coefficients against median creates 64-bit hash
    • Result: 64-bit perceptual hash (pHash)
  4. Hash Storage

    • Scene start timestamp, end timestamp, and 64-bit hash are stored
    • Multiple hash types can be combined (difference hash, average hash, perceptual hash)
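
The hash generation step (step 3) can be sketched with the standard library alone. This is an illustrative approximation, not the repository's code: the 32x32 working size, the nearest-neighbour downscaling, the unnormalized DCT, and the helper names (`grayGrid`, `dctLow`, `pHash`, `pattern`) are all assumptions. The treatment of the DC term is also an assumption: here it is left out of the median but still contributes one bit, so the result stays 64 bits wide.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
	"math"
	"sort"
)

const (
	side = 32 // working resolution after downscaling (assumed value)
	low  = 8  // side of the low-frequency DCT block that feeds the hash
)

// grayGrid downsamples img to side x side luminance values using
// nearest-neighbour sampling.
func grayGrid(img image.Image) [side][side]float64 {
	var g [side][side]float64
	b := img.Bounds()
	for y := 0; y < side; y++ {
		for x := 0; x < side; x++ {
			sx := b.Min.X + x*b.Dx()/side
			sy := b.Min.Y + y*b.Dy()/side
			g[y][x] = float64(color.GrayModel.Convert(img.At(sx, sy)).(color.Gray).Y)
		}
	}
	return g
}

// dctLow computes only the top-left low x low block of an unnormalized
// 2D DCT-II, which is all the hash needs.
func dctLow(g [side][side]float64) [low * low]float64 {
	var out [low * low]float64
	for v := 0; v < low; v++ {
		for u := 0; u < low; u++ {
			var sum float64
			for y := 0; y < side; y++ {
				for x := 0; x < side; x++ {
					sum += g[y][x] *
						math.Cos(math.Pi*float64(2*x+1)*float64(u)/(2*side)) *
						math.Cos(math.Pi*float64(2*y+1)*float64(v)/(2*side))
				}
			}
			out[v*low+u] = sum
		}
	}
	return out
}

// pHash sets one bit per coefficient that lies above the median.
// Assumption: the DC term (index 0) is excluded from the median only.
func pHash(img image.Image) uint64 {
	coeffs := dctLow(grayGrid(img))
	rest := append([]float64(nil), coeffs[1:]...)
	sort.Float64s(rest)
	median := rest[len(rest)/2] // middle of the 63 non-DC values
	var h uint64
	for i, c := range coeffs {
		if c > median {
			h |= uint64(1) << uint(63-i)
		}
	}
	return h
}

// pattern draws a deterministic test image; scale > 1 enlarges each pixel.
func pattern(size, scale int) *image.Gray {
	img := image.NewGray(image.Rect(0, 0, size*scale, size*scale))
	for y := 0; y < size*scale; y++ {
		for x := 0; x < size*scale; x++ {
			px, py := x/scale, y/scale
			img.SetGray(x, y, color.Gray{Y: uint8((px*7 + py*13 + px*py) % 256)})
		}
	}
	return img
}

func main() {
	small := pHash(pattern(64, 1))
	large := pHash(pattern(64, 2)) // same content at twice the resolution
	fmt.Printf("%016x\n%016x\nequal: %v\n", small, large, small == large)
}
```

Because every frame is reduced to the same small grid before the transform, the same picture at a different resolution maps to the same hash in this sketch. A production implementation would normally use a proper resampling filter and a fast DCT.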

Hash Distance Calculation

Hamming distance is used to compare hashes:

  • Distance = number of differing bits between two 64-bit hashes
  • Lower distance = more visually similar
  • Typical matching threshold: ≤15 bits difference
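
As a minimal sketch of the comparison described above, the Hamming distance between two 64-bit hashes is the population count of their XOR. The names `hammingDistance`, `isMatch`, and `matchThreshold` are illustrative and not part of the package; the threshold of 15 is the typical value quoted above and should be tuned per dataset.

```go
package main

import (
	"fmt"
	"math/bits"
)

// matchThreshold is the typical cut-off mentioned in the text.
const matchThreshold = 15

// hammingDistance returns the number of bit positions where a and b differ.
func hammingDistance(a, b uint64) int {
	return bits.OnesCount64(a ^ b)
}

// isMatch reports whether two hashes are close enough to count as the same scene.
func isMatch(a, b uint64) bool {
	return hammingDistance(a, b) <= matchThreshold
}

func main() {
	a := uint64(0xF0F0F0F0F0F0F0F0)
	b := a ^ 0xFF // flip the lowest 8 bits

	fmt.Println(hammingDistance(a, b), isMatch(a, b))   // 8 true
	fmt.Println(hammingDistance(a, ^a), isMatch(a, ^a)) // 64 false
}
```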

Usage

Installation

go get m8sh.su/x/vphash

Example

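package main

import (
    "fmt"
    "image"
    "log"
    "os"
    "time"

    "m8sh.su/d/gopeg"
    "m8sh.su/x/vphash"
)

func main() {
    f, err := os.Open("video.mp4")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    // The second return value is ignored here for brevity; check it in real code.
    decoder, _ := gopeg.NewDecoder(f)
    defer decoder.Close()

    adapter := &decoderAdapter{decoder}

    // Scan with a 10-second minimum scene length and a 20-second maximum gap.
    // Both arguments are time.Duration values, so the unit must be explicit:
    // a bare 10 would mean 10 nanoseconds.
    results := vphash.Scan(adapter, 10*time.Second, 20*time.Second)

    for scene := range results {
        fmt.Printf("Scene: %s to %s, Hash: %d\n",
            scene.Start, scene.End, scene.Hash.GetHash())
    }
}

// decoderAdapter wraps the decoder so that it can be passed to vphash.Scan.
type decoderAdapter struct {
    d *gopeg.Decoder
}

func (a *decoderAdapter) Next() (image.Image, time.Duration, bool) {
    // Implementation for frame extraction
    // See usage example for complete implementation
    return nil, 0, false // placeholder so the example compiles
}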

API Reference

Core Functions

  • Scan(adapter FrameAdapter, minSceneLen, maxGap time.Duration) <-chan Entry
    • Scans video and returns channel of scene entries
    • adapter: Frame provider implementation
    • minSceneLen: Minimum duration for a valid scene, given as a time.Duration (e.g. 10*time.Second)
    • maxGap: Maximum gap between frames to consider continuous
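
One way to use the channel returned by a scan is to compare each incoming scene against the scenes of a reference video. The sketch below is hedged: `entry` is a simplified, hypothetical stand-in for the package's Entry type that stores the hash as a plain uint64, and `findCopiedScenes` and `match` are illustrative names that do not exist in the package.

```go
package main

import (
	"fmt"
	"math/bits"
	"time"
)

// entry is a hypothetical stand-in for the package's Entry type; the real
// type holds a *goimagehash.ImageHash rather than a raw uint64.
type entry struct {
	Start, End time.Duration
	Hash       uint64
}

// match pairs a query scene with the reference scene it resembles.
type match struct {
	Query, Ref entry
	Distance   int
}

// findCopiedScenes drains the query channel and records every reference
// scene whose hash is within maxDist bits of a query scene.
func findCopiedScenes(query <-chan entry, ref []entry, maxDist int) []match {
	var out []match
	for q := range query {
		for _, r := range ref {
			if d := bits.OnesCount64(q.Hash ^ r.Hash); d <= maxDist {
				out = append(out, match{q, r, d})
			}
		}
	}
	return out
}

func main() {
	ref := []entry{
		{0, 12 * time.Second, 0xAAAAAAAAAAAAAAAA},
		{12 * time.Second, 30 * time.Second, 0x0123456789ABCDEF},
	}

	// A buffered channel stands in for the output of a scan.
	query := make(chan entry, 2)
	query <- entry{0, 15 * time.Second, 0x0123456789ABCDEE} // one bit away from ref[1]
	query <- entry{15 * time.Second, 40 * time.Second, 0x5555555555555555}
	close(query)

	for _, m := range findCopiedScenes(query, ref, 15) {
		fmt.Printf("query %v-%v matches reference %v-%v (distance %d)\n",
			m.Query.Start, m.Query.End, m.Ref.Start, m.Ref.End, m.Distance)
	}
}
```

The nested loop is quadratic in the number of scenes. That is acceptable for comparing two videos, but a large database would need an index suited to Hamming-distance search.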

Types

  • type Entry struct

    • Start time.Duration - Scene start timestamp
    • End time.Duration - Scene end timestamp
    • Hash *goimagehash.ImageHash - Perceptual hash of scene
  • type FrameAdapter interface

    • Next() (image.Image, time.Duration, bool) - Returns next frame, timestamp, and whether more frames exist
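
For tests, or for sources that are already decoded, a frame provider can be backed by a slice. The sketch below redeclares the interface locally so that it compiles on its own. `sliceAdapter`, `newSliceAdapter`, `blankFrames`, and `drain` are hypothetical names. The reading of the boolean result is also an assumption: `false` is taken to mean the stream is exhausted and the returned frame must be ignored.

```go
package main

import (
	"fmt"
	"image"
	"time"
)

// FrameAdapter mirrors the interface listed above; it is redeclared here
// only to keep the sketch self-contained.
type FrameAdapter interface {
	Next() (image.Image, time.Duration, bool)
}

// sliceAdapter serves pre-decoded frames at a fixed frame interval.
type sliceAdapter struct {
	frames []image.Image
	step   time.Duration
	pos    int
}

// newSliceAdapter derives the frame interval from a frames-per-second rate.
func newSliceAdapter(frames []image.Image, fps int) *sliceAdapter {
	return &sliceAdapter{frames: frames, step: time.Second / time.Duration(fps)}
}

// Next returns the next frame and its timestamp, or false once exhausted.
func (s *sliceAdapter) Next() (image.Image, time.Duration, bool) {
	if s.pos >= len(s.frames) {
		return nil, 0, false
	}
	img := s.frames[s.pos]
	ts := time.Duration(s.pos) * s.step
	s.pos++
	return img, ts, true
}

// blankFrames builds n empty 16x16 grayscale frames for demonstration.
func blankFrames(n int) []image.Image {
	frames := make([]image.Image, n)
	for i := range frames {
		frames[i] = image.NewGray(image.Rect(0, 0, 16, 16))
	}
	return frames
}

// drain reads the adapter to the end and reports the frame count and the
// timestamp of the last frame.
func drain(a FrameAdapter) (n int, last time.Duration) {
	for {
		_, ts, ok := a.Next()
		if !ok {
			return n, last
		}
		n, last = n+1, ts
	}
}

func main() {
	n, last := drain(newSliceAdapter(blankFrames(3), 25))
	fmt.Println(n, last) // 3 80ms
}
```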