Collection & Deduplication Metrics

Pipeline statistics from MalwareBazaar ingestion

TODAY'S COLLECTION

No collection run today. Awaiting next scheduled batch.

Total Collected: 0 (all-time ingested samples)
Published (Unique): 0 (passed all dedup layers)
Total Duplicates: 0 (hash + fuzzy + semantic)
Pending: 0 (awaiting dedup processing)
Families Identified: 0 (distinct malware families)
Failed: 0 (pipeline errors)

Deduplication Pipeline

Collected: 0 (100%)
SHA256 Dedup: −0 (0.0% removed)
TLSH Fuzzy: −0 (0.0% removed)
Semantic: −0 (0.0% removed)
Published: 0 (0.0% pass rate)
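
The funnel percentages can be reproduced with a small helper. This is a sketch under two assumptions: every stage's "% removed" and the final pass rate are measured against the Collected count (the row shown as 100%), and an empty run reports 0.0% instead of dividing by zero. FunnelCounts, pct, and funnel_rows are hypothetical names, not part of src.collector or src.dedup.

```python
from dataclasses import dataclass


@dataclass
class FunnelCounts:
    """Hypothetical per-run counts for each dedup stage."""
    collected: int
    hash_dups: int      # removed by SHA256 exact matching
    fuzzy_dups: int     # removed by the TLSH distance threshold
    semantic_dups: int  # removed by embedding cosine similarity
    published: int      # passed all three layers


def pct(part: int, whole: int) -> float:
    """Percentage of `whole`, or 0.0 when `whole` is zero (empty run)."""
    return 0.0 if whole == 0 else 100.0 * part / whole


def funnel_rows(c: FunnelCounts) -> list[str]:
    """Render one line per stage; all percentages are relative to `collected` (assumption)."""
    rows = [f"Collected: {c.collected} (100%)"]
    for label, removed in (
        ("SHA256 Dedup", c.hash_dups),
        ("TLSH Fuzzy", c.fuzzy_dups),
        ("Semantic", c.semantic_dups),
    ):
        rows.append(f"{label}: −{removed} ({pct(removed, c.collected):.1f}% removed)")
    rows.append(f"Published: {c.published} ({pct(c.published, c.collected):.1f}% pass rate)")
    return rows
```

For example, a batch of 200 samples with 50 hash, 20 fuzzy, and 10 semantic duplicates and 120 published would render "SHA256 Dedup: −50 (25.0% removed)" and "Published: 120 (60.0% pass rate)". Pending and failed samples are kept out of the funnel here, which is why `published` is passed in rather than derived by subtraction.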

Sample Status

Published: 0
Pending: 0
Hash Duplicates: 0
Fuzzy Duplicates: 0
Semantic Duplicates: 0
Failed: 0

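
The summary cards could be derived from per-sample status records roughly as follows. This sketch assumes that each sample carries exactly one status from the Sample Status panel and that Total Duplicates is the sum of the three duplicate statuses, as the "Hash + Fuzzy + Semantic" subtitle suggests. It also assumes that Families Identified counts distinct non-empty family labels across all samples. The Status enum, Sample record, and summarize function are hypothetical, not the pipeline's actual schema.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    """Hypothetical status values mirroring the Sample Status panel."""
    PUBLISHED = "published"
    PENDING = "pending"
    HASH_DUP = "hash_dup"
    FUZZY_DUP = "fuzzy_dup"
    SEMANTIC_DUP = "semantic_dup"
    FAILED = "failed"


@dataclass(frozen=True)
class Sample:
    sha256: str
    status: Status
    family: Optional[str] = None  # malware family label, if one was assigned


def summarize(samples: list[Sample]) -> dict[str, int]:
    """Aggregate per-sample statuses into the dashboard's summary cards."""
    counts = Counter(s.status for s in samples)  # missing statuses count as 0
    return {
        "total_collected": len(samples),
        "published": counts[Status.PUBLISHED],
        "total_duplicates": (
            counts[Status.HASH_DUP] + counts[Status.FUZZY_DUP] + counts[Status.SEMANTIC_DUP]
        ),
        "pending": counts[Status.PENDING],
        "families": len({s.family for s in samples if s.family}),
        "failed": counts[Status.FAILED],
    }
```

With no samples, every card is 0, which matches the empty dashboard state.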
Daily Batch History

No batch data yet. Run the collection pipeline on the VM to populate this dashboard.

Commands: python -m src.collector.run (collection)  →  python -m src.dedup.worker (deduplication)

Pipeline: MalwareBazaar → SHA256 exact-match dedup → TLSH fuzzy dedup (distance ≤ 30) → semantic dedup (cosine similarity ≥ 0.85) → Published. Data refreshes on page load.
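
The three-layer cascade can be sketched as a single classifier that applies the cheapest check first and only publishes samples that survive all three. This is an illustration under stated assumptions, not the code in src.dedup.worker:

- the fuzzy and semantic layers compare a new sample only against already-published samples;
- the caller supplies an embedding vector per sample, since the dashboard does not say what the semantic layer embeds;
- the fuzzy hash and distance functions are injected.

The py-tlsh package exposes `tlsh.hash(data)` and `tlsh.diff(h1, h2)`, which could fill those slots, though TLSH imposes a minimum input length, so check its documentation before relying on it. DedupIndex, cosine, and the status strings are hypothetical. Only the thresholds (30 and 0.85) come from this page.

```python
import hashlib
import math
from dataclasses import dataclass, field
from typing import Callable, Sequence

TLSH_MAX_DISTANCE = 30  # fuzzy duplicate if distance <= 30 (from the dashboard)
COSINE_MIN_SIM = 0.85   # semantic duplicate if cosine similarity >= 0.85 (from the dashboard)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


@dataclass
class DedupIndex:
    fuzzy_hash: Callable[[bytes], str]           # e.g. a TLSH digest function (assumption)
    fuzzy_distance: Callable[[str, str], int]    # e.g. a TLSH diff function (assumption)
    seen_sha256: set[str] = field(default_factory=set)
    published: list[tuple[str, list[float]]] = field(default_factory=list)

    def classify(self, data: bytes, embedding: Sequence[float]) -> str:
        # Layer 1: exact match on the SHA256 of the raw sample bytes.
        digest = hashlib.sha256(data).hexdigest()
        if digest in self.seen_sha256:
            return "hash_dup"
        self.seen_sha256.add(digest)
        # Layer 2: fuzzy-hash distance against every published sample.
        fuzzy = self.fuzzy_hash(data)
        if any(self.fuzzy_distance(fuzzy, f) <= TLSH_MAX_DISTANCE for f, _ in self.published):
            return "fuzzy_dup"
        # Layer 3: embedding similarity against every published sample.
        if any(cosine(embedding, e) >= COSINE_MIN_SIM for _, e in self.published):
            return "semantic_dup"
        self.published.append((fuzzy, list(embedding)))
        return "published"
```

Ordering the layers from exact hash to embeddings means each sample pays only for the checks it reaches. The linear scans in layers 2 and 3 grow with the number of published samples, so a production worker would likely need an index (for example, a vector store for embeddings) rather than this brute-force loop.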