PIPELINE DASHBOARD
Collection & Deduplication Metrics
Real-time pipeline statistics from MalwareBazaar ingestion
TODAY'S COLLECTION
No collection run today. Awaiting next scheduled batch.
Total Collected: 0 (all-time ingested samples)
Published (Unique): 0 (passed all dedup layers)
Total Duplicates: 0 (hash + fuzzy + semantic)
Pending: 0 (awaiting dedup processing)
Families Identified: 0 (distinct malware families)
Failed: 0 (pipeline errors)
Deduplication Pipeline
Collected: 0 (100%)
→ SHA256 Dedup: −0 (0.0% removed)
→ TLSH Fuzzy: −0 (0.0% removed)
→ Semantic: −0 (0.0% removed)
→ Published: 0 (0.0% pass rate)
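The funnel percentages can be derived from per-stage counts. This is a minimal sketch, and it rests on three assumptions because the dashboard does not document its formula. First, each stage's "% removed" is measured against the samples that reached that stage. Second, the pass rate is Published ÷ Collected. Third, every collected sample has finished dedup, so there are no Pending or Failed samples. `funnel` and `StageResult` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class StageResult:
    name: str
    removed: int
    removed_pct: float  # assumption: share of this stage's *input*, not of Collected


def funnel(collected: int, removed_per_stage: list[tuple[str, int]]) -> tuple[list[StageResult], int, float]:
    """Hypothetical helper: walk the dedup stages in order and compute the funnel.

    Assumes every collected sample has finished dedup (no Pending or Failed).
    Returns (per-stage results, published count, pass rate in percent).
    """
    remaining = collected
    results = []
    for name, removed in removed_per_stage:
        if removed > remaining:
            raise ValueError(f"{name} removed more samples than it received")
        # Guard against division by zero on an empty run (dashboard shows 0.0%).
        pct = 100.0 * removed / remaining if remaining else 0.0
        results.append(StageResult(name, removed, round(pct, 1)))
        remaining -= removed
    pass_rate = 100.0 * remaining / collected if collected else 0.0
    return results, remaining, round(pass_rate, 1)
```

Suppose 200 samples are collected and the three stages remove 50, 30 and 20. This sketch then reports 25.0%, 20.0% and 16.7% removed and a 50.0% pass rate. If the dashboard instead measures every stage against the original Collected count, the last two figures would be 15.0% and 10.0%.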
Sample Status
Published: 0
Pending: 0
Hash Dups: 0
Fuzzy Dups: 0
Semantic Dups: 0
Failed: 0
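The Sample Status panel reads like a tally of per-sample states. The sketch below assumes that each sample carries exactly one of the six statuses shown. The lowercase status strings and the helper names are hypothetical. The "Total Duplicates" card is computed as hash + fuzzy + semantic, as its label states.

```python
from collections import Counter

# Hypothetical status labels mirroring the Sample Status panel, in display order.
STATUSES = ("published", "pending", "hash_dup", "fuzzy_dup", "semantic_dup", "failed")


def status_counts(sample_statuses):
    """Count samples per status, reporting zeros for statuses with no samples."""
    counts = Counter(sample_statuses)
    unknown = set(counts) - set(STATUSES)
    if unknown:
        raise ValueError(f"unknown statuses: {sorted(unknown)}")
    return {s: counts.get(s, 0) for s in STATUSES}


def total_duplicates(counts):
    # "Total Duplicates" card: hash + fuzzy + semantic duplicates.
    return counts["hash_dup"] + counts["fuzzy_dup"] + counts["semantic_dup"]
```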
Daily Batch History
No batch data yet. Run the collection pipeline on the VM to populate this dashboard.
Collector: python -m src.collector.run
→ Dedup: python -m src.dedup.worker
Pipeline: MalwareBazaar → SHA256 exact-hash dedup → TLSH fuzzy dedup (distance ≤ 30) → semantic dedup (cosine similarity ≥ 0.85) → Published. Data refreshes on page load.
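The three dedup layers and their thresholds can be read as a short-circuit cascade. Each new sample is checked against already-published samples, and the first layer that matches determines its duplicate status. This is a minimal sketch, not the project's `src.dedup.worker`, and it makes several assumptions:

- Each sample arrives with its raw bytes, a precomputed TLSH digest, and an embedding vector from an unspecified model.
- The TLSH distance function is injected. In practice it might be `tlsh.diff` from the `py-tlsh` package.
- `classify` is a hypothetical name.
- The linear scans are for clarity only. A production system would index hashes and vectors.

```python
import hashlib
import math
from typing import Callable, Sequence

TLSH_MAX_DISTANCE = 30        # TLSH distance <= 30 counts as a fuzzy duplicate
COSINE_MIN_SIMILARITY = 0.85  # cosine similarity >= 0.85 counts as a semantic duplicate


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(
    data: bytes,
    tlsh_digest: str,
    embedding: Sequence[float],
    published: list[dict],
    tlsh_distance: Callable[[str, str], int],
) -> str:
    """Return the first dedup layer that rejects the sample, else publish it.

    Hypothetical sketch: `published` holds dicts with sha256/tlsh/embedding keys
    and is appended to when a sample passes all three layers.
    """
    sha = hashlib.sha256(data).hexdigest()
    # Layer 1: exact duplicate by SHA256.
    if any(p["sha256"] == sha for p in published):
        return "hash_dup"
    # Layer 2: fuzzy duplicate by TLSH distance (lower distance = more similar).
    if any(tlsh_distance(tlsh_digest, p["tlsh"]) <= TLSH_MAX_DISTANCE for p in published):
        return "fuzzy_dup"
    # Layer 3: semantic duplicate by embedding cosine similarity.
    if any(cosine_similarity(embedding, p["embedding"]) >= COSINE_MIN_SIMILARITY for p in published):
        return "semantic_dup"
    published.append({"sha256": sha, "tlsh": tlsh_digest, "embedding": list(embedding)})
    return "published"
```

The order matters. The cheap exact-hash check runs first, so the costlier TLSH and embedding comparisons only see samples whose bytes are genuinely new.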