Designing and Prototyping a Typosquatting Detection Pipeline for Malicious Domains

Challenge

Fake websites facilitate fraud, identity theft, and coordinated disinformation. A prevalent tactic is typosquatting: registering domains that are visually or phonetically similar to legitimate ones, for example through character substitutions, transpositions, homoglyphs, or look-alike TLDs. Despite existing countermeasures, there is no widely adopted, modular pipeline that (i) systematically generates candidate look-alike domains from topics/keywords, (ii) enriches them with authoritative Internet data, and (iii) scores risk in a transparent, reproducible way.

Objective

The thesis aims to develop a prototype pipeline for typosquatting detection. The work can involve:

  • developing a taxonomy and comprehensive overview of common typosquatting techniques (e.g., character substitutions, homoglyphs, transpositions),
  • designing a generator to create potential malicious domain names based on topics and keywords of interest,
  • integrating the generator with external data sources (e.g., ICANN WHOIS, RIPE NCC, DNS lookup) to check whether candidate domains are already registered and to detect active suspicious domains.
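
To make the generator idea above more concrete, the following is a minimal sketch of how a few of the named techniques (omission, transposition, homoglyph substitution, TLD swap) might be turned into candidate domains. The lookup tables HOMOGLYPHS and ALT_TLDS and all function names are illustrative assumptions, not a complete taxonomy, and the sketch assumes a simple "label.tld" input.

```python
from typing import Dict, Iterator, Set

# Illustrative, deliberately tiny tables (assumptions, not a full taxonomy).
HOMOGLYPHS = {"o": ["0"], "l": ["1"], "i": ["1"], "m": ["rn"]}
ALT_TLDS = ["com", "net", "org", "co"]


def omissions(label: str) -> Iterator[str]:
    """Drop one character at a time."""
    for i in range(len(label)):
        yield label[:i] + label[i + 1:]


def transpositions(label: str) -> Iterator[str]:
    """Swap each pair of adjacent, non-identical characters."""
    for i in range(len(label) - 1):
        if label[i] != label[i + 1]:
            yield label[:i] + label[i + 1] + label[i] + label[i + 2:]


def homoglyphs(label: str) -> Iterator[str]:
    """Replace one character with a visually similar string."""
    for i, ch in enumerate(label):
        for sub in HOMOGLYPHS.get(ch, []):
            yield label[:i] + sub + label[i + 1:]


def generate_candidates(domain: str) -> Dict[str, Set[str]]:
    """Return candidate domains grouped by the technique that produced them."""
    domain = domain.lower()
    # Assumes a simple "label.tld" form; everything after the first dot is the TLD.
    label, _, tld = domain.partition(".")
    grouped = {
        "omission": {f"{v}.{tld}" for v in omissions(label) if v},
        "transposition": {f"{v}.{tld}" for v in transpositions(label)},
        "homoglyph": {f"{v}.{tld}" for v in homoglyphs(label)},
        "tld_swap": {f"{label}.{t}" for t in ALT_TLDS if t != tld},
    }
    # Never report the original domain as a candidate.
    return {name: variants - {domain} for name, variants in grouped.items()}
```

Grouping the output by technique keeps the link between each candidate and the taxonomy entry that produced it, which may help later when explaining why a domain was flagged.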

The ultimate objective is a modular and extensible pipeline that can be used for early detection and monitoring of typosquatting attacks.
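
One possible way to read "modular and extensible" is a pipeline of small, interchangeable stages that each add facts to a candidate. The sketch below is only an assumption about how such a design could look: the names Candidate, dns_stage, and run_pipeline are hypothetical, and the only enrichment shown is a DNS lookup via Python's standard socket module. The resolver is injectable so that it can be replaced (for example by a rate-limited or cached one) without touching the rest of the code.

```python
import socket
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class Candidate:
    domain: str
    technique: str
    facts: dict = field(default_factory=dict)  # enrichment results, keyed by name


# A stage takes a candidate, adds information, and returns it.
Stage = Callable[[Candidate], Candidate]


def resolve_ipv4(domain: str) -> Optional[str]:
    """Return one IPv4 address for the domain, or None if it does not resolve."""
    try:
        return socket.gethostbyname(domain)
    except (OSError, UnicodeError):
        return None


def dns_stage(resolver: Callable[[str], Optional[str]] = resolve_ipv4) -> Stage:
    """Build a stage that records whether the candidate resolves in DNS."""
    def stage(candidate: Candidate) -> Candidate:
        ip = resolver(candidate.domain)
        candidate.facts["resolves"] = ip is not None
        candidate.facts["ip"] = ip
        return candidate
    return stage


def run_pipeline(candidates: Iterable[Candidate], stages: List[Stage]) -> List[Candidate]:
    """Pass every candidate through all stages in order."""
    results = []
    for candidate in candidates:
        for stage in stages:
            candidate = stage(candidate)
        results.append(candidate)
    return results
```

Note that a name that does not resolve may still be registered, so a registration-data lookup (WHOIS or RDAP) would need to be a separate stage; it is left out here because it depends on the chosen data source and its terms of use.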

Methodology & Expected Results

The work can be methodologically grounded in the Design Science Research (DSR) paradigm. Possible research artifacts include:

  • a taxonomy of typosquatting attack vectors,
  • a context-/type-sensitive domain name generator based on keywords and attack patterns,
  • a detection pipeline prototype that integrates external data sources,
  • optionally, a demonstrator or dashboard for visualization and monitoring.
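
As an illustration of how the detection prototype listed above might score candidates transparently, the sketch below combines a string-similarity component with a single enrichment fact and returns every component alongside the total. The similarity measure used here is the optimal string alignment variant of Damerau-Levenshtein distance; the WEIGHTS values and the function risk_score are hypothetical placeholders that the thesis would need to justify or calibrate.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions, and transpositions of adjacent characters each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


# Hypothetical weights, chosen only for illustration.
WEIGHTS = {"similarity": 0.6, "resolves": 0.4}


def risk_score(target: str, candidate: str, resolves: bool) -> dict:
    """Return a score in [0, 1] together with the components that produced it."""
    longest = max(len(target), len(candidate)) or 1
    similarity = 1 - osa_distance(target, candidate) / longest
    components = {
        "similarity": WEIGHTS["similarity"] * similarity,
        "resolves": WEIGHTS["resolves"] * (1.0 if resolves else 0.0),
    }
    return {"score": round(sum(components.values()), 3), "components": components}
```

Returning the per-component contributions, rather than only the total, is one simple way to keep the score explainable and reproducible.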

An evaluation could be conducted by applying the pipeline to a set of target domains or real-world topics and measuring coverage, detection accuracy, and the false-positive rate.
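
The evaluation measures named above could be computed along the following lines. This is a sketch under assumed definitions: "coverage" is taken to mean the share of known malicious domains that the generator produced at all, and a labelled ground-truth set of malicious domains is assumed to be available. The function name evaluate and its parameters are hypothetical.

```python
def safe_div(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(generated: set, flagged: set, malicious: set) -> dict:
    """Compute evaluation metrics.

    generated: all candidate domains the generator produced
    flagged:   the subset of generated domains the pipeline marked suspicious
    malicious: ground-truth set of known malicious domains (assumed available)
    """
    tp = len(flagged & malicious)                  # flagged and truly malicious
    fp = len(flagged - malicious)                  # flagged but benign
    fn = len((generated & malicious) - flagged)    # generated, malicious, missed
    tn = len(generated - flagged - malicious)      # generated, benign, not flagged
    return {
        "coverage": safe_div(len(generated & malicious), len(malicious)),
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
        "false_positive_rate": safe_div(fp, fp + tn),
    }
```

Separating coverage from recall distinguishes two failure modes: malicious domains the generator never proposed, and malicious domains it proposed but the scoring step did not flag.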

Impact

The resulting pipeline can strengthen defenses against fraud, identity theft, and disinformation by enabling early warning and continuous monitoring. The taxonomy and open prototype can support security operations, OSINT workflows, brand protection, and research on coordinated manipulation, improving the resilience of digital ecosystems.

Requirements for the Candidate

  • Working knowledge of Python and basic internet infrastructure (DNS, domains, TLDs).
  • Familiarity with string similarity and basic data processing.
  • Nice to have: experience with APIs (RDAP/WHOIS/DNS), Unicode/IDN handling, or simple ML.
  • Structured, independent work style; awareness of ethical/legal constraints (rate-limiting, ToS, privacy).