Baselight

Domain Generation Algorithm

Domain Generation Algorithm dataset

@kaggle.slashtea_domain_generation_algorithm

About this Dataset

Domain Generation Algorithm

This dataset was collected from the Alexa website ranking and a blacklist of previously observed DGA domain names. Both sources are available in the provenance section.

The purpose is to build a classifier that helps detect machines potentially infected by DGA (Domain Generation Algorithm) malware.

Infected machines typically generate a large number of random-looking domain names, one of which points to an active command-and-control (C&C) server.

The image above depicts the overall approach of how DGA works. Our goal is therefore to build a binary classifier that can differentiate randomly generated domain names from legitimate ones.
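
As a rough illustration of what such a classifier might look at, the sketch below computes a few lexical features of a domain name and applies a simple entropy cut-off. It is not the method used to build this dataset: the function names, the choice of features, and the threshold of 3.5 bits are all assumptions made for the example, and a real classifier would learn its decision boundary from labelled data.

```python
import math
from collections import Counter


def shannon_entropy(label: str) -> float:
    """Shannon entropy of a string, in bits per character."""
    if not label:
        return 0.0
    counts = Counter(label)
    total = len(label)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def domain_features(domain: str) -> dict:
    """Lexical features of the leftmost label (e.g. "google" in "google.com")."""
    label = domain.lower().split(".")[0]
    length = len(label)
    digits = sum(ch.isdigit() for ch in label)
    vowels = sum(ch in "aeiou" for ch in label)
    return {
        "length": length,
        "entropy": shannon_entropy(label),
        "digit_ratio": digits / length if length else 0.0,
        "vowel_ratio": vowels / length if length else 0.0,
    }


def looks_generated(domain: str, entropy_threshold: float = 3.5) -> bool:
    """Flag a domain as random-looking if its label entropy is high.

    The threshold is a hypothetical value chosen for illustration only.
    """
    return domain_features(domain)["entropy"] > entropy_threshold
```

Randomly generated labels tend to use many distinct characters, so their entropy is usually higher than that of dictionary-based names. Features like these could be fed to any standard binary classifier in place of the fixed threshold.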

Tables

Dga Project Top 1m

@kaggle.slashtea_domain_generation_algorithm.dga_project_top_1m
  • 8.04 MB
  • 491986 rows
  • 2 columns

CREATE TABLE dga_project_top_1m (
  "n_1" BIGINT,
  "google_com" VARCHAR
);

Top 1m

@kaggle.slashtea_domain_generation_algorithm.top_1m
  • 11.21 MB
  • 694786 rows
  • 2 columns

CREATE TABLE top_1m (
  "n_1" BIGINT,
  "google_com" VARCHAR
);
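
The column names "n_1" and "google_com" in both tables suggest that the source files had no header row and that the first record (rank 1, google.com) was taken as the header; this is an inference from the schema, not something the page states. Under that assumption, a minimal sketch for reading such a file as (rank, domain) pairs could look like the following. The function name is hypothetical and the sample rows are made up for the example.

```python
import csv
import io


def read_ranked_domains(handle):
    """Yield (rank, domain) tuples from a header-less "rank,domain" CSV.

    Assumes every valid row has exactly two fields; other rows are skipped.
    """
    for row in csv.reader(handle):
        if len(row) != 2:
            continue
        rank, domain = row
        yield int(rank), domain.strip().lower()


# Illustrative in-memory sample standing in for the real file.
sample = io.StringIO("1,google.com\n2,example.com\n")
rows = list(read_ranked_domains(sample))
```

Reading the file this way keeps the first record as data instead of losing it to the header.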
