Baselight

Labeled Network Traffic Flows - 141 Applications

Unicauca Network Flows Dataset - 2019

@kaggle.jsrojas_labeled_network_traffic_flows_114_applications

About this Dataset

Context

The data presented here was collected on the network of Universidad del Cauca, Popayán, Colombia, by performing packet captures at different hours, during the morning and afternoon, over different days between April and June of 2019. A total of 2,704,839 flow instances were collected and are currently stored in a CSV (Comma-Separated Values) file.

Content

This dataset contains 50 features. Each instance holds the information of an IP flow generated by a network device, i.e., source and destination IP addresses, ports, flow duration, packet interarrival times, packet sizes, and the layer 7 protocol (application) used on that flow, which serves as the class. Most of the attributes are numeric, but some are nominal.

The dataset presented here was obtained by processing the PCAP files with Flow Labeler, an application developed to aggregate packets into flows, calculate flow statistics, and label each flow with its respective application through the nDPI library. Flow Labeler can be found in the following GitHub repository:

Flow Labeler: https://github.com/jsrojas/FlowLabeler
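To make the aggregation step concrete, here is a minimal sketch of grouping packets into bidirectional flows and computing the packet-size (`ps`) and packet-interarrival-time (`piat`) statistics named in the table schema. It is not Flow Labeler's actual code: the function name `aggregate_flows`, the packet dictionary layout, the direction-independent flow key, and the use of population standard deviation are all assumptions made for illustration.

```python
import statistics
from collections import defaultdict


def aggregate_flows(packets):
    """Group packets into bidirectional flows and compute per-flow statistics.

    Each packet is a dict with the (hypothetical) keys:
    ts, src_ip, src_port, dst_ip, dst_port, proto, size.
    """
    flows = defaultdict(list)
    for pkt in packets:
        a = (pkt["src_ip"], pkt["src_port"])
        b = (pkt["dst_ip"], pkt["dst_port"])
        # Order the endpoints so both directions map to the same flow key.
        key = (min(a, b), max(a, b), pkt["proto"])
        flows[key].append(pkt)

    stats = {}
    for key, pkts in flows.items():
        pkts.sort(key=lambda p: p["ts"])
        sizes = [p["size"] for p in pkts]
        times = [p["ts"] for p in pkts]
        # Interarrival times between consecutive packets of the flow.
        piats = [t2 - t1 for t1, t2 in zip(times, times[1:])]
        stats[key] = {
            "pkttotalcount": len(pkts),
            "octettotalcount": sum(sizes),
            "min_ps": min(sizes),
            "max_ps": max(sizes),
            "avg_ps": statistics.fmean(sizes),
            # Assumption: population standard deviation.
            "std_dev_ps": statistics.pstdev(sizes),
            "flowstart": times[0],
            "flowend": times[-1],
            "flowduration": times[-1] - times[0],
            # A single-packet flow has no interarrival times; report zeros.
            "min_piat": min(piats) if piats else 0.0,
            "max_piat": max(piats) if piats else 0.0,
            "avg_piat": statistics.fmean(piats) if piats else 0.0,
            "std_dev_piat": statistics.pstdev(piats) if piats else 0.0,
        }
    return stats
```

Application labeling is left out of the sketch, since in the dataset it comes from deep packet inspection with nDPI rather than from flow statistics.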

For further information and if you find this dataset useful, please read and cite the following papers:

IEEE Xplore: https://ieeexplore.ieee.org/document/9258898
ResearchGate: https://www.researchgate.net/publication/345990587_Smart_User_Consumption_Profiling_Incremental_Learning-based_OTT_Service_Degradation

Acknowledgements

All work related to this dataset was supported by the Ministry of Science, Technology and Innovation (MinCiencias) of Colombia through a National Doctoral Scholarship awarded to Juan Sebastian Rojas in the Call for National Doctorates under Grant 727-2015. Special thanks and recognition also go to Universidad del Cauca and to Dr. Adrian Pekar of the MEDIANETS group at Budapest University of Technology and Economics, who supported and collaborated on the research and the generation of this dataset.

Inspiration

Network flow datasets that hold a complete description of the communication, and especially ones that include the application label, are really difficult to find. I want to share this dataset to support the reproducibility of research in the domain of network traffic management.

Tables

Unicauca Dataset April June 2019 Network Flows

@kaggle.jsrojas_labeled_network_traffic_flows_114_applications.unicauca_dataset_april_june_2019_network_flows
  • 479.54 MB
  • 2704839 rows
  • 50 columns

CREATE TABLE unicauca_dataset_april_june_2019_network_flows (
  "flow_key" VARCHAR,
  "src_ip_numeric" BIGINT,
  "src_ip" VARCHAR,
  "src_port" BIGINT,
  "dst_ip" VARCHAR,
  "dst_port" BIGINT,
  "proto" BIGINT,
  "pkttotalcount" BIGINT,
  "octettotalcount" BIGINT,
  "min_ps" BIGINT,
  "max_ps" BIGINT,
  "avg_ps" DOUBLE,
  "std_dev_ps" DOUBLE,
  "flowstart" DOUBLE,
  "flowend" DOUBLE,
  "flowduration" DOUBLE,
  "min_piat" DOUBLE,
  "max_piat" DOUBLE,
  "avg_piat" DOUBLE,
  "std_dev_piat" DOUBLE,
  "f_pkttotalcount" BIGINT,
  "f_octettotalcount" BIGINT,
  "f_min_ps" BIGINT,
  "f_max_ps" BIGINT,
  "f_avg_ps" DOUBLE,
  "f_std_dev_ps" DOUBLE,
  "f_flowstart" DOUBLE,
  "f_flowend" DOUBLE,
  "f_flowduration" DOUBLE,
  "f_min_piat" DOUBLE,
  "f_max_piat" DOUBLE,
  "f_avg_piat" DOUBLE,
  "f_std_dev_piat" DOUBLE,
  "b_pkttotalcount" BIGINT,
  "b_octettotalcount" BIGINT,
  "b_min_ps" BIGINT,
  "b_max_ps" BIGINT,
  "b_avg_ps" DOUBLE,
  "b_std_dev_ps" DOUBLE,
  "b_flowstart" DOUBLE,
  "b_flowend" DOUBLE,
  "b_flowduration" DOUBLE,
  "b_min_piat" DOUBLE,
  "b_max_piat" DOUBLE,
  "b_avg_piat" DOUBLE,
  "b_std_dev_piat" DOUBLE,
  "flowendreason" BIGINT,
  "category" VARCHAR,
  "application_protocol" VARCHAR,
  "web_service" VARCHAR
);
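As a quick illustration of working with the class columns, the sketch below counts flows per application label using only the Python standard library. It assumes the CSV header uses the same column names as the table above, which may not hold for the original file. The helper `count_by_label` and the three sample rows are invented for the example and are not taken from the dataset.

```python
import csv
import io
from collections import Counter


def count_by_label(csv_file, label_column="application_protocol"):
    """Count rows per class label in a CSV file object with a header row."""
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column] for row in reader)


# Made-up sample with a subset of the columns, for illustration only.
SAMPLE = """flow_key,proto,pkttotalcount,category,application_protocol,web_service
k1,6,10,Web,HTTP,Google
k2,17,4,Network,DNS,DNS
k3,6,25,Web,HTTP,Facebook
"""

counts = count_by_label(io.StringIO(SAMPLE))
for label, n in counts.most_common():
    print(label, n)
```

For the real file, the same function could be called on a handle opened with `open(path, newline="")`. Because rows are streamed one at a time, the roughly 2.7 million flows do not need to fit in memory at once.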
