Name: IP Network Traffic Flows Labeled With 75 Apps
Creator: Kaggle
License: https://creativecommons.org/licenses/by/4.0/

About this Dataset

IP Network Traffic Flows Labeled With 75 Apps

Context

The data presented here were collected in a network section of Universidad Del Cauca, Popayán, Colombia, by performing packet captures at different hours during the morning and afternoon over six days in 2017 (April 26, 27, and 28, and May 9, 11, and 15). A total of 3,577,296 instances were collected and are currently stored in a CSV (Comma-Separated Values) file.

Content

This dataset contains 87 features. Each instance holds the information of an IP flow generated by a network device, including source and destination IP addresses, ports, inter-arrival times, and the layer 7 protocol (application) used on that flow, which serves as the class, among others. Most of the attributes are numeric, but there are also nominal attributes and one date attribute (the Timestamp).

The flow statistics (IP addresses, ports, inter-arrival times, etc.) were obtained using CICFlowMeter (http://www.unb.ca/cic/research/applications.html - https://github.com/ISCX/CICFlowMeter). The application layer protocol was obtained by performing DPI (Deep Packet Inspection) on the flows with ntopng (https://www.ntop.org/products/traffic-analysis/ntop/ - https://github.com/ntop/ntopng).
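
As a minimal sketch of how the flow records described above might be inspected, the following Python snippet reads a CSV with the standard library and counts flows per application label. It assumes the CSV header uses the column names from the table schema on this page (for example `protocolname` and `flow_duration`); the header spelling in the original file may differ. The function name `count_flows_by_app`, the label strings, and the inline sample rows are hypothetical and only illustrate the shape of the data.

```python
import csv
import io
from collections import Counter


def count_flows_by_app(csv_file, label_column="protocolname"):
    """Count flows per application label in a CSV file-like object.

    `label_column` is assumed to match the schema name; adjust it if the
    header in the actual file is spelled differently.
    """
    reader = csv.DictReader(csv_file)
    counts = Counter()
    for row in reader:
        counts[row[label_column]] += 1
    return counts


# Hypothetical sample rows for illustration; NOT taken from the dataset.
SAMPLE = """source_ip,destination_ip,flow_duration,protocolname
10.0.0.1,10.0.0.2,1200,YouTube
10.0.0.3,10.0.0.4,560,Facebook
10.0.0.1,10.0.0.5,98,YouTube
"""

if __name__ == "__main__":
    # Print the labels from most to least frequent.
    for app, n in count_flows_by_app(io.StringIO(SAMPLE)).most_common():
        print(app, n)
```

For the real file, the same function could be called on `open(path, newline="")`, which streams the rows instead of loading all 3,577,296 instances into memory at once.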

For further information, and if you find this dataset useful, please read and cite the following papers:

ResearchGate:
https://www.researchgate.net/publication/326150046_Personalized_Service_Degradation_Policies_on_OTT_Applications_Based_on_the_Consumption_Behavior_of_Users

ResearchGate:
https://www.researchgate.net/publication/335954240_Consumption_Behavior_Analysis_of_Over_The_Top_Services_Incremental_Learning_or_Traditional_Methods

Springer:
https://link.springer.com/chapter/10.1007/978-3-319-95168-3_37

IEEE Xplore:
https://ieeexplore.ieee.org/document/8845576

ResearchGate:
https://www.researchgate.net/publication/345990587_Smart_User_Consumption_Profiling_Incremental_Learning-based_OTT_Service_Degradation

IEEE Xplore:
https://ieeexplore.ieee.org/document/9258898

Acknowledgements

I would like to thank Universidad Del Cauca for supporting the research that generated this dataset and Colciencias for my PhD scholarship.

Inspiration

Considering that most network traffic classification datasets are aimed only at identifying the type of application an IP flow carries (WWW, DNS, FTP, P2P, Telnet, etc.), this dataset goes a step further by supporting the generation of machine learning models capable of detecting specific applications such as Facebook, YouTube, and Instagram from IP flow statistics (currently 75 applications).
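
To make the idea of detecting an application from flow statistics concrete, here is a toy sketch of a nearest-centroid classifier in plain Python. It is not the method used in the cited papers; the technique is chosen only because it fits in a few lines. The two features stand in for `packet_length_mean` and `flow_iat_mean` from the schema, and all numeric values and label strings are made up for illustration.

```python
import math
from collections import defaultdict


def fit_centroids(rows, labels):
    """Compute the per-class mean of each feature (one centroid per application)."""
    sums = {}
    counts = defaultdict(int)
    for features, label in zip(rows, labels):
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Made-up (packet_length_mean, flow_iat_mean) pairs; not real measurements.
train_rows = [[900.0, 20.0], [950.0, 25.0], [120.0, 300.0], [100.0, 350.0]]
train_labels = ["YouTube", "YouTube", "Facebook", "Facebook"]

centroids = fit_centroids(train_rows, train_labels)
print(predict(centroids, [880.0, 30.0]))
```

In practice the features would likely need scaling before any distance-based model, and a held-out split would be needed to estimate accuracy across the 75 application classes.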

Tables

Dataset Unicauca Version2–87atts

@kaggle.jsrojas_ip_network_traffic_flows_labeled_with_87_apps.dataset_unicauca_version2_87atts

752.03 MB
3,577,296 rows
87 columns

CREATE TABLE dataset_unicauca_version2_87atts (
  "flow_id" VARCHAR,
  "source_ip" VARCHAR,
  "source_port" BIGINT,
  "destination_ip" VARCHAR,
  "destination_port" BIGINT,
  "protocol" BIGINT,
  "timestamp" VARCHAR,
  "flow_duration" BIGINT,
  "total_fwd_packets" BIGINT,
  "total_backward_packets" BIGINT,
  "total_length_of_fwd_packets" BIGINT,
  "total_length_of_bwd_packets" DOUBLE,
  "fwd_packet_length_max" BIGINT,
  "fwd_packet_length_min" BIGINT,
  "fwd_packet_length_mean" DOUBLE,
  "fwd_packet_length_std" DOUBLE,
  "bwd_packet_length_max" BIGINT,
  "bwd_packet_length_min" BIGINT,
  "bwd_packet_length_mean" DOUBLE,
  "bwd_packet_length_std" DOUBLE,
  "flow_bytes_s" DOUBLE,
  "flow_packets_s" DOUBLE,
  "flow_iat_mean" DOUBLE,
  "flow_iat_std" DOUBLE,
  "flow_iat_max" DOUBLE,
  "flow_iat_min" BIGINT,
  "fwd_iat_total" DOUBLE,
  "fwd_iat_mean" DOUBLE,
  "fwd_iat_std" DOUBLE,
  "fwd_iat_max" DOUBLE,
  "fwd_iat_min" DOUBLE,
  "bwd_iat_total" DOUBLE,
  "bwd_iat_mean" DOUBLE,
  "bwd_iat_std" DOUBLE,
  "bwd_iat_max" DOUBLE,
  "bwd_iat_min" DOUBLE,
  "fwd_psh_flags" BIGINT,
  "bwd_psh_flags" BIGINT,
  "fwd_urg_flags" BIGINT,
  "bwd_urg_flags" BIGINT,
  "fwd_header_length" BIGINT,
  "bwd_header_length" BIGINT,
  "fwd_packets_s" DOUBLE,
  "bwd_packets_s" DOUBLE,
  "min_packet_length" BIGINT,
  "max_packet_length" BIGINT,
  "packet_length_mean" DOUBLE,
  "packet_length_std" DOUBLE,
  "packet_length_variance" DOUBLE,
  "fin_flag_count" BIGINT,
  "syn_flag_count" BIGINT,
  "rst_flag_count" BIGINT,
  "psh_flag_count" BIGINT,
  "ack_flag_count" BIGINT,
  "urg_flag_count" BIGINT,
  "cwe_flag_count" BIGINT,
  "ece_flag_count" BIGINT,
  "down_up_ratio" BIGINT,
  "average_packet_size" DOUBLE,
  "avg_fwd_segment_size" DOUBLE,
  "avg_bwd_segment_size" DOUBLE,
  "fwd_header_length_1" BIGINT,
  "fwd_avg_bytes_bulk" BIGINT,
  "fwd_avg_packets_bulk" BIGINT,
  "fwd_avg_bulk_rate" BIGINT,
  "bwd_avg_bytes_bulk" BIGINT,
  "bwd_avg_packets_bulk" BIGINT,
  "bwd_avg_bulk_rate" BIGINT,
  "subflow_fwd_packets" BIGINT,
  "subflow_fwd_bytes" BIGINT,
  "subflow_bwd_packets" BIGINT,
  "subflow_bwd_bytes" BIGINT,
  "init_win_bytes_forward" BIGINT,
  "init_win_bytes_backward" BIGINT,
  "act_data_pkt_fwd" BIGINT,
  "min_seg_size_forward" BIGINT,
  "active_mean" DOUBLE,
  "active_std" DOUBLE,
  "active_max" DOUBLE,
  "active_min" DOUBLE,
  "idle_mean" DOUBLE,
  "idle_std" DOUBLE,
  "idle_max" DOUBLE,
  "idle_min" DOUBLE,
  "label" VARCHAR,
  "l7protocol" BIGINT,
  "protocolname" VARCHAR
);
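
The schema above can be exercised with a short query. The sketch below uses Python's built-in `sqlite3` module with a reduced subset of the columns and a few hypothetical rows; it is an illustration under those assumptions, not the way the hosted table is meant to be queried. SQLite accepts the `VARCHAR`, `BIGINT`, and `DOUBLE` type names through its type affinity rules, so the column definitions are copied unchanged. The helper name `summarize` is an assumption for this example.

```python
import sqlite3

# Reduced subset of the columns listed in the schema above.
DDL = """
CREATE TABLE dataset_unicauca_version2_87atts (
  "flow_id" VARCHAR,
  "flow_duration" BIGINT,
  "total_fwd_packets" BIGINT,
  "total_backward_packets" BIGINT,
  "protocolname" VARCHAR
);
"""

# Flow count and mean duration per application label.
QUERY = """
SELECT "protocolname",
       COUNT(*) AS flows,
       AVG("flow_duration") AS avg_duration
FROM dataset_unicauca_version2_87atts
GROUP BY "protocolname"
ORDER BY flows DESC, "protocolname";
"""


def summarize(rows):
    """Load rows into an in-memory table and return the per-application summary."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(DDL)
        conn.executemany(
            "INSERT INTO dataset_unicauca_version2_87atts VALUES (?, ?, ?, ?, ?)",
            rows,
        )
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


# Hypothetical rows for illustration; NOT taken from the dataset.
SAMPLE_ROWS = [
    ("flow-a", 1000, 3, 2, "YouTube"),
    ("flow-b", 3000, 5, 4, "YouTube"),
    ("flow-c", 500, 1, 1, "Facebook"),
]

for name, flows, avg_duration in summarize(SAMPLE_ROWS):
    print(name, flows, avg_duration)
```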