IP Network Traffic Flows Labeled With 75 Apps
Labeled IP flows with their Application Protocol
@kaggle.jsrojas_ip_network_traffic_flows_labeled_with_87_apps
The data presented here was collected in a network section of Universidad Del Cauca, Popayán, Colombia, by capturing packets at different hours of the morning and afternoon over six days in 2017 (April 26, 27, and 28, and May 9, 11, and 15). A total of 3,577,296 instances were collected and are stored in a CSV (comma-separated values) file.
This dataset contains 87 features. Each instance holds the information of an IP flow generated by a network device, including source and destination IP addresses, ports, inter-arrival times, and the layer 7 protocol (application) used on that flow, which serves as the class. Most attributes are numeric, but there are also nominal attributes and one date attribute (the Timestamp).
The flow statistics (IP addresses, ports, inter-arrival times, etc.) were obtained using CICFlowMeter (http://www.unb.ca/cic/research/applications.html - https://github.com/ISCX/CICFlowMeter). The application layer protocol was obtained by performing DPI (Deep Packet Inspection) on the flows with ntopng (https://www.ntop.org/products/traffic-analysis/ntop/ - https://github.com/ntop/ntopng).
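As a rough illustration of how per-second rate columns such as flow_bytes_s and flow_packets_s relate to the byte, packet, and duration columns, here is a minimal Python sketch. It assumes that flow_duration is expressed in microseconds, which this page does not state, and the function name `flow_rates` is hypothetical. It is not a reproduction of CICFlowMeter's own code, so values recomputed this way may differ from those stored in the file.

```python
from typing import Optional, Tuple

# Assumption: flow_duration is recorded in microseconds.
MICROSECONDS_PER_SECOND = 1_000_000


def flow_rates(
    total_fwd_bytes: float,
    total_bwd_bytes: float,
    total_fwd_packets: int,
    total_bwd_packets: int,
    flow_duration_us: int,
) -> Optional[Tuple[float, float]]:
    """Return (bytes per second, packets per second) for one flow.

    Returns None when the duration is zero or negative, because a rate
    is undefined there. That is a choice made for this sketch only.
    """
    if flow_duration_us <= 0:
        return None
    seconds = flow_duration_us / MICROSECONDS_PER_SECOND
    total_bytes = total_fwd_bytes + total_bwd_bytes
    total_packets = total_fwd_packets + total_bwd_packets
    return total_bytes / seconds, total_packets / seconds


if __name__ == "__main__":
    # Made-up flow: 1500 bytes and 15 packets over 2 seconds.
    print(flow_rates(1000, 500, 10, 5, 2_000_000))
```

Comparing such recomputed rates with the stored columns can serve as a quick sanity check after loading the CSV, provided the unit assumption holds.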
For further information, or if you find this dataset useful, please read and cite the following papers:
Springer:
https://link.springer.com/chapter/10.1007/978-3-319-95168-3_37
IEEE Xplore:
https://ieeexplore.ieee.org/document/8845576
ResearchGate:
https://www.researchgate.net/publication/345990587_Smart_User_Consumption_Profiling_Incremental_Learning-based_OTT_Service_Degradation
IEEE Xplore:
https://ieeexplore.ieee.org/document/9258898
I would like to thank Universidad Del Cauca for supporting the research that generated this dataset and Colciencias for my PhD scholarship.
Considering that most network traffic classification datasets aim only at identifying the type of application an IP flow carries (WWW, DNS, FTP, P2P, Telnet, etc.), this dataset goes a step further: it supports building machine learning models capable of detecting specific applications such as Facebook, YouTube, and Instagram from IP flow statistics (currently 75 applications).
CREATE TABLE dataset_unicauca_version2_87atts (
"flow_id" VARCHAR,
"source_ip" VARCHAR,
"source_port" BIGINT,
"destination_ip" VARCHAR,
"destination_port" BIGINT,
"protocol" BIGINT,
"timestamp" VARCHAR,
"flow_duration" BIGINT,
"total_fwd_packets" BIGINT,
"total_backward_packets" BIGINT,
"total_length_of_fwd_packets" BIGINT,
"total_length_of_bwd_packets" DOUBLE,
"fwd_packet_length_max" BIGINT,
"fwd_packet_length_min" BIGINT,
"fwd_packet_length_mean" DOUBLE,
"fwd_packet_length_std" DOUBLE,
"bwd_packet_length_max" BIGINT,
"bwd_packet_length_min" BIGINT,
"bwd_packet_length_mean" DOUBLE,
"bwd_packet_length_std" DOUBLE,
"flow_bytes_s" DOUBLE,
"flow_packets_s" DOUBLE,
"flow_iat_mean" DOUBLE,
"flow_iat_std" DOUBLE,
"flow_iat_max" DOUBLE,
"flow_iat_min" BIGINT,
"fwd_iat_total" DOUBLE,
"fwd_iat_mean" DOUBLE,
"fwd_iat_std" DOUBLE,
"fwd_iat_max" DOUBLE,
"fwd_iat_min" DOUBLE,
"bwd_iat_total" DOUBLE,
"bwd_iat_mean" DOUBLE,
"bwd_iat_std" DOUBLE,
"bwd_iat_max" DOUBLE,
"bwd_iat_min" DOUBLE,
"fwd_psh_flags" BIGINT,
"bwd_psh_flags" BIGINT,
"fwd_urg_flags" BIGINT,
"bwd_urg_flags" BIGINT,
"fwd_header_length" BIGINT,
"bwd_header_length" BIGINT,
"fwd_packets_s" DOUBLE,
"bwd_packets_s" DOUBLE,
"min_packet_length" BIGINT,
"max_packet_length" BIGINT,
"packet_length_mean" DOUBLE,
"packet_length_std" DOUBLE,
"packet_length_variance" DOUBLE,
"fin_flag_count" BIGINT,
"syn_flag_count" BIGINT,
"rst_flag_count" BIGINT,
"psh_flag_count" BIGINT,
"ack_flag_count" BIGINT,
"urg_flag_count" BIGINT,
"cwe_flag_count" BIGINT,
"ece_flag_count" BIGINT,
"down_up_ratio" BIGINT,
"average_packet_size" DOUBLE,
"avg_fwd_segment_size" DOUBLE,
"avg_bwd_segment_size" DOUBLE,
"fwd_header_length_1" BIGINT,
"fwd_avg_bytes_bulk" BIGINT,
"fwd_avg_packets_bulk" BIGINT,
"fwd_avg_bulk_rate" BIGINT,
"bwd_avg_bytes_bulk" BIGINT,
"bwd_avg_packets_bulk" BIGINT,
"bwd_avg_bulk_rate" BIGINT,
"subflow_fwd_packets" BIGINT,
"subflow_fwd_bytes" BIGINT,
"subflow_bwd_packets" BIGINT,
"subflow_bwd_bytes" BIGINT,
"init_win_bytes_forward" BIGINT,
"init_win_bytes_backward" BIGINT,
"act_data_pkt_fwd" BIGINT,
"min_seg_size_forward" BIGINT,
"active_mean" DOUBLE,
"active_std" DOUBLE,
"active_max" DOUBLE,
"active_min" DOUBLE,
"idle_mean" DOUBLE,
"idle_std" DOUBLE,
"idle_max" DOUBLE,
"idle_min" DOUBLE,
"label" VARCHAR,
"l7protocol" BIGINT,
"protocolname" VARCHAR
);
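To show how the table above might be queried, the following sketch counts flows and packets per application label. It uses Python's built-in sqlite3 module with an in-memory database purely for illustration; the reduced column list, the sample rows, and the helper names `build_demo_connection` and `flows_per_application` are assumptions of this example, not part of the dataset or of any particular hosting platform.

```python
import sqlite3

# Reduced, illustrative subset of the columns in the schema above.
DDL = """
CREATE TABLE dataset_unicauca_version2_87atts (
    "flow_id" VARCHAR,
    "flow_duration" BIGINT,
    "total_fwd_packets" BIGINT,
    "total_backward_packets" BIGINT,
    "l7protocol" BIGINT,
    "protocolname" VARCHAR
);
"""

# Made-up rows for demonstration only; they are not taken from the dataset.
SAMPLE_ROWS = [
    ("flow-a", 1200, 3, 2, 1, "APP_A"),
    ("flow-b", 800, 6, 4, 1, "APP_A"),
    ("flow-c", 500, 1, 3, 2, "APP_B"),
]


def build_demo_connection() -> sqlite3.Connection:
    """Create an in-memory database holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    conn.executemany(
        "INSERT INTO dataset_unicauca_version2_87atts VALUES (?, ?, ?, ?, ?, ?)",
        SAMPLE_ROWS,
    )
    return conn


def flows_per_application(conn: sqlite3.Connection) -> list:
    """Return (protocolname, flow_count, total_packets), most frequent first."""
    query = """
        SELECT "protocolname",
               COUNT(*) AS flow_count,
               SUM("total_fwd_packets" + "total_backward_packets") AS total_packets
        FROM dataset_unicauca_version2_87atts
        GROUP BY "protocolname"
        ORDER BY flow_count DESC, "protocolname" ASC
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for name, flow_count, total_packets in flows_per_application(build_demo_connection()):
        print(name, flow_count, total_packets)
```

The same SELECT statement should carry over to other SQL engines with little or no change, since it relies only on standard grouping and aggregation.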