AutoXplorer Dataset

@kaggle.willianoliveiragibin_autoxplorer_dataset

About this Dataset

Given the growing volume of information produced in digital marketplaces, the increasing adoption of these services by internet users in Brazil and worldwide, and the scarcity of work on the topic, this research experiments with extracting datasets from marketplace ads. After analyzing the main e-commerce spaces in Brazil, the chosen marketplaces were Mercado Livre, Facebook, and OLX. The scrapers were written in Python using the following libraries: Scrapy, Beautiful Soup, and Selenium WebDriver. After analyzing the web structure of the ad results pages, scripts were created to extract the main variables of each ad within a category common to the marketplaces. The results show that scrapers can extract datasets from advertisements on these platforms in different formats. Such information has potential for exploration in various segments of data science.
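The extraction step described above can be illustrated with a minimal sketch. It is not the authors' script: the HTML snippet, the class names (`ad`, `ad-title`, `ad-price`), the sample ads, and the helper names (`AdParser`, `parse_price`, `extract_ads`, `to_csv`) are all assumptions made for illustration, and the standard-library `html.parser` module stands in for Beautiful Soup, Scrapy, or Selenium so that the example runs without extra dependencies.

```python
import csv
import io
from html.parser import HTMLParser

# Made-up results-page markup. Real marketplaces use different class names,
# and those names change over time.
SAMPLE_HTML = """
<ul>
  <li class="ad"><a class="ad-title" href="/ad/1">Fiat Uno 2012</a>
      <span class="ad-price">R$ 18.500</span></li>
  <li class="ad"><a class="ad-title" href="/ad/2">VW Gol 2015</a>
      <span class="ad-price">R$ 27.900</span></li>
</ul>
"""


def parse_price(text):
    """Convert a Brazilian-formatted price such as 'R$ 1.234,56' to a float."""
    number = text.replace("R$", "").strip()
    # '.' is the thousands separator and ',' the decimal separator.
    return float(number.replace(".", "").replace(",", "."))


class AdParser(HTMLParser):
    """Collects title, URL, and price for each <li class="ad"> element."""

    def __init__(self):
        super().__init__()
        self.ads = []          # finished ad records
        self._current = None   # record currently being filled
        self._field = None     # field whose text is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        css = attrs.get("class")
        if tag == "li" and css == "ad":
            self._current = {"title": "", "url": "", "price": None}
        elif self._current is not None and css == "ad-title":
            self._current["url"] = attrs.get("href", "")
            self._field = "title"
        elif self._current is not None and css == "ad-price":
            self._field = "price"

    def handle_data(self, data):
        text = data.strip()
        if self._current is None or self._field is None or not text:
            return
        if self._field == "title":
            self._current["title"] = text
        else:
            self._current["price"] = parse_price(text)

    def handle_endtag(self, tag):
        if tag in ("a", "span"):
            self._field = None
        elif tag == "li" and self._current is not None:
            self.ads.append(self._current)
            self._current = None


def extract_ads(html):
    """Return a list of ad dictionaries found in one results page."""
    parser = AdParser()
    parser.feed(html)
    return parser.ads


def to_csv(ads):
    """Serialize the ad records as CSV text, one possible output format."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "url", "price"])
    writer.writeheader()
    writer.writerows(ads)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(extract_ads(SAMPLE_HTML)))
```

With Beautiful Soup or Scrapy, the same selection would normally be written as CSS or XPath selectors instead of a hand-written parser. Pages that render their results with JavaScript would first need a browser driver such as Selenium to produce the HTML.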

![](https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F16731800%2F157cbc071f6df7e6a527708140c80276%2FRplot01.png?generation=1705437358701885&alt=media)
