“Network-based high-level classification: new models and applications”
Machine learning is an application of artificial intelligence with a focus on developing computer programs that can access data and use it to learn on their own. High-level data classification is a technique based on pattern formation in the data, rather than just its physical characteristics. Complex networks have proven to be very useful for characterizing relationships between data samples, and consequently are a powerful mechanism for capturing data patterns. In this work, new ways of using the network-based approach in the development of high-level classification techniques are investigated.
Initially, two classification techniques are introduced, and their performance is evaluated by applying them to benchmark datasets, both artificial and real, and by comparing their results with those obtained by traditional classification models on the same data. Subsequently, the inherent advantages of this type of approach, such as its versatility and interpretability, are exploited to develop new network-based techniques specifically designed for data from real and relevant problems in very diverse fields, from the financial market to the corruption of politicians and health care. While these types of applications certainly require more effort from researchers, given the difficulty of the problems and the data pre-processing involved, they are believed to be important in bringing academic research closer to reality.
Among the results obtained in this work is the detection of an unexpected relationship between bill voting data and convictions for corruption and other financial crimes among Brazilian deputies. It is also shown how a model originally applied to detect periodicity in meteorological data can be adapted to identify bullish and bearish trends in the stock market and to automatically trigger a buy or sell order for the asset accordingly. In another investigation, a technique is presented to assist healthcare professionals in monitoring patients with COVID-19 by detecting early signs of liver, kidney, or respiratory failure based solely on the results of the complete blood count (CBC) test. In summary, it is believed that this work makes an important contribution to advancing the study of large-scale public data using complex networks.
Keywords: complex networks, high-level data classification, machine learning, political parties, legislative voting, corruption prediction, stock market, investment automation, COVID-19, insufficiency detection, CBC.
Publication Year: 2021
Student: Tiago Santos Colliri
Advisor: Prof. Dr. Rogério Henrique de Araújo Júnior
Program: Ph.D. in Computer Science and Computational Mathematics (PPG-CCMC)
University: Institute of Mathematics and Computer Science – University of São Paulo (Brazil)
More information: https://www.teses.usp.br/teses/disponiveis/55/55134/tde-26032021-102400/pt-br.php