Topology and Network

The Topology and Network project started as a summer internship under Prof. Chris McCarty at BEBR, together with Dr. Tom Smith. My idea was to combine topological techniques with the network science approach to analyzing collaboration networks.

We started with the last 20 years of research publications at the University of Florida and embedded them in a vector space so that we could analyze their features. We used the Top2Vec model, treating each publication as a document, to produce Topic Embeddings. We then applied Persistent Homology to the Topic Embeddings to extract their topological features.
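The pipeline described above can be sketched roughly as follows. This is only a minimal sketch under assumptions: it uses the top2vec and ripser Python packages with default settings, the function name topic_persistence and its max_dim parameter are hypothetical, and the project may have used different parameters, a different distance metric, or a different persistent homology library.

```python
def topic_persistence(documents, max_dim=1):
    """Embed documents as topic vectors, then compute persistence diagrams.

    `documents` is a list of strings (here, publication texts).
    `max_dim` is an assumed setting, not one taken from the project.
    """
    # Imported lazily because both packages are heavy optional dependencies.
    from top2vec import Top2Vec
    from ripser import ripser

    # Train Top2Vec with its default embedding model (an assumption).
    model = Top2Vec(documents)

    # One row per discovered topic.
    topic_vectors = model.topic_vectors

    # Vietoris-Rips persistent homology on the topic vectors.
    # ripser uses Euclidean distance by default (also an assumption here).
    diagrams = ripser(topic_vectors, maxdim=max_dim)["dgms"]
    return topic_vectors, diagrams
```

With this sketch, diagrams[k] would be an array of (birth, death) rows for homology dimension k, one row per topological feature.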

Persistent Homology helps us identify when a 2D (two-dimensional) cycle comes into existence, its "Birth," and when it fills up, its "Death," as we increase the distance threshold r. For example, in the following figure a 2D cycle forms at r = 1 and fills up at r = 2.
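To make Birth and Death concrete, here is a small self-contained sketch that computes them for a toy point cloud. It assumes a Vietoris-Rips filtration, where points are connected once their distance is at most the threshold, and it uses a hypothetical unit square rather than the data in the figure, so its values differ from the r = 1 and r = 2 shown there. The function name cycle_births_deaths is illustrative only.

```python
from itertools import combinations
from math import dist


def cycle_births_deaths(points):
    """Return (birth, death) pairs for cycles in a Vietoris-Rips filtration."""
    n = len(points)

    def d(i, j):
        return dist(points[i], points[j])

    # Every simplex enters the filtration at the length of its longest edge.
    simplices = [((i,), 0.0) for i in range(n)]
    simplices += [((i, j), d(i, j)) for i, j in combinations(range(n), 2)]
    simplices += [((i, j, k), max(d(i, j), d(i, k), d(j, k)))
                  for i, j, k in combinations(range(n), 3)]
    # Sort by threshold; at equal thresholds, faces come before cofaces.
    simplices.sort(key=lambda s: (s[1], len(s[0]), s[0]))
    index = {simplex: idx for idx, (simplex, _) in enumerate(simplices)}

    reduced = {}  # pivot index -> reduced boundary column (over Z/2)
    pairs = []
    for simplex, value in simplices:
        if len(simplex) == 1:
            continue  # vertices have an empty boundary
        # Boundary column: indices of the faces of this simplex.
        column = {index[face]
                  for face in combinations(simplex, len(simplex) - 1)}
        # Standard reduction: cancel the largest index while it is taken.
        while column and max(column) in reduced:
            column ^= reduced[max(column)]
        if column:
            pivot = max(column)
            reduced[pivot] = column
            if len(simplex) == 3:
                # This triangle fills the cycle created by edge `pivot`.
                birth = simplices[pivot][1]
                if value > birth:  # skip cycles that die as soon as they form
                    pairs.append((birth, value))
    return pairs


# Hypothetical example: the four corners of a unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(cycle_births_deaths(square))
# One cycle: born at 1.0 (the four sides connect), dies at sqrt(2), about 1.414
# (the diagonals appear and the triangles fill the square).
```

The cycle is born when the last side of the square appears and dies when the diagonals let triangles fill it in. The difference between the two thresholds is the cycle's persistence.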

The Persistence Diagram for the Topic Embeddings shows the Birth and Death of 1317 different 2D cycles in the data. If you hover over a point, you can see its (Birth, Death) pair and its class/size, where "class" is the unique number assigned to a 2D cycle to identify it and "size" is the number of topics in that cycle. We will study the Class-1317 2D cycle in detail because it has the highest persistence (Death - Birth).
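Selecting the most persistent cycle amounts to a simple calculation, sketched below. The (class, birth, death) values are made up for illustration and are not taken from the actual diagram.

```python
# Hypothetical (class, birth, death) triples; not values from the real diagram.
cycles = [
    (1, 0.20, 0.25),
    (2, 0.30, 0.55),
    (3, 0.10, 0.90),
]


def persistence(cycle):
    """Persistence is how long a cycle survives: Death - Birth."""
    _, birth, death = cycle
    return death - birth


# The cycle with the largest persistence is the one worth studying first.
most_persistent = max(cycles, key=persistence)
print(most_persistent[0])  # prints 3
```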

The Topic Embeddings contain around 750 topics, each represented by a 300-dimensional vector. 2D Cycle-1317 cannot be visualized directly in 300 dimensions, so we project the Topic Embeddings from 300 dimensions down to 2D and 3D. The projection is made with UMAP, a dimensionality-reduction method that produces a lower-dimensional projection of high-dimensional data while preserving much of its structure. The following plot shows 2D Cycle-1317 in the 3D projection. You can click on "point in no class" to hide those points and see 2D Cycle-1317 more clearly.
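A projection like the one described above might look like the following sketch. It assumes the umap-learn package with its default settings apart from the output dimension and a fixed seed. The function name project_topics and the random_state value are hypothetical, and the project's actual UMAP parameters are not stated here.

```python
def project_topics(topic_vectors, n_components=3, random_state=42):
    """Project high-dimensional topic vectors down to 2 or 3 dimensions.

    `topic_vectors` is an array of shape (n_topics, n_features),
    for example about 750 x 300 for the embeddings described above.
    """
    # Imported lazily; `umap` is provided by the umap-learn package.
    import umap

    reducer = umap.UMAP(n_components=n_components, random_state=random_state)
    # Result has shape (n_topics, n_components).
    return reducer.fit_transform(topic_vectors)
```

Calling project_topics(topic_vectors, n_components=2) would give the 2D projection, and n_components=3 the 3D one.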

We can also find 2D cycles in the UMAP 2D projection of the data. The Persistence Diagram for the 2D projection contains the Birth and Death of 113 2D cycles. Click on this link if you want to see 2D Cycle-113 and 2D Cycle-111, with a detailed analysis of the topics they represent.