pacman::p_load(jsonlite, lubridate, tidygraph, ggraph, visNetwork, tidyverse,
               igraph, heatmaply, hrbrthemes, treemap, devtools,
               ggstatsplot, RColorBrewer, knitr, stringr)
options(scipen = 999)
Take Home Exercise 2
Mini Case 2 of Vast Challenge 2023
1. OVERVIEW
FishEye International is collaborating with the country of Oceanus to identify companies that may be engaged in illegal, unreported, and unregulated (IUU) fishing. FishEye has transformed import/export data into a knowledge graph.
In addition, 12 groups of link suggestions covering various fish types are used to reason on the knowledge graph.
1.1 The Task
In this take-home exercise, temporal patterns for individual entities and between entities are identified using the knowledge graph FishEye created from trade records. In addition, we evaluate the sets of predicted knowledge graph links (bundles) to determine which sets are more reliable for completing the graph.
2. Datasets
The trade data is stored in the mc2_challenge_graph.json file and covers a seven-year period from 2028 to 2034. The knowledge graph has a total of 34,576 nodes and 5,464,378 edges. The dataset also includes 12 bundles, one per type of marine species: Carp, Catfish, Chub_mackerel, Cod2, Herring, Lichen, Mackerel, Pollock, Salmon_wgl, Salmon, Shark, and Tuna.
2.1 Metadata
| Location | Variable name | Description |
|---|---|---|
| Node | id | Name of the company that originated (or received) the shipment |
| Node | shpcountry | Country the company is most often associated with when shipping |
| Node | rcvcountry | Country the company is most often associated with when receiving |
| Node, Edge | dataset | Always ‘MC2’ |
| Edge | arrivaldate | Date the shipment arrived at port in YYYY-MM-DD format. |
| Edge | hscode | Harmonized System code for the shipment. |
| Edge | valueofgoods_omu | Customs-declared value of the total shipment, in Oceanus Monetary Units (OMU) |
| Edge | volumeteu | The volume of the shipment in ‘Twenty-foot equivalent units’ |
| Edge | weightkg | The weight of the shipment in kilograms |
| Edge | type | Always ‘shipment’ for MC2 |
| Edge | generated_by | Name of the program that generated the edge (only in bundles) |
HS codes, short for Harmonized System codes, are numeric codes used for classifying goods for international trade and customs purposes. Each code is composed of six digits and can be broken down into chapter/heading/subheading (two digits each).
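The chapter/heading/subheading breakdown can be sketched in a few lines of base R. This is a minimal illustration, assuming the code is held as a six-character string; `split_hscode` is a hypothetical helper name and is not part of the original analysis.

```r
# Hypothetical helper: split six-digit HS codes into their two-digit parts.
split_hscode <- function(code) {
  code <- as.character(code)
  # This sketch only handles complete six-digit codes
  stopifnot(all(nchar(code) == 6))
  data.frame(
    hscode     = code,
    chapter    = substr(code, 1, 2),  # digits 1-2
    heading    = substr(code, 3, 4),  # digits 3-4
    subheading = substr(code, 5, 6),  # digits 5-6
    stringsAsFactors = FALSE
  )
}

# Two codes that appear later in this exercise
hs_parts <- split_hscode(c("870323", "300510"))
print(hs_parts)
```

Codes shorter than six digits are rejected by the `stopifnot()` check rather than guessed at, since a missing digit could belong to any of the three parts.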
3. Data Preparation
3.1 Install R-packages
Using p_load() of the pacman package to install and load the following libraries:
- jsonlite: To import data from JSON files into R
- lubridate: To convert dates and times
- visNetwork: For network visualization
- tidyverse: A collection of R packages used in everyday data analysis. It supports data science, data wrangling, and analysis.
- hrbrthemes: For additional themes and utilities for ‘ggplot2’ (might not use)
- heatmaply: For creating interactive cluster heatmaps
- treemap: For visualizing hierarchical data using nested rectangles
- devtools: For installing d3treeR
- igraph: For exploring the network
- RColorBrewer: For visualization. Contains ready-to-use color palettes
- knitr: For dynamic report generation
- stringr: For character manipulation
3.2 Importing Data
The JSON files are imported into R with the fromJSON function from jsonlite. The code chunk below imports the knowledge graph FishEye created from trade records.
MC2_challenge <- fromJSON("data/mc2_challenge_graph.json")
The bundles, which consist of Carp, Catfish, Chub_Mackerel, Cod2, Herring, Lichen, Mackerel, Pollock, Salmon_wgl, Salmon, Shark and Tuna, are then imported.
MC2_carp <- fromJSON("data/bundles/carp.json")
MC2_catfish <- fromJSON("data/bundles/catfish.json")
MC2_chub_mackerel <- fromJSON("data/bundles/chub_mackerel.json")
MC2_cod2 <- fromJSON("data/bundles/cod2.json")
MC2_herring <- fromJSON("data/bundles/herring.json")
MC2_lichen <- fromJSON("data/bundles/lichen.json")
MC2_mackerel <- fromJSON("data/bundles/mackerel.json")
MC2_pollock <- fromJSON("data/bundles/pollock.json")
MC2_salmon_wgl <- fromJSON("data/bundles/salmon_wgl.json")
MC2_salmon <- fromJSON("data/bundles/salmon.json")
MC2_shark <- fromJSON("data/bundles/shark.json")
MC2_tuna <- fromJSON("data/bundles/tuna.json")
3.3 Create Tibble Data frame
As fromJSON returns the imported data as a list, we use as_tibble to convert the nodes and links into tibbles.
MC2_challenge_nodes <- as_tibble(MC2_challenge$nodes) %>%
  select(id, shpcountry, rcvcountry)
MC2_challenge_edges <- as_tibble(MC2_challenge$links) %>%
  select(source, target, arrivaldate, hscode, valueofgoods_omu,
         volumeteu, weightkg, valueofgoodsusd)
Additionally, the columns of the bundles are re-shuffled. We use relocate to revise the order so that the columns start with source, then target, and so on. By doing so, every fish type has the same column sequence.
#1_fish type :carp
MC2_carp_nodes <-as_tibble(MC2_carp$nodes) %>%
select(id,dataset,shpcountry,rcvcountry)
MC2_carp_edges <-as_tibble(MC2_carp$links) %>%
relocate(8,9,7,6)
#2_fish type: catfish
MC2_catfish_nodes <-as_tibble(MC2_catfish$nodes) %>%
select(id,dataset,shpcountry,rcvcountry)
MC2_catfish_edges <-as_tibble(MC2_catfish$links) %>%
relocate(6,7,5,4)
#3_fish type: chub_mackerel
MC2_chub_mackerel_nodes <-as_tibble(MC2_chub_mackerel$nodes) %>%
select(id,dataset,shpcountry,rcvcountry)
MC2_chub_mackerel_edges <-as_tibble(MC2_chub_mackerel$links) %>%
relocate(8,9,7,6)
#4_fish type: cod2
MC2_cod2_nodes <-as_tibble(MC2_cod2$nodes) %>%
select(id,dataset,shpcountry,rcvcountry)
MC2_cod2_edges <-as_tibble(MC2_cod2$links) %>%
relocate(8,9,7,6)
#5_fish type: herring
MC2_herring_nodes <-as_tibble(MC2_herring$nodes) %>%
select(id,dataset,shpcountry,rcvcountry)
MC2_herring_edges <-as_tibble(MC2_herring$links) %>%
relocate(7,8,6,5,1,2,9,3,4)
#6_fish type: lichen
MC2_lichen_nodes <-as_tibble(MC2_lichen$nodes) %>%
select(id,dataset,shpcountry,rcvcountry)
MC2_lichen_edges <-as_tibble(MC2_lichen$links) %>%
relocate(7,8,6,5,1,2,9,3,4)
#7_fish type: mackerel
MC2_mackerel_nodes <-as_tibble(MC2_mackerel$nodes) %>%
select(id,dataset,shpcountry,rcvcountry)
MC2_mackerel_edges <-as_tibble(MC2_mackerel$links) %>%
relocate(6,7,5,4)
#8_fish type: pollock
MC2_pollock_nodes <-as_tibble(MC2_pollock$nodes) %>%
select(id,dataset,shpcountry,rcvcountry)
MC2_pollock_edges <-as_tibble(MC2_pollock$links) %>%
relocate(8,9,7,6)
#9_fish type: salmon_wgl
MC2_salmon_wgl_nodes <-as_tibble(MC2_salmon_wgl$nodes) %>%
select(id,dataset,shpcountry,rcvcountry)
MC2_salmon_wgl_edges <-as_tibble(MC2_salmon_wgl$links) %>%
relocate(7,8,6,5,1,2,9,3,4)
#10_fish type: salmon
MC2_salmon_nodes <-as_tibble(MC2_salmon$nodes) %>%
select(id,dataset,shpcountry,rcvcountry)
MC2_salmon_edges <-as_tibble(MC2_salmon$links) %>%
relocate(8,9,7,6)
#11_fish type: shark
MC2_shark_nodes <-as_tibble(MC2_shark$nodes) %>%
select(id,dataset,shpcountry,rcvcountry)
MC2_shark_edges <-as_tibble(MC2_shark$links) %>%
relocate(7,8,6,5,1,2,9,3,4)
#12_fish type: tuna
MC2_tuna_nodes <-as_tibble(MC2_tuna$nodes) %>%
select(id,dataset,shpcountry,rcvcountry)
MC2_tuna_edges <-as_tibble(MC2_tuna$links) %>%
relocate(5,6,4,3)
3.4 Concatenate Data frame from Bundles
After conversion to tibble data frames, the bundles are concatenated. Among the 12 files, 3 data frames have 7 variables whereas the rest have 9 variables. As such, bind_rows(), a function from the dplyr package within the tidyverse, is used because it matches columns by name and fills any missing columns with NA.
Moreover, the knitr::kable() function is used to display the first rows of combined_edges.
# concatenate the edges
combined_edges <- bind_rows(MC2_carp_edges,MC2_chub_mackerel_edges,
MC2_cod2_edges,MC2_herring_edges,MC2_lichen_edges,
MC2_pollock_edges,MC2_salmon_wgl_edges,
MC2_salmon_edges,MC2_shark_edges,
MC2_catfish_edges,MC2_mackerel_edges,MC2_tuna_edges)
# concatenate the nodes
combined_nodes <- bind_rows(MC2_carp_nodes,MC2_catfish_nodes,
MC2_chub_mackerel_nodes,MC2_cod2_nodes,
MC2_herring_nodes, MC2_lichen_nodes,
MC2_mackerel_nodes, MC2_pollock_nodes,
MC2_salmon_wgl_nodes,MC2_salmon_nodes,
MC2_shark_nodes, MC2_tuna_nodes)
#output for dataframe using knitr::kable
kable(head(combined_edges), "simple")
| source | target | dataset | generated_by | arrivaldate | hscode | valueofgoods_omu | volumeteu | weightkg |
|---|---|---|---|---|---|---|---|---|
| Tshimbua GmbH & Co. KG | Caracola del Sol Services | MC2 | carp | 2034-03-20 | 80440 | 15915 | 0 | 15720 |
| Marine Masterminds Dry dock | Playa de la Luna Incorporated | MC2 | carp | 2034-08-01 | 940179 | NA | 0 | 10795 |
| Marine Masterminds Dry dock | Saltwater Supreme ОАО Forwading | MC2 | carp | 2034-11-27 | 940161 | NA | 0 | 6555 |
| Marine Masterminds Dry dock | Saltwater Supreme ОАО Forwading | MC2 | carp | 2034-10-10 | 940161 | NA | 0 | 6675 |
| zhāng yú Ges.m.b.H. Solutions | Portuguese Tuna Incorporated Marine | MC2 | carp | 2034-04-09 | 40729 | NA | 15 | 54775 |
| Nile S.A. de C.V. | Caracola del Sol Services | MC2 | carp | 2034-03-30 | 700711 | NA | 0 | 22430 |
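To illustrate why bind_rows() suits bundles with differing numbers of variables, here is a minimal sketch on two toy tables. The table names and values are made up for illustration; only the column names are borrowed from the edge data.

```r
library(dplyr)

# Toy stand-ins for two bundle edge tables with different column sets
edges_a <- data.frame(source = "A", target = "B",
                      valueofgoods_omu = 100, weightkg = 10,
                      stringsAsFactors = FALSE)
edges_b <- data.frame(source = "C", target = "D",
                      weightkg = 20,
                      stringsAsFactors = FALSE)

# Columns are matched by name; the column missing from edges_b is filled with NA
toy_combined <- bind_rows(edges_a, edges_b)
print(toy_combined)
```

Base rbind() would stop with an error here because the two tables do not have the same columns, which is the situation the 7-variable and 9-variable bundles create.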
3.5 Creating a Master ID data frame
Moving on, we would like to create a master ID data frame from the knowledge graph. It is unclear whether any source or target in the edges is missing from MC2_challenge_nodes.
The code chunk below identifies any missing id. If one exists in either column, it is appended to the MC2_challenge_nodes data frame. A new column called trade_status is created to help identify the trade status of the company. The !is.na() function is used to check for values. If both shpcountry and rcvcountry are NA, the trade status is set to Unknown.
Likewise, we rename id to label and create a new id column through mutate and nrow, which serves as a unique identifier for each label.
#create new df and add new column called trade_status
MC2_id_list_vis <- MC2_challenge_nodes %>%
rename(label = id) %>%
mutate(id = as.character(1:nrow(MC2_challenge_nodes)),
trade_status = case_when(
!is.na(shpcountry) & !is.na(rcvcountry) ~ "Import & Export",
!is.na(shpcountry) ~ "Export",
!is.na(rcvcountry) ~ "Import",
is.na(shpcountry) | is.na(rcvcountry) ~"Unknown"
)) %>%
filter(!is.na(trade_status))
#reorder the columns
MC2_id_list_vis <- MC2_id_list_vis %>%
select(id,label, trade_status,shpcountry, rcvcountry)
#create similar list
MC2_id_list <- MC2_challenge_nodes %>%
rename(label = id) %>%
mutate(
trade_status = case_when(
!is.na(shpcountry) & !is.na(rcvcountry) ~ "Import & Export",
!is.na(shpcountry) ~ "Export",
!is.na(rcvcountry) ~ "Import",
is.na(shpcountry) | is.na(rcvcountry) ~"Unknown"
))
#reorder the columns
MC2_id_list <- MC2_id_list %>%
select(label,trade_status,shpcountry, rcvcountry)
#create df to identify companies in target column that are not in the nodes
ID_target <- MC2_challenge_edges %>%
  filter(!(target %in% MC2_id_list$label)) %>%
  distinct(target) %>%
  #rename to match names in nodes df
  rename(id = target) %>%
  #dummy columns mirror the nodes df (id, shpcountry, rcvcountry) so rows can be bound
  mutate(shpcountry = NA_character_, rcvcountry = NA_character_)
#create df to identify companies in source column that are not in the nodes
ID_source <- MC2_challenge_edges %>%
  filter(!(source %in% MC2_id_list$label)) %>%
  distinct(source) %>%
  rename(id = source) %>%
  mutate(shpcountry = NA_character_, rcvcountry = NA_character_)
#append the distinct companies into the nodes (bind_rows matches columns by name)
MC2_challenge_nodes <- MC2_challenge_nodes %>%
  bind_rows(ID_target) %>%
  bind_rows(ID_source)
As observed, all the companies are already present in the MC2_challenge_nodes data frame.
#output of ID nodes that are not in Master list
nrow(ID_source)
[1] 0
nrow(ID_target)
[1] 0
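The same membership check can be sketched with base R set operations. The toy node and edge tables below are hypothetical and deliberately include one company that is missing from the node list.

```r
# Toy data: "Z Ltd" appears in the edges but not in the node list
toy_nodes <- data.frame(id = c("A Ltd", "B Ltd"), stringsAsFactors = FALSE)
toy_edges <- data.frame(source = c("A Ltd", "Z Ltd"),
                        target = c("B Ltd", "A Ltd"),
                        stringsAsFactors = FALSE)

# Every company named on either end of an edge
edge_ids <- union(toy_edges$source, toy_edges$target)

# Companies referenced by edges that have no row in the node table
missing_ids <- setdiff(edge_ids, toy_nodes$id)
print(missing_ids)
```

An empty result from setdiff() corresponds to the zero-row outputs shown above.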
3.6 Data Wrangling
After concatenating and creating a master ID data frame, adjustments are made to rectify the following:
- arrivaldate is not in date format. [Revised through lubridate in the ymd format]
- hscode is not in chr format. [Revised from int]
- date is not comprehensive. [Create new column called year]
- edge data frame does not have corresponding id to source and target. [Rename source and target to sourcelabel and targetlabel respectively. Thereafter, left_join with master ID MC2_id_list_vis data frame to get the corresponding ID]
#revise the data format for arrivaldate and hscode
MC2_challenge_edges<- MC2_challenge_edges %>%
rename(sourcelabel = source, targetlabel = target) %>%
mutate(arrivaldate =ymd(arrivaldate),
hscode = as.character(hscode),
year = year(arrivaldate))
#append the corresponding id through left_join
MC2_challenge_edges <- MC2_challenge_edges %>%
left_join(MC2_id_list_vis, by = c("sourcelabel" = "label")) %>%
rename(source = id) %>%
left_join(MC2_id_list_vis, by = c("targetlabel" = "label")) %>%
rename(target = id) %>%
relocate(10,14)
#apply the same type conversions to the bundles
combined_edges_cleaned <- combined_edges %>%
  mutate(arrivaldate = ymd(arrivaldate),
         hscode = as.character(hscode))
#output for dataframe using knitr::kable
kable(head(combined_edges_cleaned), "simple")
| source | target | dataset | generated_by | arrivaldate | hscode | valueofgoods_omu | volumeteu | weightkg |
|---|---|---|---|---|---|---|---|---|
| Tshimbua GmbH & Co. KG | Caracola del Sol Services | MC2 | carp | 2034-03-20 | 80440 | 15915 | 0 | 15720 |
| Marine Masterminds Dry dock | Playa de la Luna Incorporated | MC2 | carp | 2034-08-01 | 940179 | NA | 0 | 10795 |
| Marine Masterminds Dry dock | Saltwater Supreme ОАО Forwading | MC2 | carp | 2034-11-27 | 940161 | NA | 0 | 6555 |
| Marine Masterminds Dry dock | Saltwater Supreme ОАО Forwading | MC2 | carp | 2034-10-10 | 940161 | NA | 0 | 6675 |
| zhāng yú Ges.m.b.H. Solutions | Portuguese Tuna Incorporated Marine | MC2 | carp | 2034-04-09 | 40729 | NA | 15 | 54775 |
| Nile S.A. de C.V. | Caracola del Sol Services | MC2 | carp | 2034-03-30 | 700711 | NA | 0 | 22430 |
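The type fixes listed above can be tried on a small toy table. This is a sketch only: the two rows reuse values from the output table, and `toy_edges` is a hypothetical name.

```r
library(lubridate)

toy_edges <- data.frame(
  arrivaldate = c("2034-03-20", "2034-08-01"),
  hscode      = c(80440L, 940179L),
  stringsAsFactors = FALSE
)

# Character "YYYY-MM-DD" strings become Date objects
toy_edges$arrivaldate <- ymd(toy_edges$arrivaldate)
# Integer codes become character so they can be sliced by position later
toy_edges$hscode <- as.character(toy_edges$hscode)
# The year is extracted from the parsed date
toy_edges$year <- year(toy_edges$arrivaldate)

str(toy_edges)
```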
- Duplicates found. [Retained for further analysis, as purchases might be broken down into smaller trades to avoid detection]
#check for duplicates
dup <- (nrow(MC2_challenge_edges) - nrow(unique(MC2_challenge_edges)))
#reformat output
dup_reformat <- format(dup, big.mark=",")
#print output
dup_reformat
[1] "155,291"
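As a quick check of the counting logic, the sketch below applies it to a made-up four-row table and compares it with base R's duplicated(), which flags every repeat of an earlier row.

```r
toy <- data.frame(
  source   = c("A", "A", "B", "A"),
  target   = c("B", "B", "C", "B"),
  weightkg = c(10, 10, 5, 10),
  stringsAsFactors = FALSE
)

# Approach used above: total rows minus distinct rows
n_dup_by_diff <- nrow(toy) - nrow(unique(toy))

# Equivalent: count the rows flagged as repeats of an earlier row
n_dup_by_flag <- sum(duplicated(toy))

print(c(by_diff = n_dup_by_diff, by_flag = n_dup_by_flag))
```

Both counts exclude the first occurrence of each repeated row, so a transaction that appears three times contributes two duplicates.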
4. Distribution of transactions
In this section, we create interactive plots to study and explore the data from the knowledge graph. The plots are created with the heatmaply, visNetwork, and igraph packages.
4.1 Number of Transactions by Year and Month
A heatmap is created to provide a graphical representation of the transactions. It uses a system of color-coding to represent different values. The [RColorBrewer] package supplies the sequential palette “Blues”, which shows progression from low to high (gradient).
As fishing might be seasonal, we create an additional column called month before grouping by year and month. To create an interactive heatmap, the data frame is transposed through pivot_wider and then converted to a matrix with the as.matrix function. Thereafter, the [heatmaply] package is used.
#aggregate to determine transactions count by year and month
transaction_counts_by_year <- MC2_challenge_edges %>%
mutate(month = round(month(arrivaldate))) %>%
group_by(year, month) %>%
summarise(count = n())
#transpose df by using pivot_wider
pivoted_data <- transaction_counts_by_year %>%
pivot_wider(names_from = year, values_from = count) %>%
#remove the month column
select(2:8)
#convert pivoted_data into a matrix
heatmap_data <- as.matrix(pivoted_data)
#create interactive heatmap without dendrogram
heatmaply(heatmap_data, dendrogram = "none",
xlab = "Year", ylab = "Month",
main = "Number of Transactions by Year and Month",
scale = "none",
grid_color = "white",
grid_width = 0.00001,
titleX = FALSE,
hide_colorbar = FALSE,
label_names = c("Month:", "Year: ", "No. of Transactions:"),
fontsize_row = 10, fontsize_col = 10,
colors = "Blues",
labCol = colnames(heatmap_data),
labRow = rownames(heatmap_data),
plot_method = "plotly")
Observations:
On a yearly basis, the volume of trade declines over the period, with the rate of decline slowing. Trade peaks around 2028-2030, with the highest level occurring in 2030.
March is generally a period of low transactions. However, Mar 2030 set a record, with the highest volume of transactions in the seven-year period.
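The reshaping step behind the heatmap (long counts to a month-by-year matrix) can be sketched on toy data. The counts are invented, and setting row names from the month column is an optional addition that is not in the original chunk; it gives the matrix rows readable labels.

```r
library(dplyr)
library(tidyr)

# Toy long-format counts: one row per year and month
toy_counts <- data.frame(
  year  = c(2028, 2028, 2029, 2029),
  month = c(1, 2, 1, 2),
  count = c(5, 3, 4, 6)
)

# One row per month, one column per year
wide <- toy_counts %>%
  pivot_wider(names_from = year, values_from = count) %>%
  arrange(month)

# Drop the month column, convert to a numeric matrix, and label the rows
heat_mat <- as.matrix(select(wide, -month))
rownames(heat_mat) <- month.abb[wide$month]
print(heat_mat)
```

A matrix built this way can be passed to a heatmap function in the same manner as heatmap_data above.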
4.2 Trade flow of Company with at least 30 transactions in Mar 2030
Noting that Mar 2030 has a record high volume, we would like to examine the trade flow of the companies in this time frame. We start off by creating new edges and nodes data frames. The edges are aggregated from the master edges file MC2_challenge_edges.
We will be using the [dplyr] package to create MC2_2030_Mar_edges:
- mutate(): adds an additional column called month
- filter(): filters for year = 2030 and month = 3
- group_by(): groups the rows by source, target, and hscode
- summarize(): computes weight, the number of transactions in each group
- filter(): removes rows where source and target match and keeps groups with at least 30 transactions. This leaves ~200 rows.
- rename(): changes source and target to from and to
Thereafter, we create MC2_2030_Mar_nodes by filtering the master ID data frame, MC2_id_list_vis, for the distinct IDs that appear in the from or to column.
#create edges df for transactions from Mar 2030
MC2_2030_Mar_edges <- MC2_challenge_edges %>%
mutate(month = round(month(arrivaldate))) %>%
filter(year == "2030" & month =="3") %>%
group_by(source, target, hscode) %>%
summarize(weight = n()) %>%
filter(source !=target) %>%
filter(weight >29) %>% #keep ~200 rows
rename(from = source, to = target) %>%
select(1,2,4,3) %>% #relocate weight to 3rd column
ungroup()
#create nodes df for transactions from Mar 2030
MC2_2030_Mar_nodes <- MC2_id_list_vis %>%
filter(id %in% MC2_2030_Mar_edges$from | id %in% MC2_2030_Mar_edges$to) %>%
distinct()
In the code chunk below, we build the graph with the graph_from_data_frame function from the [igraph] package and compute the centrality of the nodes. In the output below, the top row shows the ID of each company while the bottom row shows its score. Given that the betweenness and closeness centrality results are not informative, we will look into degree centrality instead.
#create igraph object
g <- graph_from_data_frame(d=MC2_2030_Mar_edges,
vertices=MC2_2030_Mar_nodes, directed=TRUE)
#compute betweeness centrality
betweenness_centrality <- betweenness(g)
MC2_2030_Mar_nodes$betweenness_centrality <-
betweenness_centrality[as.character(MC2_2030_Mar_nodes$id)]
#output for top results
head(sort(betweenness_centrality, decreasing=TRUE))
 8  3  5  6  9 11
 6  0  0  0  0  0
After the data preparation, visNetwork is used to plot the interactive network graph with the Fruchterman and Reingold layout. Additionally, the graph uses three color tones from a diverging palette to color-code the nodes based on degree centrality, ranging from red through yellow to blue.
#compute degree centrality
degree_centrality <- degree(g)
MC2_2030_Mar_nodes$degree_centrality <-
degree_centrality[as.character(MC2_2030_Mar_nodes$id)]
#compute closeness centrality
closeness_centrality <- closeness(g, normalized=TRUE)
MC2_2030_Mar_nodes$closeness_centrality <-
closeness_centrality[as.character(MC2_2030_Mar_nodes$id)]
#add diverging palette to nodes based on degree centrality
colors <- colorRampPalette(brewer.pal(3, "RdYlBu"))(3) # use three colors
MC2_2030_Mar_nodes <- MC2_2030_Mar_nodes %>%
  mutate(shape="dot", shadow=TRUE,
         title=trade_status, # hover for trade_status
         label=label, # add labels on nodes
         size=20, # set size of nodes
         borderWidth=1, #set border width of nodes
         #bin degree into three groups so every node maps to one of the three colors
         color.background=colors[cut(degree_centrality, breaks=3, labels=FALSE)],
         color.border="grey") #set border color
#plot network graph
visNetwork(MC2_2030_Mar_nodes, MC2_2030_Mar_edges,
main ="Trade flow of Company
<br>with at least 30 transactions in Mar 2030<br>") %>%
visIgraphLayout(layout = "layout_with_fr") %>%
visOptions(highlightNearest = TRUE,
nodesIdSelection = TRUE,
selectedBy ="degree_centrality") %>%
visEdges(arrows = "to") %>% #indicate direction
visLayout(randomSeed = 123) %>%
addFontAwesome(name = "font-awesome") %>% #add icon to network
visInteraction(dragNodes = FALSE, dragView = TRUE,
zoomView = TRUE, navigationButtons = TRUE) #freeze network
visIgraphLayout is used to compute the node coordinates. In the example above, the Fruchterman and Reingold layout is used.
visOptions sets options for the network visualization. We highlight the nearest nodes when a node is clicked, and add a dropdown list for ID and a dropdown list by degree centrality.
visEdges sets the edge options. We include arrows to indicate direction.
visLayout sets the layout options. randomSeed is included so that the layout remains the same every time.
addFontAwesome is used to add icons to the network.
visInteraction controls interaction with the network visualization.
Its arguments are as follows:
- dragNodes: If TRUE, nodes can be dragged around by the user.
- dragView: If TRUE, the view can be dragged around by the user.
- zoomView: If TRUE, the user can zoom in.
- navigationButtons: If FALSE, navigation buttons are not shown on the network graph.
Observations:
Blue nodes refer to a high degree of centrality.
hǎi dǎn Corporation Wharf has the highest degree of 19, followed by Saltwater Supreme OAO forwarding with a score of 12.
The majority of the companies have a low degree of centrality and work exclusively with one counterpart. Companies in a cluster generally have a high degree of centrality and deal with companies with a low degree of centrality (Orange nodes).
5. Distribution of Median Weight
By hscode and year
A treemap displays hierarchical data as a set of nested rectangles, where each group is represented by an area that is proportional to its value. We would like to examine the distribution of the median weight and transaction counts (weight) by hscode and year. Based on median weight, we may be able to identify companies that overfish.
As per section 3.1, the [devtools] package has been loaded. Thereafter, we install the package from GitHub with install_github and load the [d3treeR] package.
(Note: the GitHub package only needs to be installed once)
#install_github("timelyportfolio/d3treeR")
library(d3treeR)
We aggregate the MC2_challenge_edges data frame by using the [dplyr] package:
- group_by(): groups the rows by hscode and year
- summarize(): computes the weight (transaction count) and median_weight of goods
- arrange(desc()): sorts by weight in descending order
- MC2_hscode_weight[1:100,]: retrieves the first 100 rows
(We are only interested in the highly utilized hscode)
We build our treemap by passing the aggregated data frame (MC2_hscode_weight) to the treemap() function and saving the result as an object called it. Thereafter, the d3tree function from the [d3treeR] package is used to build the interactive treemap.
#aggregate into new df by hscode and year before sorting in descending order
MC2_hscode_weight <- MC2_challenge_edges %>%
group_by(hscode,year) %>%
summarize(weight = n(),
median_weight = median(weightkg)) %>%
arrange(desc(weight))
#retrieve the first 100 rows
MC2_hscode_weight <- MC2_hscode_weight[1:100,]
#treemap saved under 'it' object
it <- treemap(MC2_hscode_weight,
index=c("hscode","year"),
vSize="weight",
vColor="median_weight",
type ="value",
algorithm = "pivotSize",
sortID = "weight",
palette="Blues",
border.col = "white"
)
d3tree(it, rootname = "Distribution of hscode")
index shows the hscode followed by the year.
vSize sets the size of the rectangles by weight.
vColor sets the color intensity by median_weight.
type selects the value type of treemap.
algorithm arranges the rectangles with the pivot-by-size algorithm.
sortID determines the order from top left to bottom right.
palette colors the treemap using a palette from the [RColorBrewer] package.
border.col sets the border color to white.
Observations:
As observed from top left to bottom right, hscode 306170 has the highest transaction count. Likewise, the median weight is more left-skewed.
As observed from the treemap, hscode 870323 has the darkest gradient in 2032. With reference to section 4.1, the transaction counts did not increase. Thus, it would be ideal to observe the transactions of hscode 870323 in year 2032.
HS code is a hierarchical model. As iterated, it is composed of six digits and can be broken down into chapter/heading/subheading with two digits each. As observed, hscode that start with 30 occupy a higher weightage as well. From the graph above, hscode 306170 and 304620 are ranked in the first and sixth positions.
5.1 Trade flow for Hscode 306170
In this section, we look into the trade flow of hscode 306170 using eigenvector centrality, which measures the importance or influence of a node based on its connections to other highly central nodes.
Using a similar approach to section 4.2, we retrieve the top ~300 rows and create a color scale from the RdYlBu palette based on the range of eigenvector centrality. The blue nodes below represent key players or market leaders in the network, since a high eigenvector centrality indicates connections to other important or influential nodes.
#create edges df for transactions for hscode 306170
MC2_306170_edges <- MC2_challenge_edges %>%
filter(hscode =="306170") %>%
group_by(source, target) %>%
summarize(weight = n()) %>%
filter(source !=target) %>%
filter(weight >103) %>% #keep ~300 rows
rename(from = source, to = target) %>%
ungroup()
#create nodes df for transactions for hscode 306170
MC2_306170_nodes <- MC2_id_list_vis %>%
filter(id %in% MC2_306170_edges$from | id %in% MC2_306170_edges$to) %>%
distinct()
#create igraph object
g_hscode <- graph_from_data_frame(d=MC2_306170_edges,
vertices=MC2_306170_nodes, directed=TRUE)
#compute eigen centrality
ev_centrality <- eigen_centrality(g_hscode)$vector
MC2_306170_nodes$eigen_centrality <- ev_centrality
# define color palette
num_color_groups <- 3
color_palette <- colorRampPalette(brewer.pal
(num_color_groups, "RdYlBu"))(num_color_groups)
# create a color scale based on the eigenvector values range
min_centrality <- min(MC2_306170_nodes$eigen_centrality, na.rm = TRUE)
max_centrality <- max(MC2_306170_nodes$eigen_centrality, na.rm = TRUE)
color_scale <- scales::rescale(MC2_306170_nodes$eigen_centrality, to = c(0, 1))
# assign colors to the nodes based on eigenvector centrality
MC2_306170_nodes$color <- color_palette[cut(color_scale, breaks = num_color_groups)]
#style the nodes; the background reuses the eigenvector-based color assigned above
MC2_306170_nodes <- MC2_306170_nodes %>%
  mutate(shape="dot", shadow=TRUE,
         title=trade_status, # hover for trade_status
         label=label, # add labels on nodes
         size=20, # set size of nodes
         borderWidth=1, #set border width of nodes
         color.background=color, #set color for nodes
         color.border="grey") #set color for border
#plot network graph
visNetwork(MC2_306170_nodes, MC2_306170_edges,
main ="Trade flow of top 300 transactions
<br> for hscode 306170<br>") %>%
visIgraphLayout(layout = "layout_with_fr") %>%
visOptions(highlightNearest = TRUE,
nodesIdSelection = TRUE) %>%
visEdges(arrows = "to") %>% #indicate direction
visLayout(randomSeed = 123) %>%
addFontAwesome(name = "font-awesome") %>% #add icon to network
visInteraction(dragNodes = FALSE, dragView = TRUE,
zoomView = TRUE, navigationButtons = TRUE) #freeze network
Observations:
- There are three key opinion leaders and three sub-leaders in the network, where Mar del Este CJSC could be inferred as the market leader.
- Interestingly, the company with the highest degree of centrality in the previous section, hǎi dǎn Corporation Wharf, does not have a high eigenvector centrality. Although it has a large number of connections, through which it could potentially reach and interact with other companies, its eigenvector centrality remains low. We deduce that the nodes it is connected to are not themselves well connected.
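The contrast between degree and eigenvector centrality can be reproduced on a small toy graph. The sketch below is an illustration under assumed data, not the trade network itself: it joins a tightly knit group of five nodes (c1 to c5) to a hub whose other neighbours (l1 to l5) have no further connections, and treats the graph as undirected for simplicity.

```r
library(igraph)

# Five mutually connected nodes (every pair of c1..c5 is linked)
clique_edges <- t(combn(paste0("c", 1:5), 2))
# A hub linked to five otherwise unconnected nodes, plus one link into the clique
hub_edges <- cbind("hub", c(paste0("l", 1:5), "c1"))

toy_g <- graph_from_edgelist(rbind(clique_edges, hub_edges), directed = FALSE)

deg <- degree(toy_g)
eig <- eigen_centrality(toy_g)$vector

print(round(cbind(degree = deg, eigenvector = eig), 2))
```

The hub has the most connections, yet its eigenvector score is lower than that of the clique members, because most of its neighbours are themselves poorly connected.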
5.2 Trade flow for Hscode 870323 in year 2032
In this section, we look into the trade flow of hscode 870323 in year 2032, since there is a spike in median_weight. It is measured with degree centrality, which helps to identify the nodes with the most interactions in the time period. Moreover, we color the nodes based on their values. Blue nodes represent a high degree while Orange nodes represent a low degree.
#create edges df for transactions of hscode 870323 in 2032
MC2_2032_870323_edges <- MC2_challenge_edges %>%
filter(year == "2032" & hscode =="870323") %>%
group_by(source, target, hscode) %>%
summarize(weight = n()) %>%
filter(source !=target) %>%
filter(weight >1) %>% #keep ~200 rows
rename(from = source, to = target) %>%
select(1,2,4,3) %>% #relocate weight to 3rd column
ungroup()
#create nodes df for transactions of hscode 870323 in 2032
MC2_2032_870323_nodes <- MC2_id_list_vis %>%
filter(id %in% MC2_2032_870323_edges$from | id %in% MC2_2032_870323_edges$to) %>%
distinct()
#create igraph object
gg <- graph_from_data_frame(d=MC2_2032_870323_edges,
vertices=MC2_2032_870323_nodes, directed=TRUE)
#compute degree centrality
degree_centrality <- degree(gg)
MC2_2032_870323_nodes$degree_centrality <-
degree_centrality[as.character(MC2_2032_870323_nodes$id)]
#add diverging palette to nodes based on degree centrality
colors <- colorRampPalette(brewer.pal(3, "RdYlBu"))(3) # use three colors
MC2_2032_870323_nodes <- MC2_2032_870323_nodes %>%
  mutate(shape="dot", shadow=TRUE,
         title=trade_status, # hover for trade_status
         label=label, # add labels on nodes
         size=20, # set size of nodes
         borderWidth=1, #set border width of nodes
         #bin degree into three groups so every node maps to one of the three colors
         color.background=colors[cut(degree_centrality, breaks=3, labels=FALSE)],
         color.border="grey") #set color for border
#plot network graph
visNetwork(MC2_2032_870323_nodes, MC2_2032_870323_edges, main ="Trade flow in 2032
<br>for hscode 870323<br>") %>%
visIgraphLayout(layout = "layout_with_fr") %>%
visOptions(highlightNearest = TRUE,
nodesIdSelection = TRUE,
selectedBy = "degree_centrality") %>%
visEdges(arrows = "to") %>% #indicate direction
visLayout(randomSeed = 123) %>%
addFontAwesome(name = "font-awesome") %>% #add icon to network
visInteraction(dragNodes = FALSE, dragView = TRUE,
zoomView = TRUE, navigationButtons = TRUE) #freeze network
Observations:
- In comparison to hscode 306170, there are more key opinion leaders and sub-leaders in this network. They are heavily connected to one another, with more interaction between companies with large connections.
- At the bottom right of the network graph, there are two distinct hub and spoke networks, which belong to Estrella del Mar Seafarer and Chhattisgarh Marine ecology A/S Delivery. In each, a central node with a high degree of centrality is directly connected to all other nodes (spokes). It receives from one company but distributes goods to seven or eight companies.
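The receive-from-one, ship-to-many pattern can be made explicit by splitting degree into in-degree and out-degree. The sketch below uses a made-up hub and spoke edge list; the company names are placeholders.

```r
library(igraph)

# Toy directed trade: one supplier ships to the hub, the hub ships to seven buyers
toy_trade <- data.frame(
  from = c("supplier", rep("hub", 7)),
  to   = c("hub", paste0("buyer", 1:7)),
  stringsAsFactors = FALSE
)
spoke_g <- graph_from_data_frame(toy_trade, directed = TRUE)

in_deg  <- degree(spoke_g, mode = "in")   # number of incoming edges
out_deg <- degree(spoke_g, mode = "out")  # number of outgoing edges
print(cbind(in_degree = in_deg, out_degree = out_deg))
```

Calling degree() without a mode counts both directions together, so the hub's total degree here is 8 even though only one edge points into it.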
5.3 Examining Chapter 30 Hscode
Among the 17 hscode identified in the treemap above, we noticed that two codes belong to Chapter 30. In terms of volume, 306170 and 304620 are ranked first and sixth respectively. Therefore, we would like to match them against the sets of predicted knowledge graph links from FishEye to see whether this yields any insight.
Interestingly, this chapter belongs to the category of Fish.
# extract hscode that starts with 30
start_code <- str_sub(combined_edges_cleaned$hscode, start = 1, end = 2)
# filter for hs code that starts with 30
combined_edges_cleaned_30 <- combined_edges_cleaned %>%
filter(start_code == "30")
# create df for total count of hscode
combined_edges_cleaned_hscount <- combined_edges_cleaned_30 %>%
group_by(generated_by) %>%
summarise(hscount = n_distinct(hscode)) %>%
arrange(desc(hscount))
#create df for total weightage by each fish type and hs code
combined_edges_cleaned_30 <- combined_edges_cleaned_30 %>%
group_by(generated_by, hscode) %>%
summarise(weight = n()) %>%
arrange(desc(weight))
# combined both df through left join
combined_edges_cleaned_30hscount <- left_join(combined_edges_cleaned_30,
combined_edges_cleaned_hscount,
by = "generated_by")
#treemap saved under 'hscount30' object
hscount30 <- treemap(combined_edges_cleaned_30hscount,
index=c("generated_by","hscode"),
vSize="weight",
vColor="hscount",
type ="value",
algorithm = "pivotSize",
sortID = "weight",
palette="Blues",
border.col = "white"
)
d3tree(hscount30, rootname = "Distribution based on generated program")
Observations:
- From the predicted knowledge graph links, the Cod2 program has the highest transaction count, followed by salmon.
- Multiple hscode are tagged to each fish type. The exception is Lichen, which has only 1 hscode - 300510. Salmon and cod2 have a lower count, while mackerel and catfish have 15 different hscode, as shown by the differences in gradient.
- Upon closer inspection, several hscode are incomplete: the sub-heading is missing. For example, the hscode in salmon and cod2 contain five digits, whereas lichen has a six-digit hscode.
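A quick way to see which bundles carry short codes is to cross-tabulate each bundle against the character length of its hscode. The sketch below uses base R on a made-up data frame; apart from 300510, the hscode values are invented for illustration, and the column names simply follow those used in this section.

```r
# Hypothetical bundle edges; only 300510 comes from the text above.
toy_bundle <- data.frame(
  generated_by = c("cod2", "cod2", "salmon", "lichen", "lichen"),
  hscode       = c("30475", "30389", "30312", "300510", "300510"),
  stringsAsFactors = FALSE
)

# Cross-tabulate each bundle against the number of characters in its hscode.
# as.character() guards against hscode being stored as a number.
digit_table <- table(toy_bundle$generated_by,
                     nchar(as.character(toy_bundle$hscode)))
digit_table
```

Rows are bundles and columns are code lengths, so a bundle whose counts sit entirely in the 5 column has no complete hscode.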
5.3.1 Structure of Hscode
Noting that the hscode in the bundles are incomplete, we examine the list of companies with incomplete hscode. We retrieved the top 20 pairs with 10 or more transactions (weight > 9 in the code) and used ifelse() to replace N/A values in rcvcountry with Unknown. Thereafter, rename() is used to change rcvcountry to group for color coding.
#filter for companies with 5 digits hscode
combined_edges_cleaned_30hscount_new <- combined_edges_cleaned_30hscount %>%
filter(nchar(hscode) == 5) %>%
ungroup()
#filter the main edges file with the filtered 5 digits hscode
combined_edges_cleaned_filtered <- combined_edges_cleaned %>%
filter(hscode %in% combined_edges_cleaned_30hscount_new$hscode) %>%
ungroup()
# combined both df through left join
combined_edges_cleaned_newedges <- left_join(combined_edges_cleaned_filtered,
combined_edges_cleaned_30hscount_new,by = "hscode")
# aggregate and retrieve top 20 rows with >9 transactions
combined_edges_cleaned_newedgestt <- combined_edges_cleaned_newedges %>%
group_by(source,target,hscode) %>%
summarize(weight =n()) %>%
filter(source!=target) %>%
filter(weight >9) %>%
arrange(desc(weight)) %>%
ungroup()
# filter out the nodes in newedges
combined_nodes_filtered <- combined_nodes %>%
rename(label =id) %>%
filter(label %in% combined_edges_cleaned_newedgestt$source |
label %in% combined_edges_cleaned_newedgestt$target ) %>%
select(1) %>%
distinct() %>%
ungroup()
#left join to get id of the nodes
combined_nodes_filtered_addid <- combined_nodes_filtered %>%
filter(label %in% MC2_id_list_vis$label) %>%
left_join(MC2_id_list_vis,by="label") %>%
relocate(id,label) %>%
rename(group = rcvcountry) %>%
ungroup()
#add unknown to N/A in rcvcountry
combined_nodes_filtered_addid$group<- ifelse(is.na(combined_nodes_filtered_addid$group), "Unknown", combined_nodes_filtered_addid$group)
#append by matching exporters to the importers (list of unknown)
combined_edges_cleaned_newedges_test <- combined_edges_cleaned_newedges %>%
left_join(combined_nodes_filtered_addid, by = c("source" = "label")) %>%
rename(from = id) %>%
left_join(combined_nodes_filtered_addid, by = c("target" = "label")) %>%
rename(to = id) %>%
filter (!is.na(to)) %>%
group_by(from,to) %>%
summarise(weight = n()) %>%
filter (weight >9) %>%
ungroup()
#output for dataframe using knitr::kable
kable(combined_nodes_filtered, "simple")
| label |
|---|
| Caracola del Sol Services |
| Mar del Este CJSC |
| Pao gan LC Freight |
| hǎi dǎn Corporation Wharf |
| Tsha wamba S.A. de C.V. |
| Saltwater Solitude N.V. International |
| Black Sea Tuna Sagl |
| Pao gan SE Seal |
| Sea Breeze Corporation Marine sanctuary |
| Selous Game Reserve S.A. de C.V. |
| -2 |
| Aqua Azul LC International |
| Arunachal Pradesh s Brine |
| Belgian Cod BV Solutions |
| Wave Watchers Ltd. Liability Co |
| Manipur Market Corporation Cargo |
| Náutica del Sol Corporation |
| Matthew Oyj Marine sanctuary |
| 1 Ltd. Liability Co Cargo |
| 2 Limited Liability Company |
| Arunachal Pradesh s Plc |
| Kariba Dam Ges.m.b.H. Delivery |
| Oceanic Opportunities CJSC Marine |
| Norwegian Haddock GmbH & Co. KG |
| -53 |
| Saltwater Surf Club Ltd. Liability Co Transport |
| Belgian Scallop Harbor ОАО Freight |
| David Limited Liability Company Worldwide |
| Logistics Ltd. Liability Co |
| Greek Octopus Ltd. Corporation |
| Ancla de Oro Kga Import |
| Samaka Chart ОАО Delivery |
| xiǎo xiā S.p.A. Deep-sea |
| Faroe Islands Halibut Sea S.p.A. Distribution |
| Irish Sea Salt spray |
| Joseph Limited Liability Company |
Observations:
- Among the 36 companies identified with inaccurate hscode, two companies identified earlier, Mar del Este CJSC and hǎi dǎn Corporation Wharf, are included in the list.
Next, we plot a network graph to look into the interaction between the companies and their receiving countries.
visNetwork(combined_nodes_filtered_addid, combined_edges_cleaned_newedges_test, main ="Companies with Inaccurate hs code
<br>and >10 transactions <br>") %>%
visIgraphLayout(layout = "layout_with_fr") %>%
visOptions(highlightNearest = TRUE,
nodesIdSelection = TRUE,
selectedBy = "group") %>%
visLegend() %>%
visEdges(arrows = "to") %>% #indicate direction
visLayout(randomSeed = 123) %>%
addFontAwesome(name = "font-awesome") %>% #add icon to network
visGroups(groupname = "Coralmarica", shape = "icon",
icon= list(code ="f21a", color = "#EBCC2A" )) %>%
visGroups(groupname = "Unknown", shape = "icon",
icon= list(code ="f21a", color = "#F21A00" )) %>%
visGroups(groupname = "Kuzalanda", shape = "icon",
icon= list(code ="f21a", color = "#606060" )) %>%
visGroups(groupname = "Merigrad", shape = "icon",
icon= list(code ="f21a", color = "#00887d" )) %>%
visGroups(groupname = "Oceanus", shape = "icon",
icon= list(code ="f21a", color = "#3B9AB2")) %>%
visInteraction(dragNodes = FALSE, dragView = TRUE,
zoomView = TRUE, navigationButtons = TRUE) #freeze network
Observations:
- The majority of the receiving companies are associated with Oceanus, with a minority receiving from companies with no association.
- There is no distinct cluster. Meanwhile, the two companies that we identified are importers who receive goods from two sources. However, the companies with incomplete hscode yield no significant findings, so we next check whether the main edges data frame also contains incomplete hscode.
5.3.2 Incomplete Hscode in Edges
With the code chunk below, we did a quick check on the hscode in MC2_challenge_edges. All transactions in the knowledge graph contain six-digit hscode. As such, we will eliminate bundles with inaccurate hscode.
#filter for companies with 5 digits hscode
MC2_challenge_edges_check <- MC2_challenge_edges %>%
filter(nchar(hscode) == 5) %>%
ungroup()
#TRUE if any 5-digit hscode remains in the main edges
nrow(MC2_challenge_edges_check) > 0
[1] FALSE
5.3.3 Accuracy of Hscode
Noting that chapter 30 hscode does not include Shark, we will examine the accuracy of the hscode by looking at its completeness in the following steps:
- filter() for companies with 5-digit hscode and 6-digit hscode separately
- Aggregate the two data frames by the program (generated_by) using group_by() and count the occurrences using summarize()
- Create a new column to add the hscode status (Complete for hscode with 6 digits, Incomplete for hscode with 5 digits)
- Combine both data frames through rbind, since there is an exact match in the number of columns
- Plot the interactive stacked bar chart with plot_ly() in descending order
#filter for companies with 6 digits hscode
combined_edges_cleaned6 <- combined_edges_cleaned %>%
filter(nchar(hscode) == 6) %>%
ungroup()
#filter for companies with 5 digits hscode
combined_edges_cleaned5 <- combined_edges_cleaned %>%
filter(nchar(hscode) == 5) %>%
ungroup()
#aggregate for program with 5digits hscode
count5 <- combined_edges_cleaned5 %>%
group_by(generated_by) %>%
summarize(count = n()) %>%
ungroup ()
#aggregate for program with 6 digits hscode
count6 <- combined_edges_cleaned6 %>%
group_by(generated_by) %>%
summarize(count = n()) %>%
ungroup ()
#create new column to add hscode status
count5$type <- "Incomplete"
count6$type <- "Complete"
#combined both df with rbind
combined_df <- rbind(count5,count6)
# Create an interactive stacked bar plot
plot_ly(combined_df, x = ~generated_by,
y = ~count, color = ~type, type = "bar") |>
layout(xaxis = list(title = 'Generated Program',
categoryorder = "total descending"),
yaxis = list(title = 'Count'),
title = 'Completion of Hscode by structure',
barmode = 'stack') #create stacked barchart
Observations:
- There is a high completion rate for most program tools, with the exception of Cod2, whose data is Incomplete throughout. It is deemed unreliable as everything is incomplete. On the flip side, the data retrieved from Lichen and Shark have the highest accuracy. Thus, we concluded that Lichen and Shark are the reliable sets which will aid in the completion of the graph.
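The stacked counts can also be expressed as a completion rate per program, which makes the comparison independent of bundle size. The sketch below uses made-up counts in the same shape as combined_df (generated_by, count, type); the numbers are illustrative assumptions, not results from the bundles.

```r
# Made-up counts in the same shape as combined_df (generated_by, count, type).
toy_counts <- data.frame(
  generated_by = c("cod2", "lichen", "shark", "tuna", "tuna"),
  count        = c(40, 25, 30, 45, 5),
  type         = c("Incomplete", "Complete", "Complete", "Complete", "Incomplete"),
  stringsAsFactors = FALSE
)

# Share of Complete rows per program, as a percentage.
programs <- sort(unique(toy_counts$generated_by))
completion_rate <- sapply(programs, function(p) {
  rows <- toy_counts[toy_counts$generated_by == p, ]
  100 * sum(rows$count[rows$type == "Complete"]) / sum(rows$count)
})
completion_rate
```

A program with no Complete rows gets a rate of 0 because the sum over an empty selection is 0, so it does not need special handling.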
6. Companies with Unknown Trade Status
As seen in Section 2.1 Metadata, shpcountry and rcvcountry refer to the country that the company is most often associated with. Therefore, a newly founded company might have an unknown status. This might apply to those who want to avoid detection by starting up a new company under a different name.
Hence, we would like to examine the group with unknown trade status. The code builds on section 3.5, where we created the Trade Status column. The code chunk below filters for Unknown in trade_status on the receiver end and filters for exporters before doing a left_join.
#create df to filter list of unknown parties
unknownMC2_nodes <- MC2_id_list_vis %>%
filter(trade_status == "Unknown") %>%
select(1:2)
#create df to filter list of exporters who solely do export or both
MC2_exporters <- MC2_id_list_vis %>%
filter(trade_status == "Export" | trade_status == "Import & Export")
#append by matching exporters to the importers (list of unknown)
unknownMC2 <- MC2_challenge_edges %>%
left_join(MC2_exporters, by = c("sourcelabel" = "label")) %>%
rename(from = id) %>%
left_join(unknownMC2_nodes, by = c("targetlabel" = "label")) %>%
rename(to = id) %>%
filter (!is.na(to)) %>%
group_by(from,to) %>%
summarise(weight = n()) %>%
filter (weight >1) %>%
ungroup()
#filter and aggregate for the nodes
unknownMC2_revnodes <- MC2_id_list_vis %>%
filter(id %in% unknownMC2$from | id %in% unknownMC2$to) %>%
distinct() %>%
select(1:3) %>%
rename(group = trade_status)
Similarly, visNetwork is used to plot the interactive network graph with the Fruchterman and Reingold layout. We include the addFontAwesome() function so that the icon color can be modified through visGroups. Custom navigation is included, with zooming and dragging of the view enabled.
#create network diagram for exporters
visNetwork(unknownMC2_revnodes, unknownMC2,
main = "Exporters who ship to <br> Unknown Origin<br>") %>%
visIgraphLayout(layout = "layout_with_fr") %>%
visOptions(highlightNearest = TRUE,
nodesIdSelection = TRUE) %>%
visLayout (randomSeed = 123) %>%
visEdges(arrows = "to") %>% #indicate direction
addFontAwesome(name = "font-awesome") %>% #add icon to network
#fill color and shape
visGroups(groupname = "Unknown", shape = "icon",
icon= list(code ="f21a", color = "#E1AF00" )) %>%
visGroups(groupname = "Import & Export", shape = "icon",
icon= list(code ="f21a", color = "#3B9AB2" )) %>%
visLegend() %>%
visInteraction(dragNodes = FALSE, dragView = TRUE,
zoomView = TRUE, navigationButtons = TRUE) #freeze network
Based on the network above, two companies ship to three or more receivers of unknown origin. Further monitoring might be required for the group of receivers/importers below until they have a trade status.
| Shipper | Receiver |
|---|---|
| hǎi dǎn Corporation Wharf | SouthSeafood Express Corp; Maacama Ocean Worldwide LLC; OranjestadCreek Express Sagl; SavanetaCreek Solutions NV; HomabayMarine Carriers N.V. |
| Sailors and Surfers Incorporated Enterprises | SavanetaCreek Solutions NV; SumacAmerica Transport GmbH & Co. KG; 8.Marine United AG |
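The "three or more receivers" rule can be applied directly to an edge table shaped like unknownMC2 (from, to, weight). The sketch below uses base R with hypothetical ids; the threshold of three follows the text above, and everything else is an assumption made for illustration.

```r
# Hypothetical edges from exporters to receivers with unknown trade status.
toy_unknown <- data.frame(
  from   = c("A", "A", "A", "B", "B", "C"),
  to     = c("u1", "u2", "u3", "u1", "u4", "u5"),
  weight = c(4, 2, 3, 5, 2, 6),
  stringsAsFactors = FALSE
)

# Number of distinct unknown receivers served by each shipper.
receiver_count <- tapply(toy_unknown$to, toy_unknown$from,
                         function(x) length(unique(x)))

# Keep shippers that reach three or more unknown receivers.
flagged <- names(receiver_count)[receiver_count >= 3]
flagged
```

Counting distinct receivers rather than summing weight keeps a shipper with one very active unknown partner from being flagged by volume alone.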
7. Companies with Duplicated Entries
Although we have included the duplicated entries in our exercise, we would like to find out more about the companies with high duplicate counts. Thus, we filtered for the top 10 transactions using top_n().
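One detail is worth keeping in mind when reading the duplicate counts: base R's duplicated() flags only the second and later occurrences of a row, so weights derived from it count the extra copies rather than the total number of occurrences. The sketch below shows this on a few hypothetical shipment records.

```r
# Hypothetical shipment records; the first three rows are identical.
toy_edges <- data.frame(
  source = c("A", "A", "A", "B", "B"),
  target = c("X", "X", "X", "Y", "Z"),
  hscode = c("306170", "306170", "306170", "304620", "304620"),
  stringsAsFactors = FALSE
)

# duplicated() returns TRUE for repeats only, never for the first occurrence.
is_repeat <- duplicated(toy_edges)
is_repeat

# Extra copies per source-target pair, mirroring a group_by() + n() step.
dup_rows   <- toy_edges[is_repeat, ]
dup_weight <- table(paste(dup_rows$source, dup_rows$target, sep = " -> "))
dup_weight
```

Here the pair A -> X appears three times but receives a weight of 2, so a filter such as weight > 1 keeps only pairs with at least two repeated rows.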
#detect duplicate rows and aggregate it
MC2_dup_rows <- MC2_challenge_edges[duplicated(MC2_challenge_edges),] %>%
rename(from = source) %>%
rename(to = target) %>%
group_by(from,to) %>%
summarise(weight = n()) %>%
filter (weight >1) %>%
ungroup()
#filter top 10 transactions by weight
MC2_dup_top10 <- MC2_dup_rows %>%
top_n(10,weight)
#create df with corresponding nodes
MC2_dup_nodes <- MC2_id_list_vis %>%
filter(id %in% MC2_dup_top10$from | id %in% MC2_dup_top10$to) %>%
distinct() %>%
select(1:3) %>%
rename(group = trade_status)
#create network diagram for exporters
visNetwork(MC2_dup_nodes, MC2_dup_top10,
main = "Top 10 duplicated transactions") %>%
visIgraphLayout(layout = "layout_with_fr") %>%
visOptions(highlightNearest = TRUE,
nodesIdSelection = TRUE) %>%
visLayout(randomSeed = 123) %>%
visEdges(arrows = "to") %>% #indicate direction
addFontAwesome(name = "font-awesome") %>% #add icon to network
#fill color and shape
visLegend() %>%
visGroups(groupname = "Import", shape = "icon",
icon= list(code ="f21a", color = "#E1AF00" )) %>%
visGroups(groupname = "Import & Export", shape = "icon",
icon= list(code ="f21a", color = "#3B9AB2" )) %>%
visInteraction(dragNodes = FALSE, dragView = TRUE,
zoomView = TRUE, navigationButtons = TRUE) #freeze network
Observations:
- Companies with duplicated transactions tend to trade exclusively with one another. They do not have multiple interconnections and they operate on both ends of the trade, apart from -2119, which is an importer. In-depth analysis might be needed to identify individual companies.
8. Conclusion:
Thus far, the detection of illegal fishing has been a challenging issue. Many attributes could point to illegal, unreported, and unregulated (IUU) fishing. In our exercise, we observed the following:
- Abnormality in transaction counts over the time period. Transactions decrease over the years and peak around 2028-2030. By seasonality, March seems to be a period with low transactions. From the network graph, we identified that it is highly saturated by some companies.
- The hscode with the highest transaction count does not have the highest median weight. hscode 306170 ranked 1st in terms of volume and it belongs to the Chapter 30 family along with 304620.
- Further analysis of Chapter 30 revealed that the bundles created are not accurate. The most reliable sets are Lichen and Shark, which have 100% accuracy in terms of hscode.
- Monitoring is required for companies with unknown trade status, as there are companies who supply to a group of unknown companies. Likewise, further analysis might be required to examine the patterns of companies with high duplicate counts.
References:
Delignette-Muller, M. L., & Dutang, C. (2020). visNetwork: Network Visualization using ‘vis.js’ Library. Retrieved May 28, 2023, from https://cran.r-project.org/web/packages/visNetwork/vignettes/Introduction-to-visNetwork.html
Nagraj, R. (n.d.). Network Visualization in R with visNetwork. Retrieved May 28, 2023, from https://www.nagraj.net/notes/visnetwork/
R Graph Gallery. (n.d.). RColorBrewer Palettes. Retrieved May 28, 2023, from https://r-graph-gallery.com/38-rcolorbrewers-palettes.html
Tran, J. (2021). Network Visualization in R - Centrality Measurement. Retrieved May 28, 2023, from https://jtr13.github.io/cc21fall2/network-visualization-in-r.html#centrality-measurement
Tran, J. (2021). Network Visualization in R. Retrieved May 28, 2023, from https://jtr13.github.io/cc21fall2/network-visualization-in-r.html
VAST Challenge (2023). VAST Challenge 2023 - Mini-Challenge 2: Fishy Business. Retrieved May 22, 2023, from https://vast-challenge.github.io/2023/MC2.html