Take Home Exercise 2

Mini Case 2 of VAST Challenge 2023

Author

Oh Jia Wen

Published

June 4, 2023

Modified

June 4, 2023

1. OVERVIEW

FishEye International is collaborating with the country of Oceanus to identify companies that could potentially be engaged in illegal, unreported, and unregulated (IUU) fishing. FishEye has transformed import/export data into a knowledge graph, and 12 groups of link suggestions, each associated with a fish type, are used to reason on the knowledge graph.

1.1 The Task

In this take-home exercise, temporal patterns for individual entities and between entities are identified using the knowledge graph that FishEye created from trade records. In addition, we evaluate the sets of predicted knowledge graph links (bundles) to determine which sets are more reliable for completing the graph.

2. Datasets

The trade data is stored in the mc2_challenge_graph.json file and covers a period of 7 years, from 2028 to 2034. The knowledge graph has a total of 34,576 nodes and 5,464,378 edges. The bundles cover 12 types of marine species: Carp, Catfish, Chub_mackerel, Cod2, Herring, Lichen, Mackerel, Pollock, Salmon_wgl, Salmon, Shark, and Tuna.

2.1 Metadata

Location	Variable Name	Description
Node	id	Name of the company that originated (or received) the shipment
Node	shpcountry	Country the company is most often associated with when shipping
Node	rcvcountry	Country the company is most often associated with when receiving
Node, Edge	dataset	Always ‘MC2’
Edge	arrivaldate	Date the shipment arrived at port in YYYY-MM-DD format.
Edge	hscode	Harmonized System code for the shipment.
Edge	valueofgoods_omu	Customs-declared value of the total shipment, in Oceanus Monetary Units (OMU)
Edge	volumeteu	The volume of the shipment in ‘Twenty-foot equivalent units’
Edge	weightkg	The weight of the shipment in kilograms
Edge	type	Always ‘shipment’ for MC2
Edge	generated_by	Name of the program that generated the edge (only in bundles)

Note

HS codes (Harmonized System codes) are standardized numeric codes used to classify goods for international trade and customs purposes. A code is composed of six digits, which can be broken down into chapter/heading/subheading (two digits each).
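The split can be illustrated with a short base R sketch. split_hscode() is a hypothetical helper and is not part of the original workflow. Padding to six digits assumes that a leading zero was lost when hscode was read as an integer, since the sample rows in section 3.4 include five-digit values such as 80440.

```r
# Hypothetical helper: split an HS code into chapter, heading and subheading.
# Codes are zero-padded to six digits, assuming a leading zero was dropped
# when the code was stored as an integer.
split_hscode <- function(code) {
  padded <- formatC(as.integer(code), width = 6, flag = "0")
  data.frame(
    hscode     = padded,
    chapter    = substr(padded, 1, 2),
    heading    = substr(padded, 3, 4),
    subheading = substr(padded, 5, 6),
    stringsAsFactors = FALSE
  )
}

split_hscode(c(306170, 80440))
```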

3. Data Preparation

3.1 Install R-packages

The p_load() function of the pacman package is used to install (if necessary) and load the following libraries:

jsonlite: To import data from JSON files into R
lubridate: To convert dates and times
tidygraph: For tidy manipulation of graph data
ggraph: For plotting networks with the ggplot2 grammar
visNetwork: For network visualization
tidyverse: A collection of R packages used in everyday data analyses, supporting data science, data wrangling, and analysis
hrbrthemes: For additional themes and utilities for ‘ggplot2’ (may not be used)
heatmaply: For creating interactive cluster heatmaps
treemap: For visualizing hierarchical data using nested rectangles
devtools: For installing d3treeR from GitHub
igraph: For exploring the network
ggstatsplot: For ggplot2-based plots with statistical details
RColorBrewer: For visualization; contains ready-to-use color palettes
knitr: For dynamic report generation
stringr: For character manipulation

pacman::p_load(jsonlite, lubridate, tidygraph, ggraph, visNetwork, tidyverse,
               igraph, heatmaply, hrbrthemes, treemap, devtools,
               ggstatsplot, RColorBrewer, knitr, stringr)

options(scipen = 999)

3.2 Importing Data

The JSON files are imported into R with the fromJSON() function from jsonlite. The code chunk below imports the knowledge graph that FishEye created from trade records.

MC2_challenge <- fromJSON("data/mc2_challenge_graph.json")

The 12 bundles (Carp, Catfish, Chub_Mackerel, Cod2, Herring, Lichen, Mackerel, Pollock, Salmon_wgl, Salmon, Shark and Tuna) are imported in the same way.


MC2_carp <- fromJSON("data/bundles/carp.json")
MC2_catfish <- fromJSON("data/bundles/catfish.json")
MC2_chub_mackerel <- fromJSON("data/bundles/chub_mackerel.json")
MC2_cod2 <- fromJSON("data/bundles/cod2.json")
MC2_herring <- fromJSON("data/bundles/herring.json")
MC2_lichen <- fromJSON("data/bundles/lichen.json")
MC2_mackerel <- fromJSON("data/bundles/mackerel.json")
MC2_pollock <- fromJSON("data/bundles/pollock.json")
MC2_salmon_wgl <- fromJSON("data/bundles/salmon_wgl.json")
MC2_salmon <- fromJSON("data/bundles/salmon.json")
MC2_shark <- fromJSON("data/bundles/shark.json")
MC2_tuna <- fromJSON("data/bundles/tuna.json")

3.3 Create Tibble Data frame

The nodes and links imported from JSON are converted into tibbles with as_tibble(), keeping only the columns needed for the analysis.

MC2_challenge_nodes <-as_tibble(MC2_challenge$nodes) %>%
  select(id,shpcountry,rcvcountry)
MC2_challenge_edges <-as_tibble(MC2_challenge$links) %>%
  select(source,target,arrivaldate, hscode,valueofgoods_omu, 
         volumeteu, weightkg, valueofgoodsusd)

The columns of the bundles come in different orders, so relocate() is used to reorder them so that each edge data frame starts with source and target, followed by the remaining columns. As a result, every fish type follows the same column sequence.


#1_fish type :carp
MC2_carp_nodes <-as_tibble(MC2_carp$nodes) %>%
  select(id,dataset,shpcountry,rcvcountry)
MC2_carp_edges <-as_tibble(MC2_carp$links) %>%
  relocate(8,9,7,6)

#2_fish type: catfish
MC2_catfish_nodes <-as_tibble(MC2_catfish$nodes) %>%
  select(id,dataset,shpcountry,rcvcountry)
MC2_catfish_edges <-as_tibble(MC2_catfish$links) %>%
  relocate(6,7,5,4)

#3_fish type: chub_mackerel
MC2_chub_mackerel_nodes <-as_tibble(MC2_chub_mackerel$nodes) %>%
  select(id,dataset,shpcountry,rcvcountry)
MC2_chub_mackerel_edges <-as_tibble(MC2_chub_mackerel$links) %>%
  relocate(8,9,7,6)

#4_fish type: cod2
MC2_cod2_nodes <-as_tibble(MC2_cod2$nodes) %>%
  select(id,dataset,shpcountry,rcvcountry)
MC2_cod2_edges <-as_tibble(MC2_cod2$links) %>%
  relocate(8,9,7,6)

#5_fish type: herring
MC2_herring_nodes <-as_tibble(MC2_herring$nodes) %>%
  select(id,dataset,shpcountry,rcvcountry)
MC2_herring_edges <-as_tibble(MC2_herring$links) %>%
  relocate(7,8,6,5,1,2,9,3,4)

#6_fish type: lichen
MC2_lichen_nodes <-as_tibble(MC2_lichen$nodes) %>%
  select(id,dataset,shpcountry,rcvcountry)
MC2_lichen_edges <-as_tibble(MC2_lichen$links) %>%
    relocate(7,8,6,5,1,2,9,3,4)

#7_fish type: mackerel
MC2_mackerel_nodes <-as_tibble(MC2_mackerel$nodes) %>%
  select(id,dataset,shpcountry,rcvcountry)
MC2_mackerel_edges <-as_tibble(MC2_mackerel$links) %>%
    relocate(6,7,5,4)

#8_fish type: pollock
MC2_pollock_nodes <-as_tibble(MC2_pollock$nodes) %>%
  select(id,dataset,shpcountry,rcvcountry)
MC2_pollock_edges <-as_tibble(MC2_pollock$links) %>%
    relocate(8,9,7,6)

#9_fish type: salmon_wgl
MC2_salmon_wgl_nodes <-as_tibble(MC2_salmon_wgl$nodes) %>%
  select(id,dataset,shpcountry,rcvcountry)
MC2_salmon_wgl_edges <-as_tibble(MC2_salmon_wgl$links) %>%
  relocate(7,8,6,5,1,2,9,3,4)

#10_fish type: salmon
MC2_salmon_nodes <-as_tibble(MC2_salmon$nodes) %>%
  select(id,dataset,shpcountry,rcvcountry)
MC2_salmon_edges <-as_tibble(MC2_salmon$links) %>%
  relocate(8,9,7,6)

#11_fish type: shark
MC2_shark_nodes <-as_tibble(MC2_shark$nodes) %>%
  select(id,dataset,shpcountry,rcvcountry)
MC2_shark_edges <-as_tibble(MC2_shark$links) %>%
    relocate(7,8,6,5,1,2,9,3,4)

#12_fish type: tuna
MC2_tuna_nodes <-as_tibble(MC2_tuna$nodes) %>%
  select(id,dataset,shpcountry,rcvcountry)
MC2_tuna_edges <-as_tibble(MC2_tuna$links) %>%
  relocate(5,6,4,3)
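Positional relocate() calls depend on the exact column order of each file. As a hedged alternative, the sketch below reorders columns by name in base R. relocate_by_name() is a hypothetical helper, and the toy bundle values are made up.

```r
# Hypothetical helper: move the named columns to the front (in the given order)
# and keep every other column after them in its existing order.
relocate_by_name <- function(df, first) {
  first <- intersect(first, names(df))  # ignore names this bundle does not have
  df[, c(first, setdiff(names(df), first)), drop = FALSE]
}

# Toy bundle with a shuffled column order (values are made up)
bundle <- data.frame(arrivaldate = "2034-03-20", hscode = 80440L,
                     generated_by = "carp", dataset = "MC2",
                     source = "A", target = "B")
relocate_by_name(bundle, c("source", "target", "dataset", "generated_by"))
```

In dplyr, the same intent can be written by name, for example relocate(source, target, dataset, generated_by), which does not break if a bundle's column order changes.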

3.4 Concatenate Data frame from Bundles

Once converted to tibbles, the bundles are concatenated. Among the 12 edge data frames, 3 have 7 variables whereas the rest have 9 variables. As such, bind_rows() from the dplyr package (part of the tidyverse) is used, because it matches columns by name and fills columns that are missing from a data frame with NA.

In addition, the knitr::kable() function is used to display the first rows of combined_edges.


# concatenate the edges 
combined_edges <- bind_rows(MC2_carp_edges,MC2_chub_mackerel_edges,
                        MC2_cod2_edges,MC2_herring_edges,MC2_lichen_edges,
                        MC2_pollock_edges,MC2_salmon_wgl_edges,
                        MC2_salmon_edges,MC2_shark_edges, 
                        MC2_catfish_edges,MC2_mackerel_edges,MC2_tuna_edges)

# concatenate the nodes 
combined_nodes <- bind_rows(MC2_carp_nodes,MC2_catfish_nodes,
                            MC2_chub_mackerel_nodes,MC2_cod2_nodes,
                            MC2_herring_nodes, MC2_lichen_nodes,
                            MC2_mackerel_nodes, MC2_pollock_nodes, 
                            MC2_salmon_wgl_nodes,MC2_salmon_nodes,
                            MC2_shark_nodes, MC2_tuna_nodes)

#output for dataframe using knitr:: kable
kable(head(combined_edges), "simple")

source	target	dataset	generated_by	arrivaldate	hscode	valueofgoods_omu	volumeteu	weightkg
Tshimbua GmbH & Co. KG	Caracola del Sol Services	MC2	carp	2034-03-20	80440	15915	0	15720
Marine Masterminds Dry dock	Playa de la Luna Incorporated	MC2	carp	2034-08-01	940179	NA	0	10795
Marine Masterminds Dry dock	Saltwater Supreme ОАО Forwading	MC2	carp	2034-11-27	940161	NA	0	6555
Marine Masterminds Dry dock	Saltwater Supreme ОАО Forwading	MC2	carp	2034-10-10	940161	NA	0	6675
zhāng yú Ges.m.b.H. Solutions	Portuguese Tuna Incorporated Marine	MC2	carp	2034-04-09	40729	NA	15	54775
Nile S.A. de C.V.	Caracola del Sol Services	MC2	carp	2034-03-30	700711	NA	0	22430

Tip

Catfish, Mackerel, and Tuna have 7 variables while the other fish types have 9. It is better to concatenate them with bind_rows() instead of rbind(), as the latter requires every data frame to have the same set of columns.
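To illustrate why bind_rows() copes with the 7- and 9-variable bundles, here is a minimal base R sketch of the same idea: add any missing columns as NA, align the column order, and only then call rbind(). bind_rows_base() is a hypothetical helper for illustration, not the dplyr implementation.

```r
# Hypothetical helper mimicking bind_rows() for mismatched columns
bind_rows_base <- function(...) {
  dfs <- list(...)
  all_cols <- unique(unlist(lapply(dfs, names)))
  aligned <- lapply(dfs, function(d) {
    # add each missing column as NA (rep() also works for zero-row frames)
    for (col in setdiff(all_cols, names(d))) d[[col]] <- rep(NA, nrow(d))
    d[all_cols]  # same column order in every data frame
  })
  do.call(rbind, aligned)
}

nine  <- data.frame(source = "A", target = "B", valueofgoods_omu = 15915)
seven <- data.frame(source = "C", target = "D")
bind_rows_base(nine, seven)
```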

3.5 Creating a Master ID data frame

Moving on, we create a master ID data frame from the knowledge graph. It is unclear whether every source and target company in the edges is also listed in MC2_challenge_nodes.

The code chunk below identifies any such missing IDs. If a company is missing from the nodes, it is appended to the MC2_challenge_nodes data frame. A new column called trade_status is also created to identify the trade status of each company, using !is.na() to check whether shpcountry and rcvcountry have values. If both columns are NA, the trade status is set to Unknown.

We also rename id to label and use mutate() with nrow() to create a new id column that serves as a unique identifier for each label.
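The trade_status rule can be sketched in base R with nested ifelse() calls on toy data. The company and country names are made up. The conditions are checked in the same order as in the case_when() call, so the last branch only applies when both countries are missing.

```r
# Sketch of the trade_status rule (toy data; names are made up)
classify_trade <- function(shp, rcv) {
  ifelse(!is.na(shp) & !is.na(rcv), "Import & Export",
         ifelse(!is.na(shp), "Export",
                ifelse(!is.na(rcv), "Import", "Unknown")))
}

nodes <- data.frame(
  label      = c("Company A", "Company B", "Company C", "Company D"),
  shpcountry = c("Country P", "Country P", NA, NA),
  rcvcountry = c("Country Q", NA, "Country P", NA)
)
nodes$id <- as.character(seq_len(nrow(nodes)))  # sequential unique id per label
nodes$trade_status <- classify_trade(nodes$shpcountry, nodes$rcvcountry)
nodes
```

seq_len(nrow(nodes)) is used for the id column because, unlike 1:nrow(nodes), it returns an empty vector for a zero-row data frame.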


#create new df and add new column called trade_status
MC2_id_list_vis <- MC2_challenge_nodes %>%
  rename(label = id) %>%
  mutate(id = as.character(1:nrow(MC2_challenge_nodes)),
    trade_status = case_when(
    !is.na(shpcountry) & !is.na(rcvcountry) ~ "Import & Export",
    !is.na(shpcountry) ~ "Export",
    !is.na(rcvcountry) ~ "Import",
    is.na(shpcountry) | is.na(rcvcountry) ~"Unknown"
  )) %>%
  filter(!is.na(trade_status))

#reorder the columns 
MC2_id_list_vis <- MC2_id_list_vis %>%
  select(id,label, trade_status,shpcountry, rcvcountry)

#create similar list 
MC2_id_list <- MC2_challenge_nodes %>%
  rename(label = id) %>%
  mutate(
    trade_status = case_when(
    !is.na(shpcountry) & !is.na(rcvcountry) ~ "Import & Export",
    !is.na(shpcountry) ~ "Export",
    !is.na(rcvcountry) ~ "Import",
    is.na(shpcountry) | is.na(rcvcountry) ~"Unknown"
  )) 

#reorder the columns 
MC2_id_list <- MC2_id_list %>%
  select(label,trade_status,shpcountry, rcvcountry)

#create df to identify companies in target column that are not in the nodes 
ID_target <- MC2_challenge_edges %>%
  filter(!(target %in% MC2_id_list$label)) %>%
  distinct(target) %>%
  #rename to match names in nodes df
  rename(id = target) %>%
  #dummy columns are created to bind rows together
  mutate(shpcountry = NA, rcvcountry = NA)

#create df to identify companies in source column that are not in the nodes 
ID_source <- MC2_challenge_edges %>%
  filter(!(source %in% MC2_id_list$label)) %>%
  distinct(source) %>%
  rename(id = source) %>%
  mutate(shpcountry = NA, rcvcountry = NA)

#append the distinct companies into the nodes (columns match id, shpcountry, rcvcountry)
MC2_challenge_nodes <- MC2_challenge_nodes %>%
  rbind(ID_target) %>%
  rbind(ID_source)

The output below confirms that no company is missing: every source and target already appears in the MC2_challenge_nodes data frame.

#output of ID nodes that are not in Master list
nrow(ID_source)

[1] 0

nrow(ID_target)

[1] 0
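The missing-ID check comes down to a set difference between the companies named in the edges and those in the node list. The toy base R sketch below shows both the check and the append step. The company names are made up.

```r
# Toy edge and node lists (company names are made up)
edges <- data.frame(source = c("A", "B", "E"), target = c("B", "C", "A"))
nodes <- data.frame(id = c("A", "B", "C"), shpcountry = NA, rcvcountry = NA)

# Companies named in the edges but absent from the node list
missing_ids <- setdiff(c(edges$source, edges$target), nodes$id)

# Append them with NA placeholders so the columns line up for rbind()
if (length(missing_ids) > 0) {
  nodes <- rbind(nodes, data.frame(id = missing_ids, shpcountry = NA, rcvcountry = NA))
}
missing_ids
```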

3.6 Data Wrangling

After concatenating the data and creating a master ID data frame, the following issues are rectified:

arrivaldate is not in date format. [Converted with lubridate's ymd()]
hscode is stored as an integer rather than a character. [Converted from int to chr]
There is no separate year field. [Created a new column called year from arrivaldate]
The edge data frame does not have IDs corresponding to source and target. [Renamed source and target to sourcelabel and targetlabel respectively, then used left_join() with the master ID data frame MC2_id_list_vis to get the corresponding IDs]


#revise the data format for arrivaldate and hscode 
MC2_challenge_edges<- MC2_challenge_edges %>%
  rename(sourcelabel = source, targetlabel = target) %>%
  mutate(arrivaldate =ymd(arrivaldate),
         hscode = as.character(hscode),
         year = year(arrivaldate))

#append the corresponding id through left_join 
MC2_challenge_edges <- MC2_challenge_edges %>%
  left_join(MC2_id_list_vis, by = c("sourcelabel" = "label")) %>%
  rename(source = id) %>%
  left_join(MC2_id_list_vis, by = c("targetlabel" = "label")) %>%
  rename(target = id) %>%
  relocate(10,14)

#apply the same conversions to the bundles 
combined_edges_cleaned<- combined_edges %>%
  mutate(arrivaldate =ymd(arrivaldate),
         hscode = as.character(hscode))

#output for dataframe using knitr:: kable
kable(head(combined_edges_cleaned), "simple")

source	target	dataset	generated_by	arrivaldate	hscode	valueofgoods_omu	volumeteu	weightkg
Tshimbua GmbH & Co. KG	Caracola del Sol Services	MC2	carp	2034-03-20	80440	15915	0	15720
Marine Masterminds Dry dock	Playa de la Luna Incorporated	MC2	carp	2034-08-01	940179	NA	0	10795
Marine Masterminds Dry dock	Saltwater Supreme ОАО Forwading	MC2	carp	2034-11-27	940161	NA	0	6555
Marine Masterminds Dry dock	Saltwater Supreme ОАО Forwading	MC2	carp	2034-10-10	940161	NA	0	6675
zhāng yú Ges.m.b.H. Solutions	Portuguese Tuna Incorporated Marine	MC2	carp	2034-04-09	40729	NA	15	54775
Nile S.A. de C.V.	Caracola del Sol Services	MC2	carp	2034-03-30	700711	NA	0	22430

Duplicate rows are found. They are retained for further analysis, as purchases might be broken down into smaller trades to avoid detection.


#check for duplicates 
dup <- (nrow(MC2_challenge_edges) - nrow(unique(MC2_challenge_edges)))
#reformat output 
dup_reformat <- format(dup, big.mark=",")
#print output
dup_reformat

[1] "155,291"
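The duplicate count above is the number of rows that repeat an earlier row. The toy base R sketch below shows that sum(duplicated()) gives the same number as the nrow() difference. It also shows how to list every copy of a duplicated row for inspection.

```r
# Toy shipments: rows 2 and 4 repeat row 1 exactly
edges <- data.frame(
  source   = c("A", "A", "B", "A"),
  target   = c("B", "B", "C", "B"),
  weightkg = c(100, 100, 50, 100)
)

# duplicated() flags every row that repeats an earlier row
n_dup <- sum(duplicated(edges))
n_dup_alt <- nrow(edges) - nrow(unique(edges))  # the approach used above

# Every copy of a duplicated row (including the first), for inspection
all_copies <- edges[duplicated(edges) | duplicated(edges, fromLast = TRUE), ]
format(n_dup, big.mark = ",")
```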

4. Distribution of transactions

In this section, we create interactive plots to explore the data from the knowledge graph, using the heatmaply, visNetwork, and igraph packages.

4.1 Number of Transactions by Year and Month

A heatmap provides a graphical representation of the number of transactions, using color-coding to represent different values. The sequential "Blues" palette from the [RColorBrewer] family shows the progression from low to high values as a gradient.

As fishing may be seasonal, we create an additional column called month before grouping the transactions by year and month. To create the interactive heatmap, the data frame is reshaped with pivot_wider() and converted to a matrix with as.matrix(). Thereafter, the [heatmaply] package is used.
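As a base R alternative to pivot_wider() plus as.matrix(), table() can build the month-by-year count matrix in one step. The sketch below uses toy arrival dates. Unlike pivot_wider(), it fills month-year combinations that have no transactions with 0 rather than NA, and it keeps the month labels as row names.

```r
# Toy arrival dates
arrival <- as.Date(c("2030-03-05", "2030-03-20", "2031-01-02", "2030-01-15"))

# Cross-tabulate month (rows) by year (columns)
counts <- table(
  month = as.integer(format(arrival, "%m")),
  year  = as.integer(format(arrival, "%Y"))
)
heatmap_data <- unclass(counts)  # plain count matrix with month/year dimnames
heatmap_data
```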


#aggregate to determine transactions count by year and month 
transaction_counts_by_year <- MC2_challenge_edges %>%
  mutate(month = round(month(arrivaldate))) %>%
  group_by(year, month) %>%
  summarise(count = n())

#transpose df by using pivot_wider
pivoted_data <- transaction_counts_by_year %>%
  pivot_wider(names_from = year, values_from = count) %>%
  #remove the month column
  select(2:8)

#convert pivoted_data into a matrix
heatmap_data <- as.matrix(pivoted_data)

#create interactive heatmap without dendrogram
heatmaply(heatmap_data, dendrogram = "none",
          xlab = "Year", ylab = "Month",
          main = "Number of Transactions by Year and Month",
          scale = "none",
          grid_color = "white",
          grid_width = 0.00001,
          titleX = FALSE,
          hide_colorbar =  FALSE,
          label_names = c("Month:", "Year: ", "No. of Transactions:"),
          fontsize_row = 10, fontsize_col = 10,
          colors = "Blues",
          labCol = colnames(heatmap_data),
          labRow = rownames(heatmap_data),
          plot_method = "plotly")

Observations:

On a yearly basis, the number of transactions peaks in 2028-2030, with the highest count in 2030, and then decreases at a decreasing rate.
March tends to be a month with few transactions. However, Mar 2030 has the highest number of transactions of any month in the seven-year period.

4.2 Trade flow of companies with 30 or more transactions in Mar 2030

Noting that Mar 2030 has a record volume of transactions, we examine the trade flow between companies in this time frame. We start by creating new edges and nodes data frames. The edges are aggregated from the master edges data frame MC2_challenge_edges.

We use the [dplyr] package to create MC2_2030_Mar_edges:

mutate(): to add a column called month
filter(): to keep transactions with year = 2030 and month = 3
group_by(): to aggregate by source, target and hscode
summarize(): to compute weight, the number of transactions in each group
filter(): to remove self-loops (matching source and target) and keep groups with 30 or more transactions, which leaves ~200 rows
rename(): to rename source and target to from and to
select(): to move weight to the third column

Thereafter, we create MC2_2030_Mar_nodes by keeping the distinct IDs from the master ID data frame MC2_id_list_vis that appear in the from or to column.
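The aggregation steps listed above can be sketched in base R on a toy edge list. The threshold of 2 is only illustrative, whereas the analysis uses 30. For integer counts, weight > 29 and weight >= 30 are equivalent.

```r
# Toy edge list for a single month
edges <- data.frame(
  source = c("A", "A", "A", "B", "C", "C"),
  target = c("B", "B", "B", "C", "C", "A"),
  hscode = c("306170", "306170", "306170", "306170", "306170", "304620")
)

# Count transactions per (source, target, hscode) group
agg <- aggregate(list(weight = rep(1L, nrow(edges))),
                 by = edges[c("source", "target", "hscode")], FUN = sum)

# Drop self-loops and keep groups at or above the (illustrative) threshold
agg <- subset(agg, source != target & weight >= 2)
names(agg)[1:2] <- c("from", "to")
agg
```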


#create edges df for transactions from Mar 2030
MC2_2030_Mar_edges <- MC2_challenge_edges %>%
  mutate(month = round(month(arrivaldate))) %>%
  filter(year == "2030" & month =="3") %>%
  group_by(source, target, hscode) %>%
  summarize(weight = n()) %>%
  filter(source !=target) %>%
  filter(weight >29) %>% #keep ~200 rows 
  rename(from = source, to = target) %>%
  select(1,2,4,3) %>% #relocate weight to 3rd column 
  ungroup()

#create nodes df for transactions from Mar 2030
MC2_2030_Mar_nodes <- MC2_id_list_vis %>%
  filter(id %in% MC2_2030_Mar_edges$from | id %in% MC2_2030_Mar_edges$to) %>%
  distinct()

In the code chunk below, we build a directed igraph object with the graph_from_data_frame() function from the [igraph] package and compute betweenness centrality. In the output below, the top row shows the IDs of the companies and the bottom row shows their scores. Only one company has a non-zero betweenness score, and neither betweenness nor closeness centrality is informative here, so we focus on degree centrality instead.

#create igraph object 
g <- graph_from_data_frame(d=MC2_2030_Mar_edges, 
                           vertices=MC2_2030_Mar_nodes, directed=TRUE) 

#compute betweenness centrality 
betweenness_centrality <- betweenness(g)
MC2_2030_Mar_nodes$betweenness_centrality <-
  betweenness_centrality[as.character(MC2_2030_Mar_nodes$id)]

#output for top results
head(sort(betweenness_centrality, decreasing=TRUE))

 8  3  5  6  9 11 
 6  0  0  0  0  0

After the data preparation, visNetwork is used to plot the interactive network graph with the Fruchterman and Reingold layout. The nodes are color-coded by degree centrality using three colors from the diverging RdYlBu palette, ranging from red through yellow to blue.
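Degree centrality and the three-color mapping can be sketched in base R. Total degree is counted as in-degree plus out-degree, which is what igraph's degree() returns for a directed graph under its default mode = "all". Indexing a three-color vector directly by degree returns NA for any degree above 3, so the sketch first bins the degrees into three groups with cut(). The edge list is a toy example, and the color names are stand-ins for brewer.pal(3, "RdYlBu").

```r
# Toy directed edge list and node ids (node "5" has no edges)
edges <- data.frame(from = c("1", "1", "2", "4"),
                    to   = c("2", "3", "3", "1"))
nodes <- c("1", "2", "3", "4", "5")

# Total degree = out-degree + in-degree
out_deg <- table(factor(edges$from, levels = nodes))
in_deg  <- table(factor(edges$to,   levels = nodes))
degree_all <- setNames(as.integer(out_deg + in_deg), nodes)

# Bin degrees into 3 equal-width groups, then map each group to a color
palette3 <- c("red", "yellow", "blue")  # stand-in for brewer.pal(3, "RdYlBu")
bins <- cut(degree_all, breaks = 3, labels = FALSE)
node_colors <- palette3[bins]
degree_all
```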


#compute degree centrality 
degree_centrality <- degree(g)
MC2_2030_Mar_nodes$degree_centrality <- 
  degree_centrality[as.character(MC2_2030_Mar_nodes$id)]

#compute closeness centrality 
closeness_centrality <- closeness(g, normalized=TRUE)
MC2_2030_Mar_nodes$closeness_centrality <- 
  closeness_centrality[as.character(MC2_2030_Mar_nodes$id)]

#add diverging palettes scheme to nodes based on degree centrality
colors <- colorRampPalette(brewer.pal(3, "RdYlBu"))(3) # use three colors 
MC2_2030_Mar_nodes <- MC2_2030_Mar_nodes %>%
  mutate(shape="dot", shadow=TRUE,  
         title=trade_status, # hover for trade_status
         label=label, # add labels on nodes
         size=20, # set size of nodes
         borderWidth=1, #set border width of nodes
         #bin degrees into 3 groups so that every node maps to one of the 3 colors
         color.background=colors[cut(degree_centrality, breaks = 3, labels = FALSE)],
         color.border="grey") #set border color 

#plot network graph 
visNetwork(MC2_2030_Mar_nodes, MC2_2030_Mar_edges, 
            main ="Trade flow of companies  
           <br>with 30 or more transactions in Mar 2030<br>") %>%
  visIgraphLayout(layout = "layout_with_fr") %>%
  visOptions(highlightNearest = TRUE,
             nodesIdSelection = TRUE,
             selectedBy ="degree_centrality") %>%
  visEdges(arrows = "to") %>% #indicate direction 
  visLayout(randomSeed = 123) %>%
  addFontAwesome(name = "font-awesome") %>% #add icon to network 
  visInteraction(dragNodes = FALSE, dragView = TRUE, 
                 zoomView = TRUE, navigationButtons = TRUE) #freeze network

Note about visNetwork

visIgraphLayout is used to compute coordinates. In the example above, the Fruchterman and Reingold layout is used.

visOptions sets options for the network visualization. Here, clicking a node highlights its nearest neighbours, and dropdown lists allow selecting a node by ID or by degree centrality.

visEdges sets edge options. We include arrows to indicate the direction of trade.

visLayout sets layout options. randomSeed is included so that the layout remains the same every time.

addFontAwesome is used to add icons to the network.

visInteraction sets the interaction options for the network visualization.

The interaction options used are as follows:

dragNodes: If TRUE, nodes can be dragged around by the user (set to FALSE here to freeze the network).
dragView: If TRUE, the view can be dragged around by the user.
zoomView: If TRUE, the user can zoom in and out.
navigationButtons: If TRUE, navigation buttons are shown on the network graph.

Observations:

Blue nodes indicate a high degree centrality. hǎi dǎn Corporation Wharf has the highest degree (19), followed by Saltwater Supreme OAO forwarding with a degree of 12.
Most companies have a low degree centrality and trade exclusively with one counterpart. Companies within clusters generally have a high degree centrality and deal with companies of low degree centrality (orange nodes).

5. Distribution of Median Weight

By hscode and year

A treemap displays hierarchical data as a set of nested rectangles, where each group is represented by an area proportional to its value. We examine the distribution of transaction counts (weight) and median weight by hscode and year. Unusually high median weights may help to flag companies that over-fish.

As per section 3.1, the [devtools] package has been loaded. We then install the [d3treeR] package from GitHub with install_github() and load it.

(Note: the GitHub package only needs to be installed once.)

#install_github("timelyportfolio/d3treeR")
library(d3treeR)

We aggregate the MC2_challenge_edges data frame using the [dplyr] package:

group_by(): to aggregate by hscode and year
summarize(): to compute weight (the number of transactions) and median_weight (the median of weightkg)
arrange(desc()): to sort weight in descending order
MC2_hscode_weight[1:100,]: to retrieve the first 100 rows

(We are only interested in the most frequently used hscodes.)
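The same aggregation can be sketched in base R with aggregate() on toy data. Two design choices differ from the dplyr code and are assumptions of this sketch. First, median() is called with na.rm = TRUE. Second, head() is used instead of [1:100,], because head() simply returns fewer rows when there are fewer groups.

```r
# Toy shipments
edges <- data.frame(
  hscode   = c("306170", "306170", "306170", "304620", "304620", "870323"),
  year     = c(2030, 2030, 2031, 2030, 2030, 2032),
  weightkg = c(100, 300, 50, 20, 40, 9000)
)
keys <- edges[c("hscode", "year")]

# Transaction count (weight) and median shipment weight per hscode and year
counts  <- aggregate(list(weight = edges$weightkg), by = keys, FUN = length)
medians <- aggregate(list(median_weight = edges$weightkg), by = keys,
                     FUN = median, na.rm = TRUE)
summary_df <- merge(counts, medians, by = c("hscode", "year"))

# Sort by transaction count (descending) and keep the top rows
summary_df <- summary_df[order(-summary_df$weight), ]
top_n <- head(summary_df, 2)
top_n
```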

We build the treemap by passing the aggregated data frame (MC2_hscode_weight) to the treemap() function and saving the result as an object named it. Thereafter, the d3tree() function from the [d3treeR] package is used to build the interactive treemap.


#aggregate into new df by hscode and year before sorting in descending order 
MC2_hscode_weight <- MC2_challenge_edges %>%
  group_by(hscode,year) %>%
  summarize(weight = n(),
            median_weight = median(weightkg)) %>%
  arrange(desc(weight)) 

#retrieve the first 100 rows
MC2_hscode_weight <- MC2_hscode_weight[1:100,]

#treemap saved under 'it' object 
it <- treemap(MC2_hscode_weight,
        index=c("hscode","year"),
        vSize="weight",
        vColor="median_weight",
        type ="value",
        algorithm = "pivotSize",
        sortID = "weight",
        palette="Blues",
        border.col = "white"
        )

d3tree(it, rootname = "Distribution of hscode")

Note about Treemap

index: the hierarchy of the rectangles, hscode followed by year.

vSize: the size of each rectangle, proportional to weight (the number of transactions).

vColor: the color intensity, mapped to median_weight.

type: set to "value", so the colors reflect the values of vColor.

algorithm: "pivotSize", which arranges the rectangles using the pivot-by-size algorithm.

sortID: orders the rectangles by weight, from top left to bottom right.

palette: the "Blues" palette from the [RColorBrewer] package.

border.col: draws the rectangle borders in white.

Observations:

Reading from top left to bottom right, hscode 306170 has the highest transaction count. Its median weight is also more left-skewed.
In the treemap, hscode 870323 has the darkest shade in 2032, indicating the highest median weight. With reference to section 4.1, the transaction counts did not increase in that year. Thus, it is worth examining the transactions of hscode 870323 in 2032.
HS codes are hierarchical. As noted in section 2.1, a code is composed of six digits that can be broken down into chapter/heading/subheading (two digits each). hscodes starting with 30 also carry a higher weightage: in the graph above, hscodes 306170 and 304620 are ranked first and sixth respectively.

5.1 Trade flow for Hscode 306170

In this section, we will be looking into the trade flow of hscode 306170 with the use of Eigenvector centrality. It measures the importance or influence of a node based on its connections to other highly central nodes.
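Eigenvector centrality is the leading eigenvector of the adjacency matrix. Power iteration approximates it by repeatedly multiplying a score vector by the matrix and rescaling. The base R sketch below runs it on a small undirected toy graph (a triangle A-B-C with D attached to A) and checks the result against eigen(). It illustrates the idea rather than igraph's implementation. It also assumes an undirected graph, whereas the network analysed here is directed.

```r
# Power iteration for eigenvector centrality on an undirected graph
eigen_centrality_power <- function(A, iter = 1000, tol = 1e-10) {
  x <- rep(1, nrow(A))
  for (i in seq_len(iter)) {
    x_new <- as.vector(A %*% x)
    x_new <- x_new / max(x_new)   # rescale so the top node scores 1
    converged <- max(abs(x_new - x)) < tol
    x <- x_new
    if (converged) break
  }
  setNames(x, rownames(A))
}

# Toy graph: triangle A-B-C plus D attached to A
nodes <- c("A", "B", "C", "D")
A <- matrix(0, 4, 4, dimnames = list(nodes, nodes))
A["A", "B"] <- A["B", "A"] <- 1
A["A", "C"] <- A["C", "A"] <- 1
A["B", "C"] <- A["C", "B"] <- 1
A["A", "D"] <- A["D", "A"] <- 1

ev <- eigen_centrality_power(A)

# Cross-check against the leading eigenvector from eigen()
v <- abs(eigen(A, symmetric = TRUE)$vectors[, 1])
v <- v / max(v)
round(ev, 3)
```

D has a single link, but its only partner is the best-connected node, so D's score reflects A's importance rather than D's own degree. This is the distinction the observations below draw between degree and eigenvector centrality.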

Using a similar approach to section 4.2, we keep the ~300 source-target pairs with more than 103 transactions and create an RdYlBu color scale based on the range of eigenvector centrality. The blue nodes below represent key players or market leaders in the network, as a high eigenvector centrality indicates important or influential connections to other nodes.


#create edges df for transactions for hscode 306170
MC2_306170_edges <- MC2_challenge_edges %>%
  filter(hscode =="306170") %>%
  group_by(source, target) %>%
  summarize(weight = n()) %>%
  filter(source !=target) %>%
  filter(weight >103) %>% #keep ~300 rows 
  rename(from = source, to = target) %>%
  ungroup()

#create nodes df for transactions of hscode 306170
MC2_306170_nodes <- MC2_id_list_vis %>%
  filter(id %in% MC2_306170_edges$from | id %in% MC2_306170_edges$to) %>%
  distinct() 

#create igraph object 
g_hscode <- graph_from_data_frame(d=MC2_306170_edges, 
                           vertices=MC2_306170_nodes, directed=TRUE) 

#compute eigen centrality 
ev_centrality <- eigen_centrality(g_hscode)$vector
MC2_306170_nodes$eigen_centrality <- ev_centrality

# define color palette
num_color_groups <- 3
color_palette <- colorRampPalette(
  brewer.pal(num_color_groups, "RdYlBu"))(num_color_groups)

# create a color scale based on the eigenvector values range 
min_centrality <- min(MC2_306170_nodes$eigen_centrality, na.rm = TRUE)
max_centrality <- max(MC2_306170_nodes$eigen_centrality, na.rm = TRUE)
color_scale <- scales::rescale(MC2_306170_nodes$eigen_centrality, to = c(0, 1))

# assign colors to the nodes based on eigenvector centrality
MC2_306170_nodes$color <- color_palette[cut(color_scale, breaks = num_color_groups)]

#add node attributes; the fill uses the eigenvector-based 'color' column assigned above
MC2_306170_nodes <- MC2_306170_nodes %>%
  mutate(shape="dot", shadow=TRUE,  
         title=trade_status, # hover for trade_status
         label=label, # add labels on nodes
         size=20, # set size of nodes
         borderWidth=1, #set border width of nodes
         color.background=color, #set node color from eigenvector centrality
         color.border="grey") #set color for border

#plot network graph 
visNetwork(MC2_306170_nodes, MC2_306170_edges, 
main ="Trade flow of top 300 transactions
           <br> for hscode 306170<br>") %>%
  visIgraphLayout(layout = "layout_with_fr") %>%
    visOptions(highlightNearest = TRUE,
             nodesIdSelection = TRUE) %>%
  visEdges(arrows = "to") %>% #indicate direction 
  visLayout(randomSeed = 123) %>%
  addFontAwesome(name = "font-awesome") %>% #add icon to network 
  visInteraction(dragNodes = FALSE, dragView = TRUE, 
                 zoomView = TRUE, navigationButtons = TRUE) #freeze network

Observations:

There are three key opinion leaders and three sub-leaders in the network, and Mar del Este CJSC can be inferred to be the market leader.
Interestingly, hǎi dǎn Corporation Wharf, which had the highest degree centrality in the previous section, does not have a high eigenvector centrality. Although it has many connections through which it could potentially reach other companies, its eigenvector centrality remains low. This suggests that the companies it is connected to are not themselves well connected.

5.2 Trade flow for Hscode 870323 in year 2032

In this section, we look into the trade flow of hscode 870323 in 2032, since its median weight spikes in that year. The network is measured with degree centrality, which helps to identify the companies with the most interactions in this time period. As before, the nodes are colored by value: blue nodes represent a high degree, while orange nodes represent a low degree.


#create edges df for hscode 870323 transactions in 2032
MC2_2032_870323_edges <- MC2_challenge_edges %>%
  filter(year == "2032" & hscode =="870323") %>%
  group_by(source, target, hscode) %>%
  summarize(weight = n()) %>%
  filter(source !=target) %>%
  filter(weight >1) %>% #keep pairs with more than 1 transaction
  rename(from = source, to = target) %>%
  select(1,2,4,3) %>% #relocate weight to 3rd column 
  ungroup()

#create nodes df for hscode 870323 transactions in 2032
MC2_2032_870323_nodes <- MC2_id_list_vis %>%
  filter(id %in% MC2_2032_870323_edges$from | id %in% MC2_2032_870323_edges$to) %>%
  distinct() 

#create igraph object 
gg <- graph_from_data_frame(d=MC2_2032_870323_edges, 
                           vertices=MC2_2032_870323_nodes, directed=TRUE) 

#compute degree centrality 
degree_centrality <- degree(gg)
MC2_2032_870323_nodes$degree_centrality <- 
  degree_centrality[as.character(MC2_2032_870323_nodes$id)]

#add diverging palettes scheme to nodes based on degree centrality
colors <- colorRampPalette(brewer.pal(3, "RdYlBu"))(3) # use three colors 
MC2_2032_870323_nodes <- MC2_2032_870323_nodes %>%
  mutate(shape="dot", shadow=TRUE,  
         title=trade_status, # hover for trade_status
         label=label, # add labels on nodes
         size=20, # set size of nodes
         borderWidth=1, #set border width of nodes
         color.background=colors[cut(degree_centrality, breaks = 3, labels = FALSE)], #bin degrees into the 3 colors
         color.border="grey") #set color for border

#plot network graph 
visNetwork(MC2_2032_870323_nodes, MC2_2032_870323_edges, main ="Trade flow in 2032
           <br>for hscode 870323<br>") %>%
  visIgraphLayout(layout = "layout_with_fr") %>%
    visOptions(highlightNearest = TRUE,
             nodesIdSelection = TRUE,
             selectedBy = "degree_centrality") %>%
  visEdges(arrows = "to") %>% #indicate direction 
  visLayout(randomSeed = 123) %>%
  addFontAwesome(name = "font-awesome") %>% #add icon to network 
  visInteraction(dragNodes = FALSE, dragView = TRUE, 
                 zoomView = TRUE, navigationButtons = TRUE) #freeze network

Observations:

In comparison to hscode 306170, there are more key opinion leaders and sub-leaders in this network. They are heavily connected to one another, with more interaction between companies that have many connections.
At the bottom right of the network graph, there are two distinct hub-and-spoke networks, belonging to Estrella del Mar Seafarer and Chhattisgarh Marine ecology A/S Delivery. In each, a central node with a high degree centrality is directly connected to all the other nodes (spokes). It receives goods from one company but distributes them to seven or eight companies.
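A simple screen for the hub-and-spoke pattern described above is to look for companies with at most one supplier and many buyers. The thresholds in the base R sketch below are hypothetical. It looks for 4 or more buyers on a toy edge list, whereas the observation above would correspond to roughly 7 or more.

```r
# Toy directed trade edges: S supplies hub H, which sells to a, b, c, d
edges <- data.frame(
  from = c("S", "H", "H", "H", "H", "X"),
  to   = c("H", "a", "b", "c", "d", "Y")
)
nodes <- unique(c(edges$from, edges$to))

in_deg  <- table(factor(edges$to,   levels = nodes))
out_deg <- table(factor(edges$from, levels = nodes))

# Hypothetical rule: at most one supplier and at least 4 buyers
is_hub <- as.vector(in_deg <= 1 & out_deg >= 4)
hubs <- nodes[is_hub]
hubs
```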

5.3 Examining Chapter 30 Hscode

Among the 17 hscodes identified in the treemap above, two belong to Chapter 30. In terms of volume, 306170 and 304620 rank first and sixth respectively. We therefore match them against the sets of predicted knowledge graph links (bundles) from FishEye to see whether they yield any insight.

Interestingly, this chapter belongs to the Fish category.


# filter for hscodes whose first two digits are 30 (Chapter 30)
combined_edges_cleaned_30 <- combined_edges_cleaned %>%
  filter(str_sub(hscode, start = 1, end = 2) == "30")

# create df for total count of hscode
combined_edges_cleaned_hscount <- combined_edges_cleaned_30 %>%
  group_by(generated_by) %>%
  summarise(hscount = n_distinct(hscode)) %>%
  arrange(desc(hscount))

#count transactions (weight) by fish type and hs code 
combined_edges_cleaned_30 <- combined_edges_cleaned_30 %>%
  group_by(generated_by, hscode) %>%
  summarise(weight = n()) %>%
  arrange(desc(weight))

# combined both df through left join 
combined_edges_cleaned_30hscount <- left_join(combined_edges_cleaned_30,
                                              combined_edges_cleaned_hscount,
                                              by = "generated_by")

#treemap saved under 'hscount30' object 
hscount30 <- treemap(combined_edges_cleaned_30hscount,
        index=c("generated_by","hscode"),
        vSize="weight",
        vColor="hscount",
        type ="value",
        algorithm = "pivotSize",
        sortID = "weight",
        palette="Blues",
        border.col = "white" # white borders between rectangles
        )

d3tree(hscount30, rootname = "Distribution based on generated program")

Observations:

From the predicted knowledge graph links, the Cod2 program has the highest transaction count, followed by Salmon.
Each fish type is tagged with multiple hscodes, with the exception of Lichen, which has only one (300510). Salmon and Cod2 have fewer distinct hscodes, while Mackerel and Catfish each have 15, as indicated by the differences in color gradient.
On closer inspection, several hscodes are incomplete because the sub-heading is missing. For example, the hscodes in Salmon and Cod2 contain five digits, whereas Lichen's hscode has six.
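The HS nomenclature is hierarchical: the first two digits give the chapter, the first four the heading, and all six the sub-heading. The helper below is a hypothetical sketch (split_hscode is not part of the original code) that splits codes along those levels and flags codes shorter than six digits as incomplete instead of guessing the missing digit. It assumes, as the observation above suggests, that only the sub-heading is missing, so chapter and heading are still extracted. Codes are passed as character strings, since numeric codes can lose leading zeros or print in scientific notation (as.character(300000) gives "3e+05").

```r
# Hypothetical helper (not part of the original analysis).
# Assumes a complete hscode has six digits: chapter = digits 1-2,
# heading = digits 1-4, sub-heading = all six digits.
split_hscode <- function(hscode) {
  code <- as.character(hscode)  # character input keeps every digit intact
  complete <- nchar(code) == 6
  data.frame(
    hscode     = code,
    chapter    = substr(code, 1, 2),
    heading    = substr(code, 1, 4),
    subheading = ifelse(complete, code, NA_character_),
    status     = ifelse(complete, "Complete", "Incomplete"),
    stringsAsFactors = FALSE
  )
}

# "30617" is an illustrative five-digit code, not taken from the bundles
split_hscode(c("300510", "30617", "304620"))
```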

5.3.1 Structure of Hscode

Since the hscodes in the bundles are incomplete, we examine the companies involved in transactions with incomplete hscodes. We keep source-target pairs with at least 10 transactions (weight > 9), use rename() to change rcvcountry to group for color coding, and use an ifelse() function to replace NA values in group with Unknown.
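The thresholding and NA-replacement steps can be sketched in base R on toy data; all company and country assignments below are hypothetical. aggregate() stands in for group_by() + summarize(n()), and self-loops are dropped before the weight > 9 cut-off is applied.

```r
# Toy transactions with hypothetical company names (not MC2 entities)
tx <- data.frame(
  source = c(rep("A", 12), rep("B", 5), rep("C", 10)),
  target = c(rep("X", 12), rep("Y", 5), rep("C", 10)),
  stringsAsFactors = FALSE
)

# Count transactions per source-target pair
pairs <- aggregate(weight ~ source + target,
                   data = transform(tx, weight = 1), FUN = sum)

# Drop self-loops, then keep pairs with at least 10 transactions (weight > 9)
pairs <- pairs[as.character(pairs$source) != as.character(pairs$target) &
                 pairs$weight > 9, ]

# Hypothetical node table: rename rcvcountry to group, then fill NA with "Unknown"
nodes <- data.frame(label = c("A", "B", "X", "Y"),
                    rcvcountry = c("Oceanus", NA, NA, "Merigrad"),
                    stringsAsFactors = FALSE)
names(nodes)[names(nodes) == "rcvcountry"] <- "group"
nodes$group <- ifelse(is.na(nodes$group), "Unknown", nodes$group)
```

Only the A-X pair (12 transactions) survives: B-Y falls below the threshold and C-C is a self-loop.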


#filter for rows with a 5-digit hscode 
combined_edges_cleaned_30hscount_new <- combined_edges_cleaned_30hscount %>%
  filter(nchar(hscode) == 5) %>%
  ungroup()

#filter the main edges file with the filtered 5 digits hscode
combined_edges_cleaned_filtered <- combined_edges_cleaned %>%
  filter(hscode %in% combined_edges_cleaned_30hscount_new$hscode) %>%
  ungroup()

# combined both df through left join 
combined_edges_cleaned_newedges <- left_join(combined_edges_cleaned_filtered,
                          combined_edges_cleaned_30hscount_new,by = "hscode")

# count transactions per source-target-hscode and keep pairs with >9 (i.e. at least 10) transactions 
combined_edges_cleaned_newedgestt <- combined_edges_cleaned_newedges %>%
  group_by(source,target,hscode) %>%
  summarize(weight =n()) %>%
  filter(source!=target) %>%
  filter(weight >9) %>% 
  arrange(desc(weight)) %>%
  ungroup()

# filter out the nodes in newedges 
combined_nodes_filtered <- combined_nodes %>%
  rename(label =id) %>%
  filter(label %in% combined_edges_cleaned_newedgestt$source |
           label %in% combined_edges_cleaned_newedgestt$target ) %>%
  select(1) %>%
  distinct() %>%
  ungroup()

#left join to get id of the nodes 
combined_nodes_filtered_addid <- combined_nodes_filtered %>%
  filter(label %in% MC2_id_list_vis$label) %>%
  left_join(MC2_id_list_vis,by="label") %>%
  relocate(id,label) %>%
  rename(group = rcvcountry) %>%
  ungroup()

#replace NA in group (rcvcountry) with Unknown 
combined_nodes_filtered_addid$group <- ifelse(is.na(combined_nodes_filtered_addid$group),
                                              "Unknown",
                                              combined_nodes_filtered_addid$group)

#map source and target company names to node ids, then keep pairs with at least 10 transactions
combined_edges_cleaned_newedges_test <- combined_edges_cleaned_newedges %>%
  left_join(combined_nodes_filtered_addid, by = c("source" = "label")) %>%
  rename(from = id) %>%
  left_join(combined_nodes_filtered_addid, by = c("target" = "label")) %>%
  rename(to = id) %>%
  filter(!is.na(to)) %>%
  group_by(from,to) %>%
  summarise(weight = n()) %>%
  filter(weight >9) %>%
  ungroup()

#output the data frame using knitr::kable
kable(combined_nodes_filtered, "simple")

label
Caracola del Sol Services
Mar del Este CJSC
Pao gan LC Freight
hǎi dǎn Corporation Wharf
Tsha wamba S.A. de C.V.
Saltwater Solitude N.V. International
Black Sea Tuna Sagl
Pao gan SE Seal
Sea Breeze Corporation Marine sanctuary
Selous Game Reserve S.A. de C.V.
-2
Aqua Azul LC International
Arunachal Pradesh s Brine
Belgian Cod BV Solutions
Wave Watchers Ltd. Liability Co
Manipur Market Corporation Cargo
Náutica del Sol Corporation
Matthew Oyj Marine sanctuary
1 Ltd. Liability Co Cargo
2 Limited Liability Company
Arunachal Pradesh s Plc
Kariba Dam Ges.m.b.H. Delivery
Oceanic Opportunities CJSC Marine
Norwegian Haddock GmbH & Co. KG
-53
Saltwater Surf Club Ltd. Liability Co Transport
Belgian Scallop Harbor ОАО Freight
David Limited Liability Company Worldwide
Logistics Ltd. Liability Co
Greek Octopus Ltd. Corporation
Ancla de Oro Kga Import
Samaka Chart ОАО Delivery
xiǎo xiā S.p.A. Deep-sea
Faroe Islands Halibut Sea S.p.A. Distribution
Irish Sea Salt spray
Joseph Limited Liability Company

Observations:

Among the 36 companies linked to incomplete hscodes, two previously identified companies, Mar del Este CJSC and hǎi dǎn Corporation Wharf, are included in the list.

Next, we plot a network graph to examine the interactions between these companies and their receiving countries.


visNetwork(combined_nodes_filtered_addid, combined_edges_cleaned_newedges_test, main ="Companies with Inaccurate hs code 
           <br>and at least 10 transactions <br>") %>%
  visIgraphLayout(layout = "layout_with_fr") %>%
  visOptions(highlightNearest = TRUE,
             nodesIdSelection = TRUE,
             selectedBy = "group") %>%
  visEdges(arrows = "to") %>% #indicate direction 
  visLayout(randomSeed = 123) %>%
  addFontAwesome(name = "font-awesome") %>% #add icon to network 
  visGroups(groupname = "Coralmarica", shape = "icon", 
            icon= list(code ="f21a", color = "#EBCC2A" )) %>% 
  visGroups(groupname = "Unknown", shape = "icon", 
            icon= list(code ="f21a", color = "#F21A00" )) %>% 
  visGroups(groupname = "Kuzalanda", shape = "icon", 
            icon= list(code ="f21a", color = "#606060" )) %>% 
  visGroups(groupname = "Merigrad", shape = "icon", 
            icon= list(code ="f21a", color = "#00887d" )) %>% 
  visGroups(groupname = "Oceanus", shape = "icon", 
            icon= list(code ="f21a", color = "#3B9AB2")) %>% 
  visLegend() %>% #single legend, added after the groups are defined
  visInteraction(dragNodes = FALSE, dragView = TRUE, 
                 zoomView = TRUE, navigationButtons = TRUE) #freeze network

Observations:

The majority of the receiving companies are associated with Oceanus, with a minority trading with companies that have no country association (Unknown).
There are no distinct clusters. The two companies identified earlier are importers that each receive goods from two sources. As the companies with incomplete hscodes yield no significant findings, we next check whether the main edges data frame contains any incomplete hscodes.

5.3.2 Incomplete Hscode in Edges

The code chunk below runs a quick check on the hscode column in MC2_challenge_edges. None of the transactions in the knowledge graph has a five-digit hscode. As such, we will eliminate bundles with incomplete hscodes.

#filter for transactions with a 5-digit hscode
MC2_challenge_edges_check <- MC2_challenge_edges %>%
  filter(nchar(hscode) == 5)

#check whether any such transaction exists
nrow(MC2_challenge_edges_check) > 0

[1] FALSE
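The question "is there any five-digit hscode?" can also be asked of the hscode column itself. The sketch below uses a hypothetical vector ("30510" is an illustrative incomplete code, not taken from the data): table(nchar()) shows the full distribution of code lengths, and any() returns a single TRUE/FALSE.

```r
# Hypothetical hscode column; "30510" is an illustrative incomplete code
hscode <- c("306170", "304620", "30510", "870323", "361070")

# Distribution of code lengths
len_counts <- table(nchar(hscode))
len_counts

# Single TRUE/FALSE answer to "is there any five-digit hscode?"
has_five_digit <- any(nchar(hscode) == 5)
has_five_digit
```

On the real edges, the length table would also reveal codes of any other unexpected length, which a check for exactly five digits would miss.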

5.3.3 Accuracy of Hscode

Since the Chapter 30 hscodes do not include Shark, we examine the accuracy of hscodes across all bundles by checking whether each hscode is complete, in the following steps:

Use filter() to select transactions with 5-digit and 6-digit hscodes separately.
Aggregate the two data frames by program (generated_by) using group_by() and count the occurrences using summarize().
Add a column with the hscode status (Complete for 6-digit hscodes, Incomplete for 5-digit hscodes).
Combine both data frames with rbind(), which works because their columns match exactly.
Plot an interactive stacked bar chart with plot_ly(), ordered by descending total.
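The steps above can be condensed into a single cross-tabulation. This sketch uses toy rows (program names borrowed from the bundles; hscodes and counts invented) and computes each program's share of complete hscodes, the quantity the stacked bar chart conveys visually. One difference: here any code that is not six digits long counts as Incomplete, whereas the steps above count only five-digit codes.

```r
# Toy bundle edges: program names borrowed from the bundles, hscodes and counts invented
edges <- data.frame(
  generated_by = c(rep("cod2", 4), rep("lichen", 3), rep("salmon", 5)),
  hscode = c("30617", "30510", "30617", "30420",
             "300510", "300510", "300510",
             "304620", "30510", "306170", "304620", "306170"),
  stringsAsFactors = FALSE
)

# Label each transaction by hscode length (6 digits = Complete, otherwise Incomplete)
edges$type <- ifelse(nchar(edges$hscode) == 6, "Complete", "Incomplete")

# Rows = programs, columns = status, cells = transaction counts
counts <- table(edges$generated_by, edges$type)
counts

# Share of transactions with a complete hscode per program
completion_rate <- round(counts[, "Complete"] / rowSums(counts), 2)
completion_rate
```

table() gives a wide layout; the rbind() approach produces the long layout (one row per program and status) that plot_ly() takes for a stacked bar chart.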


#filter for transactions with a 6-digit hscode 
combined_edges_cleaned6 <- combined_edges_cleaned %>%
  filter(nchar(hscode) == 6) %>%
  ungroup()

#filter for transactions with a 5-digit hscode
combined_edges_cleaned5 <- combined_edges_cleaned %>%
  filter(nchar(hscode) == 5) %>%
  ungroup()

#count transactions per program with a 5-digit hscode 
count5 <- combined_edges_cleaned5 %>%
  group_by(generated_by) %>%
  summarize(count = n()) %>%
  ungroup()

#count transactions per program with a 6-digit hscode 
count6 <- combined_edges_cleaned6 %>%
  group_by(generated_by) %>%
  summarize(count = n()) %>%
  ungroup()

#add hscode status column 
count5$type <- "Incomplete"
count6$type <- "Complete"

#combine both df with rbind (identical columns) 
combined_df <- rbind(count5,count6)

# Create an interactive stacked bar plot
plot_ly(combined_df, x = ~generated_by, 
        y = ~count, color = ~type, type = "bar") |>
  layout(xaxis = list(title = 'Generated Program',
                      categoryorder = "total descending"),
         yaxis = list(title = 'Count'), 
         title = 'Completion of Hscode by structure', 
         barmode = 'stack')   #create stacked barchart

Observations:

Most programs have a high hscode completion rate. The exception is Cod2, whose hscodes are all incomplete, so we deem it unreliable. In contrast, the data from Lichen and Shark have the highest accuracy. We therefore conclude that Lichen and Shark are the most reliable sets for completing the graph.

6. Companies with Unknown Trade Status

As described in Section 2.1 Metadata, shpcountry and rcvcountry refer to the country a company is most often associated with. A newly founded company may therefore have an unknown status. This could apply to operators who want to avoid detection by starting a new company under a different name.

Next, we examine the group with unknown trade status. The trade_status column was created earlier in Section 3.5. The code chunk below filters for receivers whose trade_status is Unknown and for exporters (Export or Import & Export), and then joins them to the edges with left_join().
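The join logic can be illustrated with base R's merge(all.x = TRUE), which behaves like dplyr's left_join(). The node and edge tables below are hypothetical and only mirror the column names used in the chunk (id, label, trade_status, sourcelabel, targetlabel). Unlike the chunk, this sketch also drops edges whose shipper is not an exporter, so no edge is left with a missing from id.

```r
# Hypothetical node table mirroring MC2_id_list_vis (id, label, trade_status)
nodes <- data.frame(
  id = 1:5,
  label = c("S1", "S2", "U1", "U2", "I1"),
  trade_status = c("Export", "Import & Export", "Unknown", "Unknown", "Import"),
  stringsAsFactors = FALSE
)
# Hypothetical edge list mirroring MC2_challenge_edges (sourcelabel, targetlabel)
edges <- data.frame(
  sourcelabel = c("S1", "S1", "S1", "S2", "I1", "S2"),
  targetlabel = c("U1", "U1", "U2", "I1", "U1", "U2"),
  stringsAsFactors = FALSE
)

exporters <- nodes[nodes$trade_status %in% c("Export", "Import & Export"), c("id", "label")]
unknown   <- nodes[nodes$trade_status == "Unknown", c("id", "label")]

# Left joins: attach the shipper id as `from` and the receiver id as `to`
e <- merge(edges, setNames(exporters, c("from", "sourcelabel")),
           by = "sourcelabel", all.x = TRUE)
e <- merge(e, setNames(unknown, c("to", "targetlabel")),
           by = "targetlabel", all.x = TRUE)

# Keep exporter -> unknown edges only, count them, keep pairs seen more than once
e <- e[!is.na(e$from) & !is.na(e$to), ]
w <- aggregate(weight ~ from + to, data = transform(e, weight = 1), FUN = sum)
w <- w[w$weight > 1, ]
```

Only S1 -> U1 (two shipments) remains: S2 -> I1 has a known receiver, I1 is not an exporter, and the other exporter-to-unknown pairs occur once.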


#create df to filter list of unknown parties 
unknownMC2_nodes <- MC2_id_list_vis %>%
  filter(trade_status == "Unknown") %>%
  select(1:2)

#create df to filter list of exporters who solely do export or both 
MC2_exporters <- MC2_id_list_vis %>%
  filter(trade_status == "Export" | trade_status == "Import & Export")

#append by matching exporters to the importers (list of unknown)
unknownMC2 <- MC2_challenge_edges %>%
  left_join(MC2_exporters, by = c("sourcelabel" = "label")) %>%
  rename(from = id) %>%
  left_join(unknownMC2_nodes, by = c("targetlabel" = "label")) %>%
  rename(to = id) %>%
  filter(!is.na(to)) %>%
  group_by(from,to) %>%
  summarise(weight = n()) %>%
  filter(weight >1) %>%
  ungroup()

#filter and aggregate for the nodes 
unknownMC2_revnodes <- MC2_id_list_vis %>%
  filter(id %in% unknownMC2$from | id %in% unknownMC2$to) %>%
  distinct() %>%
  select(1:3) %>%
  rename(group = trade_status)

Similarly, visNetwork is used to plot the interactive network graph with the Fruchterman-Reingold layout. We include the addFontAwesome() function so that visGroups() can display icons with custom colors. Custom navigation buttons allow users to zoom and drag the view.


#create network diagram for exporters
visNetwork(unknownMC2_revnodes, unknownMC2, 
           main = "Exporters who ship to <br> Unknown Origin<br>") %>%
  visIgraphLayout(layout = "layout_with_fr") %>%
  visOptions(highlightNearest = TRUE,
             nodesIdSelection = TRUE) %>%
  visLayout (randomSeed = 123) %>% 
  visEdges(arrows = "to") %>% #indicate direction 
  addFontAwesome(name = "font-awesome") %>% #add icon to network 
  #fill color and shape
  visGroups(groupname = "Unknown", shape = "icon", 
            icon= list(code ="f21a", color = "#E1AF00" )) %>% 
  visGroups(groupname = "Import & Export", shape = "icon", 
            icon= list(code ="f21a", color = "#3B9AB2" )) %>% 
  visLegend() %>%
  visInteraction(dragNodes = FALSE, dragView = TRUE, 
                 zoomView = TRUE, navigationButtons = TRUE) #freeze network

The network above shows two companies that each ship to three or more receivers with unknown trade status. The receivers (importers) listed below may require further monitoring until their trade status is known.

Shipper	Receivers
hǎi dǎn Corporation Wharf	SouthSeafood Express Corp; Maacama Ocean Worldwide LLC; OranjestadCreek Express Sagl; SavanetaCreek Solutions NV; HomabayMarine Carriers N.V.
Sailors and Surfers Incorporated Enterprises	SavanetaCreek Solutions NV; SumacAmerica Transport GmbH & Co. KG; 8.Marine United AG
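The selection of shippers with three or more unknown receivers can also be made programmatically rather than read off the graph. A sketch on hypothetical shipper and receiver names: duplicate rows are removed so each receiver counts once per shipper, and split() returns the receivers to monitor for each flagged shipper.

```r
# Hypothetical shipper -> unknown-receiver pairs (one row per shipment link)
links <- data.frame(
  shipper  = c("P", "P", "P", "P", "Q", "Q", "Q", "R"),
  receiver = c("U1", "U2", "U3", "U3", "U2", "U4", "U5", "U1"),
  stringsAsFactors = FALSE
)

# Count each receiver once per shipper
links <- unique(links)

# Number of distinct unknown receivers per shipper; flag those with three or more
n_receivers <- table(links$shipper)
flagged <- names(n_receivers)[n_receivers >= 3]

# Receivers to monitor, grouped by flagged shipper
watchlist <- split(links$receiver, links$shipper)[flagged]
watchlist
```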

7. Companies with Duplicated Entries

Although we included the duplicated entries in this exercise, we want to find out more about the companies with high duplicate counts. We therefore keep the top 10 source-target pairs by duplicate count using top_n().
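Two details matter when interpreting the duplicate counts. duplicated() flags only the second and later copies of a row, so a transaction recorded three times contributes a weight of 2. In addition, dplyr's top_n() keeps ties at the cut-off (it is documented as superseded by slice_max()), so a "top 10" can return more than 10 pairs. A base-R sketch on toy data with hypothetical names shows both.

```r
# Toy transactions: rows 1-3, 4-5 and 6-8 are identical copies;
# row 9 differs only in arrivaldate, so it is not a duplicate
tx <- data.frame(
  source = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
  target = c("X", "X", "X", "Y", "Y", "Z", "Z", "Z", "Z"),
  arrivaldate = c(rep("2030-01-01", 3), rep("2031-05-02", 2),
                  rep("2032-03-03", 3), "2032-03-04"),
  stringsAsFactors = FALSE
)

# duplicated() marks only the 2nd and later copies of each row
dups <- tx[duplicated(tx), ]

# Duplicate count per pair: A-X = 2, B-Y = 1, C-Z = 2
dup_counts <- aggregate(weight ~ source + target,
                        data = transform(dups, weight = 1), FUN = sum)

# Keep every row whose weight reaches the n-th largest weight, ties included,
# which matches how dplyr::top_n() selects rows
n <- 1
cutoff <- sort(dup_counts$weight, decreasing = TRUE)[n]
top <- dup_counts[dup_counts$weight >= cutoff, ]
nrow(top)  # 2 rows for n = 1 because A-X and C-Z tie
```

If exactly n rows are required, dplyr's slice_max(weight, n = 10, with_ties = FALSE) drops the extra tied rows.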


#detect duplicate rows and aggregate it 
MC2_dup_rows <- MC2_challenge_edges[duplicated(MC2_challenge_edges),] %>%
  rename(from = source) %>%
  rename(to = target) %>%
  group_by(from,to) %>%
  summarise(weight = n()) %>%
  filter(weight >1) %>%
  ungroup()

#keep the top 10 pairs by duplicate count (top_n() keeps ties, so more than 10 rows may be returned)
MC2_dup_top10 <- MC2_dup_rows %>%
  top_n(10,weight)

#create df with corresponding nodes 
MC2_dup_nodes <- MC2_id_list_vis %>%
  filter(id %in% MC2_dup_top10$from | id %in% MC2_dup_top10$to) %>%
  distinct() %>%
  select(1:3) %>%
  rename(group = trade_status)

#create network diagram for the top 10 duplicated transactions
visNetwork(MC2_dup_nodes, MC2_dup_top10, 
           main = "Top 10 duplicated transactions") %>%
  visIgraphLayout(layout = "layout_with_fr") %>%
  visOptions(highlightNearest = TRUE,
             nodesIdSelection = TRUE) %>%
  visLayout(randomSeed = 123) %>% 
  visEdges(arrows = "to") %>% #indicate direction 
  addFontAwesome(name = "font-awesome") %>% #add icon to network 
  #fill color and shape
  visLegend() %>%
  visGroups(groupname = "Import", shape = "icon", 
            icon= list(code ="f21a", color = "#E1AF00" )) %>% 
  visGroups(groupname = "Import & Export", shape = "icon", 
            icon= list(code ="f21a", color = "#3B9AB2" )) %>% 
  visInteraction(dragNodes = FALSE, dragView = TRUE, 
                 zoomView = TRUE, navigationButtons = TRUE) #freeze network

Observations:

Companies with duplicated transactions tend to trade exclusively with a single partner. They do not have multiple interconnections, and they operate on both ends of the trade (Import & Export), apart from -2119, which is an importer. In-depth analysis may be needed to identify individual companies.

8. Conclusion:

Detecting illegal fishing remains a challenging problem, as many attributes could point to illegal, unreported, and unregulated (IUU) fishing. In this exercise, we observed the following:

Abnormal transaction counts over time. Transactions peak around 2028-2030 and decrease over the following years. In terms of seasonality, March appears to be a month with low transactions. The network graphs show that trade is highly concentrated among a few companies.
The hscode with the highest transaction count does not have the highest median weight. Hscode 306170 ranks first in terms of volume, and it belongs to the Chapter 30 family along with 304620.
Further analysis of Chapter 30 revealed that some bundles contain incomplete hscodes. The most reliable sets are Lichen and Shark, whose hscodes are 100% complete.
Companies with unknown trade status require monitoring, as some companies supply to groups of unknown companies. Likewise, further analysis may be required to examine the patterns of companies with high duplicate counts.

References:

Delignette-Muller, M. L., & Dutang, C. (2020). visNetwork: Network Visualization using ‘vis.js’ Library. Retrieved May 28, 2023, from https://cran.r-project.org/web/packages/visNetwork/vignettes/Introduction-to-visNetwork.html

Nagraj, R. (n.d.). Network Visualization in R with visNetwork. Retrieved May 28, 2023, from https://www.nagraj.net/notes/visnetwork/

R Graph Gallery. (n.d.). RColorBrewer Palettes. Retrieved May 28, 2023, from https://r-graph-gallery.com/38-rcolorbrewers-palettes.html

Tran, J. (2021). Network Visualization in R - Centrality Measurement. Retrieved May 28, 2023, from https://jtr13.github.io/cc21fall2/network-visualization-in-r.html#centrality-measurement

Tran, J. (2021). Network Visualization in R. Retrieved May 28, 2023, from https://jtr13.github.io/cc21fall2/network-visualization-in-r.html

VAST Challenge (2023). VAST Challenge 2023 - Mini-Challenge 2: Fishy Business. Retrieved May 22, 2023, from https://vast-challenge.github.io/2023/MC2.html