-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathtransform_and_merge_01.Rmd
96 lines (66 loc) · 1.93 KB
/
transform_and_merge_01.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
---
title: "Share Example"
output: html_notebook
---
**Goal:** identify countries on opposite sides
## Disclaimers
- User should verify the results
- There is a potential to generate more concise code. Meanwhile, this appears to work just fine.
## Load Libraries
```{r loadpackages}
library(tidyverse)
```
## Example Data
```{r sampledata}
country <- c("A","B","C","D","A","E","F","X","Y","Z","Q","R")
dispute <- c(1,1,1,1,2,2,2,3,3,3,3,3)
side_a <- c(0,0,1,1,0,1,1,0,0,0,1,1)
side_b <- c(1,1,0,0,1,0,0,1,1,1,0,0)
country_df <- data_frame(country, dispute, side_a, side_b)
country_df
```
## Identify Sides
### Side B Countries
```{r sidea}
side_a <- country_df %>%
filter(side_a == 0)
side_a
```
### Side A Countries
```{r sideb}
side_b <- country_df %>%
filter(side_a == 1)
side_b
```
## Transform Data ; Build Data Frame
```{r for_df}
# declare Tibble (data frame)
df_total = tibble()
for (i in 1:max(country_df$dispute)) {
# Build vector of disputants (multi-valued cell: country_y)
country_y <- country_df %>%
filter(dispute == i) %>%
select(-dispute) %>%
mutate(side_b = if_else(side_b == 0, country, NA_character_)) %>%
select(side_b) %>%
drop_na() %>%
unique() %>%
as_vector() %>%
paste0(collapse = "|")
# Build a data frame that adds the disputants vector as a column-variable;
# drops blank rows ; separate the multi-valued cells (disputants/country_y)
# into rows
country_df_temp <- country_df %>%
filter(dispute == i) %>%
mutate(country_disputes = if_else(side_b == 1, country_y, NA_character_)) %>%
drop_na() %>%
separate_rows(country_disputes, convert = TRUE)
# Append big Dataframe from from the iterated dataframes
df_total <- bind_rows(df_total, country_df_temp)
}
df_total%>%
arrange(country_disputes) %>%
rename(country_x = country,
counrty_y = country_disputes) %>%
select(dispute, counrty_y, country_x)
```