Hi folks!
I guess you are aware that social media bots are becoming increasingly relevant in politics. These bots are mainly used to influence voters by systematically spreading misinformation, aka fake news. If you want to know more about this topic in general, this Scientific American article is a good starting point.
Living in Austria, I want to explore whether our politicians can be associated with any bots. To do so, we will look at the Twitter accounts of one top politician per party. From the governing parties, I have selected our chancellor Sebastian Kurz @sebastiankurz for the ÖVP and the vice-chancellor H.C. Strache @HCStracheFP for the FPÖ. Those two were easy, but picking good representatives of the opposition parties was a little harder because there has been quite a lot of change in the top positions. Since the new chairperson of the SPÖ, P. Rendi-Wagner, is not very active on Twitter, we will use the managing director, Thomas Drozda @thomasdrozda, instead. For the GRÜNE I have chosen the best-known Green politician at the moment, our president, A. Van der Bellen @vanderbellen. For the NEOS we will use their new head, Beate Meinl-Reisinger @BMeinl, and for JETZT their founder, Peter Pilz @Peter_Pilz.
With the help of Botometer, a program developed at Indiana University that scores Twitter accounts between 0 (surely a human) and 5 (surely a bot), we will check whether
- bots supported the politicians by retweeting them and
- the politicians supported bots by retweeting them.
If you are only here for the juicy differences between politicians, you can stop reading right now: in a nutshell, I have not found any association between their retweets and bots. However, if you want to see how I came to this conclusion, please read on!
Preparations
Let’s start by loading some packages we will need.
library(tidyverse)
library(magrittr)
library(twitteR)
Next, we need access to the Twitter and the Botometer APIs.
Connect R to Twitter
To retrieve tweets, you need a) a Twitter account and b) to register as a Twitter developer and create an app.
After filling out some basic information about your app (its name and how you plan to use it), you can get the required OAuth credentials under Keys and Access Tokens.
Assign your credentials accordingly:
consumer_key <- "your_consumer_key"
consumer_secret <- "your_consumer_secret"
access_token <- "your_access_token"
access_secret <- "your_access_secret"
And use them to connect to Twitter:
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)
## [1] "Using direct authentication"
Connect R to Botometer
We will use the R client library botcheck provided by Joey Marshall to access the Botometer API.
Let’s start by installing the package.
devtools::install_github("marsha5813/botcheck")
Next, we load the package …
library(botcheck)
… and its dependencies.
library(httr)
library(xml2)
library(RJSONIO)
Then, we need to set the Mashape key (the Botometer API is hosted at Mashape; you can get your key after signing up for free at https://market.mashape.com/).
Mashape_key = "your_mashape_key"
Finally, we set up the OAuth signature for our Twitter app, which botcheck needs.
myapp = oauth_app("twitter", key=consumer_key, secret=consumer_secret)
sig = sign_oauth1.0(myapp, token=access_token, token_secret=access_secret)
Let’s check whether it worked, using my own Twitter handle (I am human, so my score should be close to 0).
botcheck("b_piskernik")
## [1] 0.7565855
Seems human enough.
Get the Tweets
Next, we retrieve the tweet timeline of the selected politicians.
Let’s get the data.
(dat_polit <-
  ## First, we enter their twitter-handles ...
  tibble(handle =
    c("sebastiankurz",
      "HCStracheFP",
      "thomasdrozda",
      "vanderbellen",
      "BMeinl",
      "Peter_Pilz"
    )) %>%
  mutate(
    ## ... next, we get the user profiles ...
    user = map(handle, getUser),
    ## ... and finally retrieve their timelines.
    tweets = map(user, userTimeline,
                 n = 3200,              ## max number of tweets
                 includeRts = TRUE,     ## include retweets
                 excludeReplies = TRUE  ## ignore replies
    )
  ))
## # A tibble: 6 x 3
## handle user tweets
## <chr> <list> <list>
## 1 sebastiankurz <S4: user> <list [2,799]>
## 2 HCStracheFP <S4: user> <list [3,175]>
## 3 thomasdrozda <S4: user> <list [1,396]>
## 4 vanderbellen <S4: user> <list [1,972]>
## 5 BMeinl <S4: user> <list [1,172]>
## 6 Peter_Pilz <S4: user> <list [2,551]>
HC Strache seems to have been more active than the others (his timeline almost reaches the 3,200-tweet maximum), but the others’ activity should suffice as well.
In the next step, we transform the data into a more usable state.
dat_tweets <- dat_polit %>%
  mutate(
    tweets_df = map(tweets, twListToDF)
  ) %>%
  unnest(tweets_df)
Let’s take a glimpse at the result.
dat_tweets %>% glimpse()
## Observations: 13,065
## Variables: 17
## $ handle <chr> "sebastiankurz", "sebastiankurz", "sebastiankurz",…
## $ text <chr> "RT @k_edtstadler: Hier können Sie alle Maßnahmen …
## $ favorited <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F…
## $ favoriteCount <dbl> 0, 0, 0, 63, 0, 0, 114, 0, 0, 0, 0, 0, 0, 0, 0, 30…
## $ replyToSN <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ created <dttm> 2019-02-13 17:49:20, 2019-02-13 17:49:18, 2019-02…
## $ truncated <lgl> FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FAL…
## $ replyToSID <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ id <chr> "1095741729297227776", "1095741719620931584", "109…
## $ replyToUID <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ statusSource <chr> "<a href=\"http://twitter.com/download/iphone\" re…
## $ screenName <chr> "sebastiankurz", "sebastiankurz", "sebastiankurz",…
## $ retweetCount <dbl> 1, 1, 2, 19, 9, 8, 26, 14, 16, 50, 32, 20, 15, 11,…
## $ isRetweet <lgl> TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, …
## $ retweeted <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F…
## $ longitude <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ latitude <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
Visualization data
Now that we have the data in a suitable format, the last preparatory step is to create a tibble with the names and colors of the politicians. This will come in handy for later visualizations.
(polit_info <- tibble(
  handle = dat_polit$handle,
  name = c(
    "Sebastian Kurz",
    "HC Strache",
    "Thomas Drozda",
    "A. Van der Bellen",
    "Beate Meinl-Reisinger",
    "Peter Pilz"),
  col = c(
    "#63C3D0",
    "#165D99",
    "#E31E2D",
    "#51A51E",
    "#D11E68",
    "#CEC234"
  )
) %>%
  ## convert name to factor to keep order
  mutate(name = as_factor(name))
)
## # A tibble: 6 x 3
## handle name col
## <chr> <fct> <chr>
## 1 sebastiankurz Sebastian Kurz #63C3D0
## 2 HCStracheFP HC Strache #165D99
## 3 thomasdrozda Thomas Drozda #E31E2D
## 4 vanderbellen A. Van der Bellen #51A51E
## 5 BMeinl Beate Meinl-Reisinger #D11E68
## 6 Peter_Pilz Peter Pilz #CEC234
With this helper data at the ready, we move on to the analysis.
Analysis
Are the politicians human?
Before we test whether those who spread the tweets of our beloved representatives are human, let’s check whether the politicians themselves qualify as human.
polit_info %>%
  mutate(
    human_bot = map_dbl(handle, botcheck)
  )
## # A tibble: 6 x 4
## handle name col human_bot
## <chr> <fct> <chr> <dbl>
## 1 sebastiankurz Sebastian Kurz #63C3D0 0.0355
## 2 HCStracheFP HC Strache #165D99 0.0533
## 3 thomasdrozda Thomas Drozda #E31E2D 0.0301
## 4 vanderbellen A. Van der Bellen #51A51E 0.0355
## 5 BMeinl Beate Meinl-Reisinger #D11E68 0.0533
## 6 Peter_Pilz Peter Pilz #CEC234 0.0418
All of them have scores close to zero, so Botometer is confident that whoever operates these accounts is human.
Do bots support the politicians?
For each politician, we will take the original tweet (i.e., not a retweet) with the most retweets and check the human/bot status of those who retweeted it.
dat_rt_top <- dat_tweets %>%
  ## get the most retweets per handle
  group_by(handle) %>%
  dplyr::filter(!isRetweet) %>%
  top_n(1, retweetCount)
Let’s have a look at it:
dat_rt_top %>%
  select(handle, text, created, retweetCount)
## # A tibble: 6 x 4
## # Groups: handle [6]
## handle text created retweetCount
## <chr> <chr> <dttm> <dbl>
## 1 sebastia… El régimen de #Maduro se ha n… 2019-02-04 09:13:29 5871
## 2 HCStrach… Italiens Innenminister @matte… 2019-01-25 10:19:53 775
## 3 thomasdr… Diese Foto ist offensichtlich… 2019-01-17 15:05:57 195
## 4 vanderbe… Ich freue mich sehr über die … 2018-10-17 09:42:48 2008
## 5 BMeinl Nicht so ideal der Überschrif… 2018-09-21 11:13:54 232
## 6 Peter_Pi… Regierung - Stillstand - @seb… 2017-10-01 08:03:41 145
Hm, the numbers are quite different and, unfortunately, that is a problem. In the next step we would look up the retweeters, but retweeters() returns no more than the last 100. However, bots react automatically and therefore probably faster than most human Twitter users. Accordingly, the proportion of bots among the last 100 retweets of a tweet with several thousand retweets should be lower than among all retweets of a tweet with no more than a few hundred (that is just a hypothesis of mine and might be wrong; if you test it, please let me know the result).
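The hypothesis can at least be explored in a toy simulation. The sketch below is a hypothetical illustration, not part of the analysis: the function name bot_share_last() and every parameter value are made up. It assumes that 20% of retweeters are bots, that retweet delays are exponentially distributed, and that bots react ten times faster than humans; it mimics the retweeters() limit by keeping only the 100 retweets with the longest delay, i.e., the most recent ones.

```r
## Toy simulation of the hypothesis (all parameter values are assumptions):
## bots retweet faster than humans, so they should be rarer among the most
## recent retweets of a tweet that collected many retweets.
bot_share_last <- function(n_retweets, share_bots = 0.2,
                           rate_bot = 10, rate_human = 1,
                           n_last = 100, reps = 200) {
  shares <- replicate(reps, {
    is_bot <- runif(n_retweets) < share_bots
    ## delay between tweet and retweet: exponential, bots react faster
    delay <- ifelse(is_bot,
                    rexp(n_retweets, rate = rate_bot),
                    rexp(n_retweets, rate = rate_human))
    ## like retweeters(), keep only the most recent retweets,
    ## i.e. those with the longest delay
    last <- order(delay, decreasing = TRUE)[seq_len(min(n_last, n_retweets))]
    mean(is_bot[last])
  })
  mean(shares)
}

set.seed(2019)
share_small <- bot_share_last(150)   # tweet with a few hundred retweets at most
share_large <- bot_share_last(5000)  # tweet with several thousand retweets
round(c(small = share_small, large = share_large), 3)
```

Under these assumptions the bot share among the last 100 retweets is small but positive for a tweet with 150 retweets and practically zero for one with 5,000. With different assumptions (e.g., bots that retweet with random delays) the effect would disappear, so this only shows that the hypothesis is plausible, not that it holds.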
So let’s see if we can find a set of tweets that is more suitable for comparison.
dat_rt_comp <- dat_tweets %>%
  dplyr::filter(
    !isRetweet,
    retweetCount >= 100,
    retweetCount < 150
  ) %>%
  group_by(handle) %>%
  top_n(1, created)
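For readers less familiar with dplyr, the same selection can be sketched in base R on a few made-up tweets (the data frame below is hypothetical). One difference is worth knowing: top_n() keeps all rows tied for the latest created time, whereas the duplicated() step below keeps exactly one tweet per handle.

```r
## Hypothetical tweets (made-up values, not the politicians' data)
tweets <- data.frame(
  handle       = c("a", "a", "a", "b", "b"),
  isRetweet    = c(FALSE, FALSE, TRUE, FALSE, FALSE),
  retweetCount = c(120, 140, 130, 99, 149),
  created      = as.POSIXct(c("2019-01-01", "2019-02-01", "2019-03-01",
                              "2019-01-15", "2019-01-10"), tz = "UTC")
)

## keep original tweets with 100 <= retweetCount < 150
cand <- subset(tweets, !isRetweet & retweetCount >= 100 & retweetCount < 150)

## sort newest first within each handle, then keep the first row per handle
cand <- cand[order(cand$handle, cand$created, decreasing = TRUE), ]
latest <- cand[!duplicated(cand$handle), ]
latest
```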
Now we have limited our selection to original tweets with 100 to 149 retweets and selected the most recent one per politician. Let’s have a look at them:
dat_rt_comp %>%
  select(handle, text, created, retweetCount)
## # A tibble: 6 x 4
## # Groups: handle [6]
## handle text created retweetCount
## <chr> <chr> <dttm> <dbl>
## 1 sebastia… Ich möchte mein tiefempfunden… 2019-02-07 09:45:18 101
## 2 HCStrach… Wir wünschen Euch noch einen … 2018-12-24 21:05:29 104
## 3 thomasdr… „Herbert Kickl muss gehen, un… 2019-01-25 14:00:29 109
## 4 vanderbe… "#HolocaustMemorialDay \nUnse… 2019-01-27 13:05:23 109
## 5 BMeinl "Die FPÖ stolpert von einer a… 2019-01-30 21:50:38 146
## 6 Peter_Pi… Regierung - Stillstand - @seb… 2017-10-01 08:03:41 145
Next, we get the user IDs of the retweeters, …
dat_rt <- dat_rt_comp %>%
  mutate(
    ## get the user IDs of (up to 100) retweeters per tweet
    rt_id = map(id, retweeters, n = 100)
  ) %>%
  unnest(rt_id)
… take a glimpse at the result, …
dat_rt %>% glimpse()
## Observations: 563
## Variables: 18
## Groups: handle [6]
## $ handle <chr> "sebastiankurz", "sebastiankurz", "sebastiankurz",…
## $ text <chr> "Ich möchte mein tiefempfundenes Mitgefühl der Fam…
## $ favorited <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F…
## $ favoriteCount <dbl> 469, 469, 469, 469, 469, 469, 469, 469, 469, 469, …
## $ replyToSN <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ created <dttm> 2019-02-07 09:45:18, 2019-02-07 09:45:18, 2019-02…
## $ truncated <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TR…
## $ replyToSID <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ id <chr> "1093445589512146944", "1093445589512146944", "109…
## $ replyToUID <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ statusSource <chr> "<a href=\"http://twitter.com/download/iphone\" re…
## $ screenName <chr> "sebastiankurz", "sebastiankurz", "sebastiankurz",…
## $ retweetCount <dbl> 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, …
## $ isRetweet <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F…
## $ retweeted <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F…
## $ longitude <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ latitude <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ rt_id <chr> "1063042812269289472", "880200336232919040", "1056…
… and then look up the users.
## we use lookupUsers() to reduce the API call load
dat_rt_user <- lookupUsers(dat_rt$rt_id) %>%
  twListToDF() %>%
  as_tibble() %>%
  dplyr::rename(rt_id = id)
By sending their screen names to Botometer we get their human/bot-scores. Note: this will probably take some time.
## wrapper around botcheck() that returns NA instead of NULL
## (e.g. for accounts Botometer cannot score), so map_dbl() works
botcheck_save <- function(x){
  res <- botcheck(x)
  ifelse(is.double(res),
         res,
         NA_real_)
}
dat_rt_bot <- dat_rt_user %>%
  mutate(human_bot = map_dbl(screenName, botcheck_save))
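Because botcheck() needs live API access, here is a sketch of the NULL-to-NA idea with a hypothetical stand-in scorer: fake_botcheck() and safe_score() are made-up names that only mimic three possible outcomes. The sketch additionally wraps the call in tryCatch(), on the assumption that a failed lookup might also surface as an error, which map_dbl() would otherwise pass on and abort the whole run. A plain if/else replaces ifelse() because the result is always a single value.

```r
## hypothetical stand-in for botcheck(): NULL for a protected account,
## an error for a deleted one, a score otherwise
fake_botcheck <- function(x) {
  if (x == "protected_user") return(NULL)
  if (x == "deleted_user") stop("user not found")
  0.1
}

## return the score if it is a single number, otherwise NA
safe_score <- function(x, scorer = fake_botcheck) {
  res <- tryCatch(scorer(x), error = function(e) NULL)
  if (is.double(res) && length(res) == 1) res else NA_real_
}

scores <- vapply(c("regular_user", "protected_user", "deleted_user"),
                 safe_score, numeric(1), USE.NAMES = FALSE)
scores
```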
Let’s have a look at the result.
dat_rt_bot %>%
  select(screenName, name, location, human_bot)
## # A tibble: 458 x 4
## screenName name location human_bot
## <chr> <chr> <chr> <dbl>
## 1 PolitikerAT Politiker Vergleich Wien, Österreich 0.918
## 2 CrusilleauL Matteo Salvini LEGA MILAN / ROME ( ITALIE) 0.194
## 3 BUNDgeg_Hass #KATHOLISCH "" 0.0859
## 4 AngronIsAngry AngronIsAngry Not Here 0.0327
## 5 EugenPlesz Eugen Plesz Berlin, Deutschland 0.253
## 6 TenevaNina Nina Teneva Bulgaria 0.253
## 7 Omi1937 OmiG London 0.117
## 8 MehDem7 MehdiDeğirmenci Istanbul, Türkei 0.0492
## 9 osthollandia osthollandia "" 0.0734
## 10 juzzl2 17. Tokio 0.157
## # … with 448 more rows
Finally, we combine our data.
(dat_quest1 <- polit_info %>%
   full_join(dat_rt %>%
               select(handle, rt_id)) %>%
   full_join(dat_rt_bot %>%
               select(rt_id, human_bot)))
## Joining, by = "handle"
## Joining, by = "rt_id"
## # A tibble: 563 x 5
## handle name col rt_id human_bot
## <chr> <fct> <chr> <chr> <dbl>
## 1 sebastiankurz Sebastian Kurz #63C3D0 1063042812269289472 NA
## 2 sebastiankurz Sebastian Kurz #63C3D0 880200336232919040 0.918
## 3 sebastiankurz Sebastian Kurz #63C3D0 1056942381457723392 0.194
## 4 sebastiankurz Sebastian Kurz #63C3D0 709677999990378496 0.0859
## 5 sebastiankurz Sebastian Kurz #63C3D0 1695839551 0.0327
## 6 sebastiankurz Sebastian Kurz #63C3D0 1027875113813848064 0.253
## 7 sebastiankurz Sebastian Kurz #63C3D0 958411854132469760 0.253
## 8 sebastiankurz Sebastian Kurz #63C3D0 471043083 0.117
## 9 sebastiankurz Sebastian Kurz #63C3D0 3303063784 0.0492
## 10 sebastiankurz Sebastian Kurz #63C3D0 990242146174332928 0.0734
## # … with 553 more rows
Before we look at the data we got, let’s check how much we did not get.
dat_quest1 %>%
  group_by(name) %>%
  summarize(
    `% missing` = round(mean(is.na(human_bot)) * 100, digits = 1)
  )
## # A tibble: 6 x 2
## name `% missing`
## <fct> <dbl>
## 1 Sebastian Kurz 18.1
## 2 HC Strache 26.4
## 3 Thomas Drozda 11.7
## 4 A. Van der Bellen 11.8
## 5 Beate Meinl-Reisinger 12.6
## 6 Peter Pilz 4.2
Hm, the missing values are probably mostly due to protected (private) accounts, whose tweets cannot be retrieved, and therefore not scored by Botometer, unless you follow the account. Overall, the missing rate is about what I would expect in a random sample of Twitter users, except for the retweeters of HC Strache. Let’s check whether this deviation can be explained by chance.
dat_quest1 %>%
  mutate(
    missing = is.na(human_bot)
  ) %$%
  chisq.test(name, missing)
##
## Pearson's Chi-squared test
##
## data: name and missing
## X-squared = 21.468, df = 5, p-value = 0.0006607
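To see what chisq.test() computes here: it compares the observed counts of missing and non-missing scores per politician with the counts expected if the missing rate were the same for everyone. A small worked example with made-up counts (three hypothetical politicians with 100 retweeters each, not the data from this post) reproduces the statistic by hand.

```r
## Made-up counts for three hypothetical politicians (100 retweeters each)
tab <- matrix(c(20, 80,
                30, 70,
                10, 90),
              ncol = 2, byrow = TRUE,
              dimnames = list(politician = c("A", "B", "C"),
                              missing = c("yes", "no")))

## expected counts if missingness were independent of the politician:
## row total * column total / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

## Pearson statistic, degrees of freedom, and upper-tail p-value
x2_manual <- sum((tab - expected)^2 / expected)
dof <- (nrow(tab) - 1) * (ncol(tab) - 1)
p_manual <- pchisq(x2_manual, df = dof, lower.tail = FALSE)

res <- chisq.test(tab)   # no continuity correction for tables larger than 2x2
c(manual = x2_manual, chisq.test = unname(res$statistic))
```

Here both give X² = 12.5 with 2 degrees of freedom. For the six politicians above, the same formula gives df = (6 - 1) × (2 - 1) = 5, as in the output.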
With p = 0.00066 (6.6 × 10^-4), chance seems an unlikely explanation. The high missing rate might be a reaction to the exposure of several FPÖ operatives (but better call them isolated cases) for tweeting or posting racist and rabble-rousing rubbish. Switching one’s account to private and sticking with one’s own kind is probably good protection against further public exposure. Of course, this is just a guess, and there may be another reason for the large number of friends-only accounts. Anyway, I doubt that the hidden accounts are bots, because hiding would decrease their effectiveness. So for today’s topic, bot or human, they are probably not a problem.
OK, next we check who (bot or human) retweeted the tweets of our politicians.
dat_quest1 %>%
  group_by(name) %>%
  summarize(
    mean = mean(human_bot, na.rm = TRUE),
    `% > 2.5` = mean(human_bot > 2.5, na.rm = TRUE) * 100
  )
## # A tibble: 6 x 3
## name mean `% > 2.5`
## <fct> <dbl> <dbl>
## 1 Sebastian Kurz 0.200 0
## 2 HC Strache 0.269 0
## 3 Thomas Drozda 0.266 0
## 4 A. Van der Bellen 0.276 0
## 5 Beate Meinl-Reisinger 0.231 0
## 6 Peter Pilz 0.223 0
Given these results, it seems highly likely that the politicians’ Twitter supporters are all human. The mean human/bot-score is very low in general, and not a single retweeter scored higher than 2.5, the midpoint of the scale.
Do the politicians support bots?
Next, we check the human/bot-score of the accounts that got retweeted by the politicians.
Let’s start by comparing the retweet rates in the data set.
dat_tweets %>%
  group_by(handle) %>%
  summarize(
    `# of rt` = sum(isRetweet, na.rm = TRUE),
    `% rt` = round(mean(isRetweet, na.rm = TRUE) * 100, digits = 1)
  )
## # A tibble: 6 x 3
## handle `# of rt` `% rt`
## <chr> <int> <dbl>
## 1 BMeinl 666 56.8
## 2 HCStracheFP 93 2.9
## 3 Peter_Pilz 1548 60.7
## 4 sebastiankurz 1298 46.4
## 5 thomasdrozda 283 20.3
## 6 vanderbellen 332 16.8
OK, the retweet rate differs tremendously between the politicians. While chancellor Kurz, Ms Meinl-Reisinger, and Mr Pilz retweet a lot, Mr Strache mainly tweets original content.
Before we look at the human/bot-scores of the accounts that were retweeted by the politicians, we have to prepare our data a little. We start by limiting our data to retweets, extracting the screen name of the original account, and merging the data with polit_info.
dat_retweets <- dat_tweets %>%
  filter(isRetweet) %>%
  mutate(
    ## replace "RT @name: ..." by the captured screen name
    rt_screenname = str_replace(text,
                                "RT @([[:alnum:]_]+):\\s[[:alpha:][:print:][:control:]]*",
                                "\\1")
  ) %>%
  full_join(polit_info)
## Joining, by = "handle"
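The regular expression above replaces the whole tweet text with the captured screen name. A sketch of the same extraction in base R, using regexec() on made-up tweet texts, returns NA for texts that do not start with "RT @" instead of leaving them unchanged. Since the data are already filtered to retweets, this is only a safeguard; the example texts are hypothetical, and the character class assumes that screen names consist of ASCII letters, digits, and underscores only.

```r
## Hypothetical tweet texts (illustration only)
texts <- c("RT @k_edtstadler: Hier ist der Link",
           "RT @some_user: first line\nsecond line",
           "An original tweet without a retweet prefix")

## capture the screen name directly after a leading "RT @"
matches <- regmatches(texts, regexec("^RT @([A-Za-z0-9_]+):", texts))

## element 2 of each match is the captured group; no match -> NA
rt_screenname <- vapply(matches,
                        function(m) if (length(m) == 2) m[2] else NA_character_,
                        character(1))
rt_screenname
```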
Next, we extract all accounts the politicians retweeted, fill in the already known human/bot-scores, and retrieve the still missing ones.
Note: we use the fact that there is some reciprocity when it comes to retweets and fill in the human/bot-scores we had already retrieved when checking who retweeted the politicians. This saves us some time because Botometer is not very fast and, furthermore, has a daily limit of 2,000 checks. Still, we need to send a large number of requests to Botometer, so this will take a while.
dat_retweeters <- dat_retweets %>%
  select(rt_screenname) %>%
  distinct() %>%
  ## reuse the scores already retrieved in the first analysis
  left_join(dat_rt_bot %>%
              select(screenName, human_bot),
            by = c("rt_screenname" = "screenName")
  ) %>%
  mutate(
    ## query Botometer only for accounts without a score
    human_bot = map2_dbl(human_bot, rt_screenname,
                         function(x, y) ifelse(
                           is.na(x),
                           botcheck_save(y),
                           x
                         ))
  )
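Reusing known scores is essentially a cache lookup: only accounts without a score are sent to Botometer. A sketch of that logic in base R, with made-up screen names and a hypothetical fake_score() that merely counts how often it is called, shows that each distinct account without a known score is queried exactly once.

```r
## hypothetical stand-in for botcheck() that counts its calls
n_calls <- 0
fake_score <- function(screen_name) {
  n_calls <<- n_calls + 1
  nchar(screen_name) / 100   # arbitrary made-up score
}

## scores known from the first analysis (made-up; bob could not be scored)
known <- data.frame(screenName = c("alice", "bob"),
                    human_bot  = c(0.12, NA))

## accounts retweeted by the politicians, with duplicates
retweeted <- c("alice", "carol", "carol", "bob", "dave")

## keep each account once and reuse known scores where available
retweeters <- data.frame(rt_screenname = unique(retweeted))
retweeters$human_bot <- known$human_bot[match(retweeters$rt_screenname,
                                              known$screenName)]

## query only the accounts that still lack a score
todo <- is.na(retweeters$human_bot)
retweeters$human_bot[todo] <- vapply(retweeters$rt_screenname[todo],
                                     fake_score, numeric(1),
                                     USE.NAMES = FALSE)
retweeters
```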
Let’s get a quick impression of the overall humanness.
dat_retweeters %>%
  ggplot(aes(x = human_bot)) +
  geom_density() +
  scale_x_continuous(name = "human/bot-score") +
  theme_classic()
OK, if it were just about the answer and not about the way to get there, we could stop right now. The human/bot-score ranges from 0 to 5, but we do not see a single value larger than 1; in fact, most are around 0.1. Without any further analyses, we can conclude that the politicians did not retweet tweets of bots. However, this is not a data-journalism blog but one about data science, so we will finish the job to learn how it would be done.
So let’s try to get a more differentiated picture than just an overall humanness-graph. For that, we need to combine dat_retweets with dat_retweeters.
dat_quest2 <- dat_retweets %>%
  left_join(dat_retweeters)
## Joining, by = "rt_screenname"
With the combined data set, we can create a graph broken down by politician.
dat_quest2 %>%
  ggplot(aes(x = name, y = human_bot, fill = name)) +
  geom_boxplot() +
  scale_x_discrete(name = "politician") +
  scale_y_continuous(name = "human/bot-score") +
  scale_fill_manual(values = polit_info$col) +
  theme_classic() +
  theme(
    ## remove legend
    legend.position = "none",
    ## rotate names
    axis.text.x = element_text(angle = 30,
                               vjust = 1,
                               hjust = 1)
  )
If the data were spread over the whole range of the human/bot-score, another visualization option would be to cut them into segments with, e.g., cut(human_bot, 0:5) and show their proportions per politician as stacked bar graphs. With our data, however, this is pointless. Instead, we try something else and inspect the effect of the date.
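For data that do spread over the whole scale, the binning could look like the following base R sketch; the scores are made up for two hypothetical politicians. Note that include.lowest = TRUE matters here: cut() uses right-closed intervals such as (0,1], so a score of exactly 0 would otherwise become NA.

```r
## Made-up human/bot-scores for two hypothetical politicians
scores <- data.frame(
  name = rep(c("Politician A", "Politician B"), each = 5),
  human_bot = c(0.1, 0.4, 1.2, 2.7, 4.9,
                0.0, 0.3, 0.8, 1.5, 3.3)
)

## bin the 0-5 scale into unit-wide segments; include.lowest keeps exact 0s
scores$segment <- cut(scores$human_bot, breaks = 0:5, include.lowest = TRUE)

## share of each segment per politician (each row sums to 1)
props <- prop.table(table(scores$name, scores$segment), margin = 1)
props

## stacked bars: one bar per politician, one stack element per segment
barplot(t(props), legend.text = TRUE,
        xlab = "politician", ylab = "proportion")
```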
dat_quest2 %>%
  ggplot(aes(x = created, y = human_bot, color = name, fill = name)) +
  geom_smooth() +
  scale_x_datetime(name = "date",
                   limits = c(as.POSIXct("2017-01-01"), NA)) +
  scale_y_continuous(name = "human/bot-score") +
  scale_color_manual(values = polit_info$col) +
  scale_fill_manual(values = polit_info$col) +
  theme_classic() +
  theme(
    legend.title = element_blank()
  )
Next, we could add annotations to the plot to highlight election dates and other interesting dates, but since we have already concluded that there are no bots involved, we will end now without doing so.
Summary
Overall, bots do not seem to play a direct role in the retweeting behavior of the selected politicians. Regardless of political party, the selected politicians neither retweeted bots nor were retweeted by them. Honestly, I would have been surprised if such an obvious association could have been found, but on the other hand, I have seen stranger things. Of course, this does not mean that Twitter bots play no role in Austrian politics, just that the retweeting behavior of the selected politicians is not affected by bots.
Closing Remarks
I hope you have enjoyed our little digression into evaluating whether a tweeter is a human or a bot. In this case we have not found any bots, but that does not mean they are not out there. What I would like to look into next is whether people with different political affinities differ in their likelihood of following bots. This should not be too hard: one could take a sample of each politician’s followers and check whom else they follow. So if you want to try out Botometer yourself, this might be something you could investigate. If you do, please let me know the results.
If something is not working as outlined here, please check the package versions you are using. The system I used was:
sessionInfo()
## R version 3.5.2 (2018-12-20)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=de_AT.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=de_AT.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=de_AT.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=de_AT.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] bindrcpp_0.2.2 RJSONIO_1.3-1.1 xml2_1.2.0
## [4] httr_1.4.0 botcheck_0.0.0.9000 twitteR_1.1.9
## [7] magrittr_1.5 forcats_0.3.0 stringr_1.4.0
## [10] dplyr_0.7.8 purrr_0.3.0 readr_1.3.1
## [13] tidyr_0.8.2 tibble_2.0.1 ggplot2_3.1.0
## [16] tidyverse_1.2.1
##
## loaded via a namespace (and not attached):
## [1] tidyselect_0.2.5 xfun_0.4 haven_2.0.0 lattice_0.20-38
## [5] colorspace_1.4-0 generics_0.0.2 htmltools_0.3.6 yaml_2.2.0
## [9] rlang_0.3.1 pillar_1.3.1 DBI_1.0.0 glue_1.3.0
## [13] withr_2.1.2 bit64_0.9-7 modelr_0.1.3 readxl_1.2.0
## [17] bindr_0.1.1 plyr_1.8.4 munsell_0.5.0 gtable_0.2.0
## [21] cellranger_1.1.0 rvest_0.3.2 evaluate_0.13 knitr_1.21
## [25] curl_3.3 broom_0.5.1 Rcpp_1.0.0 openssl_1.2.1
## [29] scales_1.0.0 backports_1.1.3 jsonlite_1.6 bit_1.1-14
## [33] askpass_1.1 rjson_0.2.20 hms_0.4.2 digest_0.6.18
## [37] stringi_1.2.4 grid_3.5.2 cli_1.0.1 tools_3.5.2
## [41] lazyeval_0.2.1 crayon_1.3.4 pkgconfig_2.0.2 lubridate_1.7.4
## [45] assertthat_0.2.0 rmarkdown_1.11 rstudioapi_0.9.0 R6_2.3.0
## [49] nlme_3.1-137 compiler_3.5.2



Hi Bernhard,
Excellent post, thank you for sharing it!
While reading it, I was wondering what the scale for the Botometer scores is. It struck me that none of the scores in one of your analyses is higher than 1 (the same is true in my own analysis). So I cross-checked scores from the botcheck package against those from the Botometer website for several users.
It seems that although the original Botometer scores are between 1 and 5, the botcheck scale is between 0 and 1. For example, my own account @gabrielaczarnek has a score of 4.1 on the Botometer website but 0.82 through botcheck (I think it tells more about my tweeting style than about the Botometer model :)).
Anyway, the question is, if I am correct, why did you get scores higher than 1 in your first analysis of followers?
Hi Gabcza,
you are absolutely right, and I should have noticed myself that the scores are scaled to [0,1]. This makes my Twitter account quite bot-like.
I just skimmed over my results and cannot find any value greater than 1.0. Could you please give a more precise pointer so that I can look into the matter?
THX!
Hi again,
re scores > 1, I think I misunderstood your sentence: "The mean human/bot-score is very low in general, and not a single retweeter scored higher than 2.5".
You probably meant "no one had scores higher than the midpoint of a 1-5 scale", whereas I read it as "the highest (observed) score was 2.5". My bad!