This is a methodological post on some social network analysis work we are developing for Moving Media. The premise for the SNA research is reasonably simple:
Task: Perform social network analysis around the Twitter conversations about the FDA’s proposed health apps guidelines, posted July 19th 2011:
Public brief: http://www.fda.gov/forconsumers/consumerupdates/ucm263332.htm
Comments and Submissions: http://www.regulations.gov/#!docketBrowser;rpp=25;po=0;dct=PS;D=FDA-2011-D-0530;refD=FDA-2011-D-0530-0001
Aim: To map the dispersed network of actors discussing the FDA policy consultation process in social media channels, visualising their relative influence and communicative relationships.
After some initial Twitter research, we found the #FDAApps hashtag to be the conversation we wanted to analyse. The only drawback is that this conversation seems to be unreachable – the Twitter API didn't return anything, although the conversation is clearly there. Any suggestions on this would be appreciated. Following on from this, I did a search across four conversations: #mhealth, #healthapps, #FDA and #apps. It is an experiment in both the methodology and the content.
Here’s the breakdown on the process (and it gets a bit nerdy from here):
1. I tracked four Twitter conversations (#mhealth, #healthapps, #apps & #FDA) and processed the data through the Twitter API, then OpenRefine, and then into Gephi. I imported the .csv file into OpenRefine to extract the @replies and the #hashtag conversations – a process of deleting much of the data and producing a .csv file Gephi likes. I then imported the data into Gephi, ran the Force Atlas and Fruchterman-Reingold layouts and ranked the labels by degree. Next I ran the Network Diameter statistic across the network (average path length: 1.05, number of shortest paths: 236), which enabled me to colour the labels by their betweenness centrality on a scale of 0–6, eccentricity 0–2 and closeness centrality 0–1.5. Finally I ran the modularity statistic across it (modularity: 0.790, with resolution: 0.790, number of communities: 18). 18 communities!
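The OpenRefine cleaning step above – reducing the raw tweet export to an edge list of @replies that Gephi will accept – can also be scripted. Here's a minimal sketch in Python; the column names `from_user` and `text` are assumptions about the raw .csv export, so adjust them to match your actual header:

```python
import csv
import re

def extract_edges(in_path, out_path):
    """Reduce a raw tweet .csv to a Source,Target edge list (author -> @mention)
    that Gephi can import directly."""
    with open(in_path, newline="", encoding="utf-8") as fin, \
         open(out_path, "w", newline="", encoding="utf-8") as fout:
        reader = csv.DictReader(fin)
        writer = csv.writer(fout)
        writer.writerow(["Source", "Target"])
        for row in reader:
            author = row["from_user"]
            # one edge per @mention in the tweet text
            for mention in re.findall(r"@(\w+)", row["text"]):
                writer.writerow([author, mention])
```

This reproduces the "deleting much of the data" step programmatically, which makes the cleaning repeatable across the four hashtag datasets.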
2. I did this for each set of data, that is #mhealth, #healthapps, #apps and #FDA. Each process provided a visualisation that demonstrates the key conversation hashtags and the most significant people in those conversations. Here’s the preliminary analysis:
3. I then combined the cleaned data of the four conversations together to create a ‘super set’ to understand the broader ecology of the policy discussion around mhealth and health apps.
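Combining the cleaned datasets and recomputing the same statistics can be done outside Gephi too. The sketch below uses networkx (not part of the original workflow – an assumption on my part) to merge per-hashtag edge lists into the 'super set' and compute the degree, betweenness and closeness measures used to rank and colour the labels:

```python
import networkx as nx

def build_superset(edge_list_paths):
    """Merge several Source,Target .csv edge lists into one directed graph."""
    g = nx.DiGraph()
    for path in edge_list_paths:
        with open(path, encoding="utf-8") as f:
            next(f)  # skip the Source,Target header row
            for line in f:
                if not line.strip():
                    continue
                source, target = line.strip().split(",")
                g.add_edge(source, target)
    return g

def summarise(g):
    """The per-node measures used to rank and colour labels in Gephi."""
    return {
        "degree": dict(g.degree()),
        "betweenness": nx.betweenness_centrality(g),
        "closeness": nx.closeness_centrality(g),
    }
```

Because the four edge lists are merged into a single graph before the statistics run, actors who bridge hashtags (high betweenness) surface naturally, which is exactly what the super set is meant to show.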
The combined conversation around healthapps, mhealth, apps and FDA
Preliminary analysis: What we know (and this is my first critical analysis of this process – it could change as I become more aware of what is going on here):
- The conversation between #FDA and #healthapps is stronger than that between the other topics, judging by their locations in the network
- @Vanessa_Cacere is the most prominent Twitter user in #apps (she often retweets our tweets too!)
- @referralIMD is prominent in #mhealth
- @MaverickNY is prominent in #healthapps
- The bluer the colour of the actor, the closer they are to the topic – ‘closeness centrality’
- @Paul_Sonnier [https://twitter.com/Paul_Sonnier] is extremely significant in the overall conversation – ‘betweenness centrality’
- There are probably some other significant terms here, like #digitalhealth, #breakout, #telehealth, #telemedicine
- It sucks up a fair amount of CPU processing power
- The healthapps viz did not work so well, and I’m not sure why.
The limitations as of now:
- This isn’t the #FDAApps conversation from July 2011 onwards – it’s the mhealth conversation of 28 May 2013
- I’m not entirely sure it’s possible to construct an archive from events past – I need to look into this further
- I think I can code a program that pings the Twitter API automatically every 20 seconds and adds the results to the dataset. If I can build this, we can start tracking data from now on issues/conversations we think are important. I’m doing this manually now, and it is really laborious.
- There are conversations around #apps in general here too. A proper analysis will likely need to clean the raw data further to eliminate any inaccuracies in the representation
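The automated tracker mentioned in the limitations could look something like the sketch below. The Twitter-specific parts (endpoint, authentication, search parameters) are deliberately left out; `fetch` is a placeholder for whatever call retrieves the latest tweets, returning a list of dicts with at least an `id` key:

```python
import time

def poll(fetch, dataset, interval=20, rounds=None):
    """Poll `fetch` every `interval` seconds, appending unseen tweets to `dataset`.
    `rounds=None` polls forever; a number limits the iterations (useful for testing)."""
    seen = {tweet["id"] for tweet in dataset}
    n = 0
    while rounds is None or n < rounds:
        for tweet in fetch():
            # deduplicate by tweet id so overlapping API pages don't double-count
            if tweet["id"] not in seen:
                seen.add(tweet["id"])
                dataset.append(tweet)
        n += 1
        if rounds is None or n < rounds:
            time.sleep(interval)
    return dataset
```

Run against the real search API, this would replace the laborious manual collection with a growing, deduplicated dataset ready for the cleaning step above.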
Any input on this process would be greatly appreciated and if you have any insights on the findings, please comment below.