Category Archives: Data Visualisations

Original image by Dean Terry

Improving the Social Network Analysis Methodology

We have been collecting data for just shy of a year now, and have been developing the Twitter social network analysis methodology for a little longer than that. As you might recall, we have been following the mobile health conversation via the #mHealth hashtag and have now finalised the collection of those data. The processing is almost finished, and we can progress to the next stage of ethnography to further understand what we have collected.

We have been improving the methodology as we go, and the last assistance we received was some code written for the Gephi program. Recently, we have been talking with colleagues from the University of Wollongong’s SMART Infrastructure Facility, who have been developing a process for collecting Twitter data. Their project, CogniCity, relates to flood information in Indonesia, and Tom Holderness has been kind enough to share his work on GitHub.

Once we install this JavaScript application, which runs on Node.js, we will have an automated version of the manual process we have been struggling with for the past year. Further, the code is customisable, so researchers can query the Twitter Streaming API for the specific data they require. You can read more about the CogniCity Node.js application on GitHub.
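The CogniCity application itself is written in Node.js; as a rough illustration of the kind of processing we are automating, here is a minimal Python sketch that turns collected tweets (one JSON object per line, in Twitter’s v1.1 format) into a Source/Target mention edge list that Gephi can import. The file names and storage format are my own assumptions, not part of Tom’s code.

```python
import csv
import json

def tweets_to_edges(jsonl_path, csv_path):
    """Turn collected tweets (one JSON object per line, Twitter v1.1
    format) into a Source,Target mention edge list for Gephi."""
    with open(jsonl_path) as src, open(csv_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(["Source", "Target"])  # column headers Gephi recognises
        for line in src:
            tweet = json.loads(line)
            author = tweet["user"]["screen_name"]
            # one directed edge per @mention in the tweet
            for mention in tweet.get("entities", {}).get("user_mentions", []):
                writer.writerow([author, mention["screen_name"]])
```

The resulting CSV can be dragged straight into Gephi’s import spreadsheet dialog as an edge table.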

If we can improve the processing speed further, we will have a research prototype that can be shared with other researchers who are interested in Twitter social network analysis – hopefully a post soonish will reveal this!

Ingrid Mason and Luc Small delivering the closing address of THATcamp Sydney 2013

#THATcamp Sydney, October 2013 Review

We recently presented some of our social network analysis research of the informal policy actors within the mobile health regulatory space at the THATcamp Sydney unconference. Unfortunately I missed the first day and Fiona could only attend the first session of that day, but we managed to see a full (half) day on the second.

Amongst the other great projects I observed, one that certainly stood out for me was the Australian Women’s Register. In conjunction with the University of Melbourne, they are doing outstanding work collecting and analysing data about Australian women – I highly recommend checking this out.

Fiona and I then conducted our session, which we took as an opportunity to talk about our work so far and to use the expertise in the room to interrogate and develop our methodology. It was an amazing experience, and you can hear the audio from our talks here.

Some of the points that emerged from the discussion include:

  • Who are the official organisations that are interacting in these conversations?
  • There is existing research to suggest that participation is driving policy
  • Deborah Lupton at the University of Sydney, to explore data and politics
  • Nick Thurburger, University of Melbourne – experience in developing methodology for converting Excel sheets into clean data
  • Thresholds of data – what to use, who are they, where are they, rural voices etc – we need to think about our own thresholds which shape the data analysis
  • Demographic profiling in the process – this is critical to understand the ‘why’ of the interactions between the actors
  • Content coding around the data collection, who do we want to hear from?
  • Jake Wallace, Charles Sturt University – experience in political party processes
  • Policy analysis portal – I am thinking we need to develop a similar tool to embed in our site, so that users can bring their data to it and run their own analysis
  • Positive and negative sentiment in tweets – UWS are working in this space
  • Government institutions are legally required to collect social media conversations – interesting!
  • Internal organisational cultural behaviours influencing what is said and what is not said under the Freedom of Information Act
  • If so, could we access Yammer data?
  • Digital engagement as opposed to social media
  • Atlas of Living Australia – a great site to play around with when representing data
  • Internet Archive to send or harvest links – credibility of user data in government policy
  • Surveillance in terms of peaks of use – who is using social media, and when
  • Twitter API – this is the ‘fat tube’ of Twitter data, and Fiona is working on a collaborative approach with UTS
  • Steve Cassidy at Macquarie University – workflow people

And if that’s not enough, the wonderful Yvonne Perkins kept a Google doc of the session which can be viewed online.

Great session, and I think it is a fantastic primer for our paper, which we will present at the aaDH (Australasian Association for Digital Humanities) conference in 2014 – thanks for the input, THATcamp!


Improved Gephi processing through Java RAM allocation – downloadable

Recently, our social network analysis methodology hit a snag, as the computer I am using started to crash when attempting to process our larger data sets. The data sets are not extremely large at this stage (approx. 8MB Excel sheets with about 80,000 lines of text), but they are nonetheless too big for my MacBook Pro to handle. Just to remind you, we are using the open-source package Gephi as our analytics software.

I started by looking into virtual servers, where Amazon EC2 is the benchmark in this domain. Their servers seem to be located in North America, e.g. around San Francisco, and I have been advised that Amazon’s geographical location is good when scraping data from technology companies like Twitter and Facebook, who host their data in a similar area. However, Amazon does appear to be a little too expensive for the research budget – although it is very tempting to spin up some servers to collect and process our data quickly.

The second option was to lean on NeCTAR, the national research cloud infrastructure for Australian researchers. I established two medium virtual servers (2 vCPUs, 8GB RAM, 60GB local VM disk) and installed the Ubuntu operating system, but had difficulty talking to the system (happy to take input from anyone here).

Then, we had a meeting with Information and Communication Technology (ICT) people at the University of Sydney who have been very helpful in their approach. We have been liaising with Justin Chang who provided us with an improved version of Gephi that essentially enables us to use more RAM on my local machine to process the data sets. Justin provided me with a disk image that I installed, tested and was able to get moving with the analysis again.

I asked if I could share this build of Gephi with our readers, to which he agreed – and he provided a step-by-step guide to how he created a version of Gephi with an increased RAM allocation:

- Download the ‘Gephi’ .dmg file from:

- Open the .dmg file

- Copy the file to a folder on your desktop

- Ctrl + Click the file and click Show Package Contents

- Navigate to Contents > Resources > Gephi > etc and open the gephi.conf file in a text editor

- Change the maximum Java RAM allocation, from:


default_options="--branding gephi -J-Xms64m -J-Xmx512m -J-Xverify:none -J-Dsun.java2d.noddraw=true -J-Dsun.awt.noerasebackground=true -J-Dnetbeans.indexing.noFileRefresh=true -J-Dplugin.manager.check.interval=EVERY_DAY"


to:


default_options="--branding gephi -J-Xms1024m -J-Xmx2048m -J-Xverify:none -J-Dsun.java2d.noddraw=true -J-Dsun.awt.noerasebackground=true -J-Dnetbeans.indexing.noFileRefresh=true -J-Dplugin.manager.check.interval=EVERY_DAY"

This enables Gephi to utilise up to 2GB of RAM when processing data. You can allocate any amount of RAM here, as long as it is less than your system’s total RAM.

- Save the file

- Run the application ‘Disk Utility’

- From within Disk Utility, click File > New > Disk Image from Folder, select the folder you created on the desktop, and click Image
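The gephi.conf edit above can also be scripted. Here is a minimal Python sketch; the flag values come from the steps above, but the helper name and the regex approach are my own, not part of Justin’s instructions:

```python
import re

def raise_gephi_heap(conf_text, xms="1024m", xmx="2048m"):
    """Rewrite the -J-Xms / -J-Xmx flags on the default_options line
    of a gephi.conf so the Java VM can use more RAM."""
    conf_text = re.sub(r"-J-Xms\S+", "-J-Xms" + xms, conf_text)
    conf_text = re.sub(r"-J-Xmx\S+", "-J-Xmx" + xmx, conf_text)
    return conf_text
```

Run it over a copy of Contents/Resources/gephi/etc/gephi.conf and write the result back before rebuilding the disk image.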

You can download the DMG with the two versions of Gephi (1GB and 2GB).

The Mobile Internet Policy Global Media Policy Section, so far…

On our other site, where we are collectively contributing policy documents to the Global Media Policy website – specifically our Mobile Internet Policy section – we are starting to make some progress. There have been significant developments in the data we have collected, with each of the project’s research streams now in full swing. This means we have found an ever-increasing number of policy actors relevant to mobile internet policy and have located these within a global context, inclusive of a broader set of internet governance actors. These will of course continue to develop in the coming months.

In the meantime, it is useful to visualise our work to date. Below is a sunburst representation of the policy actors involved in mobile media policy. We would encourage you to look through this static graphic here, but also to follow it through on the Global Media Policy website to examine its broader position within global internet governance.


The Mobile Internet Policy section of the Global Media Policy cohort

Week 1 of mapping the mobile health policy actors

My inner geek is tingling this morning. After a pretty big night on the data, I woke with a visualisation hangover. But the good news is we now know some mobile health data stuff.

I have established some significant methodological approaches this week, given that Twitter shut down their Search API on Friday last week, just after we had confirmed our approach. Essentially, I had to switch to their new Streaming API, which enabled me to construct an archive of #mHealth, #mobilehealth and #healthapps. In seven days, we gleaned 7229 #mhealth tweets, 453 #mobilehealth tweets and 277 #healthapps tweets (7959 tweets in total). This is now scraping the Twitter API automatically, and we continue to collect the data (which is great, because I just found out Gephi has a timeline function, so we can track this conversation and then animate it).

So this is what the larger dataset looks like:


I then started drilling into the statistical make-up of this conversation. It emerged that there are 1463 communities conversing around these topics. The next graphic is really useful because we can see who the lead users are and the networks they influence (this is the cool bit).


What we can see here is that the lead influencers are @PhilippeLoizon, @Paul_Sonnier, @EricTopol and @Saif_Abed, by quite a significant margin. If we drill a little further, we can see the top twenty influencers across the 1463 communities are: @StefanieMastny, @RarusRarus, @JessWa21, @mobilehealth, @HealthcarePays, @NewsForToday1, @sound_wordz, @Ustabilize, @sandraproulx, @ideagreenhousnh, @pttalk, @Techlog, @bkalis, @Brian_Eastwood, @laurenstill, @danmunro, @RSpolter, @Kenratt, @Perficient_HC, @HealthStandards.
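Gephi computes this kind of influence ranking through its statistics panel; as an illustrative sketch of the same idea outside Gephi, assuming a simple list of (source, target) mention edges like the one we export from our collection:

```python
from collections import Counter

def top_influencers(edges, n=20):
    """Rank users by in-degree, i.e. how often they appear as the
    target of a mention edge, over (source, target) pairs."""
    indegree = Counter(target for _source, target in edges)
    return [user for user, _count in indegree.most_common(n)]
```

Calling `top_influencers(edge_list)` on the full archive returns the twenty most-mentioned handles, which is essentially the list above.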

So the next step is to find out who these people are within the health apps ecology, as they are highly influential – well, in the Twittersphere at least.