7 November 2013
Pre-conference Workshop on Ethics in Online Research, organised by Nele Heise.
Topics discussed were values, responsibility and quality in online research, as well as standards vs. experiences.
The workshop participants also considered the ethical review boards common in British and US universities and discussed whether and how these can be adapted to German-speaking countries. What would this imply for journal editors?
8 November 2013
Keynote Jürgen Pfeffer (Carnegie Mellon University) Big Data, Big Research?
Jürgen Pfeffer’s research focuses on the computational analysis of organisations and societies, with a special emphasis on large-scale systems, methodological and algorithmic challenges, network analysis and theories. He also considers the challenges in analysing large-scale systems.
The focus of his work is socio-cultural systems, in particular society (rather than the internet: “I don't care about the internet”). By analysing patterns of individual and group behaviours, it is possible to gain new perspectives on collective human behaviour (e.g. Lazer et al., 2009, Computational Social Science).
Big data research is very different to traditional social science research, and a range of issues, such as methodological ones, need to be considered. In his keynote, he presents some of the opportunities and constraints big data research offers for computer supported social science.
Social media offers us the opportunity to observe human behaviour and interaction in real time (e.g. Golder and Macy, 2012, Social Science with Social Media). An analysis of social media reactions can contribute substantially to understanding visitation patterns in online media.
Big Data Principles
- Collect all (available) data: no data are rejected; messy data and bad data are accepted too. In addition, there is no sampling (N=all). This leads to thousands of (“independent”) variables, and only later do researchers decide what is useful and what is not (which explains why secret services collect everything!).
- It is a data-driven research process: methods, data, analysis and results come first, and only then is the research problem described. This is not only the crux of big data research, but also changes the typical social science research process.
- It leads to an analysis of correlation rather than cause: social science asks “why?”, whilst big data analysis is about collecting all the variables and then deciding which predictors are “good enough” and what the “goodness of fit” is. The “why?” question is no longer the issue; rather, big data research is about trying to predict which factors / indicators need to be considered.
- Many variables, leading to statistical issues: something always correlates; you always get results and they will always be significant (whatever that means!).
- N=all (Is it? All of what? Is all what we want? Is all what we think it is?)
- Questions regarding multi-level bias: do people online represent society? Do people behave online in the same way as offline? Do the collected data represent human behaviour?
But when we consider and interpret the results obtained, it is important not to be overwhelmed by the data produced and to carefully consider whether we are analysing society or a software implementation.
(Jürgen Pfeffer is also co-author of the book Studying Social Networks: A Guide to Empirical Research, 2012)
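The statistical point in the list above ("something always correlates") can be illustrated with a small simulation. This is only a sketch, not something shown in the keynote: the function names, the sample sizes and the use of pure random noise are assumptions made for illustration. It correlates many unrelated random "predictors" with a random "outcome" and counts how many pass a conventional 5% significance threshold.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def count_spurious(n_obs=100, n_vars=1000, seed=0):
    """Count noise variables that 'significantly' correlate with a noise outcome."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_obs)]
    # Large-sample approximation of the two-sided 5% critical value for r.
    threshold = 1.96 / math.sqrt(n_obs)
    hits = 0
    for _ in range(n_vars):
        # Each predictor is pure noise, unrelated to the outcome by construction.
        predictor = [rng.gauss(0, 1) for _ in range(n_obs)]
        if abs(pearson(predictor, outcome)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    hits = count_spurious()
    print(f"{hits} of 1000 unrelated variables look 'significant'")
```

With a 5% threshold, roughly one in twenty unrelated variables is expected to pass by chance alone, so collecting thousands of variables yields "findings" even when nothing is there.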
Panel: Best Papers
Jungnickel and Maireder consider a multiple-method design for researching content and links on Facebook. The other two best papers presented in this panel also focus on problems associated with online research. Examples presented in the first paper are filter bubbles, social relevance (in search engines), results based on who is searching and the long-tail effect (Emmer and Strippel). Schumann and Hildebrandt consider the importance of social shares and their influence on the ranking of websites (in Google).
Panel Speech Analysis (Sprachanalysen)
This panel focuses on the methodological challenges in accessing and analysing data.
Einspanner, Thimm, Anastasiadis and Burger present an analysis of political topics discussed online (in particular on Twitter) during the Landtag elections in Germany 2010 – 2013. Using an interdisciplinary approach (computer science and communication studies), Eble and Stein look at the options and methodological challenges in the development of tools for the analysis of (recorded) speech and online communication. Access, decentrality, volatility and volume of the data are methodological challenges that need to be addressed. Krenn and Wetschanow also adopt an interdisciplinary approach (linguistics and computer linguistics) in the development of tools to be used for automatic content analysis and gender identification of text.
Panel Theoretical Perspectives (Theoretische Perspektiven)
Mahrt presents Big Data in the context of, and for the development of, theory (given the current assumption that “the data deluge makes the scientific method obsolete”). Some problems discussed are the validity of the measures and sampling (studies often focus on one type of social media only). But one of the main problems with Big Data is that we consider the data first, then the theory (see also Mahrt and Scharkow (2013) The Value of Big Data in Digital Media Research, Journal of Broadcasting and Electronic Media 57(1), pp. 20-33).
Heise considers the ethical issues in big data research. This implies that we first need to consider what research ethics are (e.g. Fenner 2010), and then combine these with the technical/methodological challenges of conducting research. There are three ethical dimensions that can be considered: the researcher, the user and combinations of these. Big data confronts us with ethical issues, as big data represents “massive quantities of information produced by and about people (….)” (boyd and Crawford, 2011).
9 November 2013
Panel Interpersonal Communication (Interpersonale Kommunikation)
The first presentation (Becker, Benger, Fanzke, Jöckel & Merkel) looks at Dunbar’s number in the context of Facebook. The authors look at young people’s use of Facebook: given an increasing number of friends, are these friends real friends and how are they “managed”? Their study is based on Dunbar’s assumption about the number of friends (which is limited by the size of the neocortex) and on digital friendship grooming. Results show that family, school and hobbies play an important role in the type of friends and the importance attributed to these online friendships.
Sattelberger & Seufert look at whether social network analysis can help predict the number of visitors (in this case, cinema audiences during the period July – December 2012). Methods used were cluster analysis, discriminant analysis and latent growth analysis. The study also looked at social media as a possible impact factor on the popularity of the film.
Pentzold considers ethics in online research, in particular Wikipedia users’ reactions to being “studied”. What expectations are held by researchers and research institutions, and what are the implications for conducting online research?
Haake looks at the use of framing analysis when researching the social web. What are the indicators for frames, and how can these be operationalised? Qualitative analysis of media content can provide important indicators for frames, but in future, framing analysis tools need to be adapted for use in the social web, e.g. by using different indicators, re-defining problems, and by considering the ethical dimension.
Baden & Springer looked at journalism 2.0 using framing analysis. Frames represent different opinions, and they analysed whether 2-3 opinions (frames) dominate a certain issue in the news. Users (online news readers) either report repertoires (i.e. stay within the same topic) or elaborate and interpret repertoires. Only seldom are new repertoires developed by the users.
Panel Use and Measurement of Use (Nutzung & Nutzungsmessung)
Jürgens, Stark & Magin look at algorithms and personalisation effects in the context of the “filter bubble”. Communication and social exchange are increasingly occurring online (“platformed sociality”, van Dijck, 2013 and “appliancization”, Zittrain, 2008) and there are new intermediaries (Google, Facebook, etc.). This leads to new types of influences on groups of users (gatekeeping) but also on individual users (differentiation). But individual differentiation means that the intermediaries are no longer “neutral”, which raises issues about the legal regulation of the intermediaries (van Dijk 2012).
Niemann presents the advantages and disadvantages of using experience sampling as a method to research the privacy paradox (users are worried about their privacy yet at the same time share their personal information online) in adolescents and young adults.
Strippel & Emmer discuss the suitability of logfile analysis as a tool to measure online use. Logfiles are originally a purely technical measurement (text data) that represent an automatic protocol. Logfile analysis is a “classic” tool in online research, but in comparison to other research methods (content analysis, questionnaires, experiments, group discussions) it isn’t used very much. Its advantages are that it is automatic, cheap and non-reactive; its disadvantages are that it is limited, anonymous and imprecise. For measuring online use, client logfiles and proxy logfiles are well suited, but server logfiles are not. The authors suggest combining proxy logfiles with other methods (e.g. questionnaires).
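To make concrete what "purely technical text data" means here, the following minimal sketch counts requests per client and per URL from a few log lines. It is not the authors' procedure: the log layout (the widely used Common Log Format), the sample lines and the function name count_requests are assumptions for illustration, and real proxy logfiles may look different.

```python
import re
from collections import Counter

# Assumed layout: Common Log Format, e.g.
# host ident user [timestamp] "METHOD url PROTOCOL" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Invented sample lines; the last one is deliberately malformed.
SAMPLE_LOG = """\
10.0.0.1 - - [08/Nov/2013:10:00:01 +0100] "GET http://example.org/news HTTP/1.1" 200 5120
10.0.0.2 - - [08/Nov/2013:10:00:05 +0100] "GET http://example.org/sport HTTP/1.1" 200 2048
10.0.0.1 - - [08/Nov/2013:10:01:12 +0100] "GET http://example.org/news HTTP/1.1" 304 -
this line is malformed
"""


def count_requests(log_text):
    """Return (requests per client, requests per URL, number of unparsable lines)."""
    per_client = Counter()
    per_url = Counter()
    skipped = 0
    for line in log_text.splitlines():
        match = LOG_PATTERN.match(line)
        if match is None:
            # Keep track of lines that do not fit the assumed format.
            skipped += 1
            continue
        per_client[match.group("host")] += 1
        per_url[match.group("url")] += 1
    return per_client, per_url, skipped


if __name__ == "__main__":
    clients, urls, skipped = count_requests(SAMPLE_LOG)
    print(clients.most_common())
    print(urls.most_common())
    print("skipped lines:", skipped)
```

The sketch also shows the limits named above: a log records an address and a request, not who was sitting at the device or why, which is one reason for combining logfiles with questionnaires.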