Data Analysis

The Bodleian Data Library recommends searching SOLO for books on data analysis to find out more about general options and approaches.
Relevant journals can also be useful for finding further discussion of analysis methods and tools, as well as examples of research conducted with them.

Concepts that are growing in popularity, such as big data and data mining, are well supported by video tutorials on YouTube or lynda.com.

Many of the tools described on this page have similar capabilities, so the choice often depends on how commonly they are used within a discipline, how well they are documented and how readily available they are in practice.

Statistical analysis

Excel

Microsoft Excel is one of the most commonly used spreadsheet packages. It organises data in a grid of cells arranged in rows and columns, and can be used to carry out arithmetic calculations and display data as graphs, histograms and charts.

Access

Training

Nesstar

Nesstar is a software system for online data analysis developed within the national data archiving community and in particular by the Norwegian Social Science Data Service (NSD). As a result the software also features tools to allow sharing and dissemination of data. Nesstar handles survey data and multidimensional tables as well as text resources. Users can search, browse and analyse the data online. It is supported by other archives such as the UK Data Service.

Access

  • Trial licences for Nesstar Server & Webview, plus freeware version of Nesstar Publisher available from Nesstar

R

A freely available, open source language and environment for statistical computing and graphics. It provides a wide variety of statistical and graphical techniques: linear and nonlinear modelling, statistical tests, time series analysis, classification and clustering. It is a major rival of SPSS and Stata.

Access

Training

SPSS

Statistical Package for the Social Sciences (SPSS) is widely used in business and marketing as well as academia. It is a starting point for most people interested in analysing statistical data, and it also produces tabulations and graphics. SPSS can also serve as a tool for data organisation and limited research documentation. It is easy to learn and is often recommended for general use.

Access

Training

Stata

A software package that is an alternative to SPSS and performs similar functions of analysis, modelling and tabulation. It takes longer to learn than SPSS but is more powerful and flexible, and its techniques are updated regularly. Alongside the standard version (Stata/IC), versions are available for educational use, large volumes of data and multiprocessor computers.

Access

  • Stata is not available for instant access on either Library or MRB-networked PCs. However, it is available to eligible students and staff in departments and centres within the Manor Road Building (MRB). Each department/centre decides who is eligible and nominates them to the MRB IT Team, who send a username, password and instructions for downloading the software onto their own device. See here for further information about access via MRB.
    Students can also purchase Stata at a reduced cost for their own devices from the supplier Timberlake.

Training

Textual and audio-visual analysis

Textual or mixed-method analytical programs are often grouped together under the title of ‘computer assisted qualitative data analysis software’ (CAQDAS). Further information about CAQDAS can be found on the University of Surrey's CAQDAS page and the University of Huddersfield's Online QDA website.

Atlas.ti

Atlas.ti offers tools to code and analyse a wide range of text and audio-visual data, and it is also useful for geospatial data.

Access

Training

Digital Scholar Lab

A cloud-based research environment that allows students and researchers to apply natural language processing tools to raw text data (OCR) from Gale's Primary Sources in a single research platform.

Access

  • Available here to Bodleian Library readers
  • Registration is required before use.

Training

  • Watch the Webinar to find out how to create a content set, clean text, and run an analysis in the Lab.

MAXQDA

This is an alternative to NVivo and handles a similar range of data types, allowing organisation, colour coding and retrieval of data. It can deal equally well with text, audio or video. A range of data visualisation tools is also included.

Access

  • Trial licences available from MAXQDA

NVivo

NVivo is a software package created to deal with qualitative or mixed methods data such as interviews and focus groups. This may either be in the form of text transcriptions or certain types of audio and video recordings. It allows extensive annotation and segmentation of data as part of organisation, categorisation and analysis.

Access

Training

Geospatial tools

ArcGIS

A geographic information system, ArcGIS 10.2 can be used by anyone working with geospatial data or in fact any statistical information that includes geographical variables such as location, elevation, population density and so on. If the information being used features a geographical representation of the world as part of the mix then ArcGIS should be of interest. It can be used to:

  • View maps/mapped information as part of analysis;
  • Compile geographic data;
  • Build and edit maps to help analysis or visualisation;
  • Amend properties and fields in geospatial databases and generally manage such information;
  • Develop projects that draw on the large user base and functionality this package has built up.

It can be used with any geospatial data, such as the Landscan population database. (Please note: an additional username and password are required for the Landscan database; see Weblearn for details.)

Access

Training

Atlas.ti

Atlas.ti can be used to work with Google Earth files: create documents from KML (Keyhole Markup Language) or KMZ files (zipped KML files), which will start Google Earth and fly you to a specified location. Google Earth functionality is thus available from within Atlas.ti.
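To make the relationship between KML and KMZ concrete, the following is a minimal sketch in Python (standard library only) that writes a one-placemark KML document and zips it into a KMZ archive. The function names, the placemark name and the coordinates are hypothetical illustrations and are not taken from Atlas.ti or Google Earth; naming the archive entry doc.kml follows the usual KMZ convention.

```python
import zipfile
from xml.etree import ElementTree as ET

# Namespace of KML 2.2 documents.
KML_NS = "http://www.opengis.net/kml/2.2"


def build_kml(name, lon, lat):
    """Return a KML document (as text) containing a single Placemark."""
    # Serialise with KML as the default namespace (no prefixes on tags).
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    # KML writes coordinates as longitude,latitude,altitude.
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat},0"
    body = ET.tostring(kml, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def write_kmz(path, kml_text):
    """Write a KMZ file: a zip archive holding the KML as doc.kml."""
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("doc.kml", kml_text)


if __name__ == "__main__":
    # Hypothetical placemark; the coordinates are illustrative only.
    print(build_kml("Example location", -1.25, 51.75))
```

Calling write_kmz("example.kmz", build_kml("Example location", -1.25, 51.75)) would produce a file that tools accepting KMZ should be able to open.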

MapInfo

A geographic information system (GIS) popular among entry-level users due to its low cost and ease of use. GIS is software that is designed to store, query, analyse, process, and visualise spatial data.

Training

Geospatial Analysis online

A free online resource based on the book Geospatial Analysis: a comprehensive guide (5th Edition, 2015 - de Smith, Goodchild, Longley). It introduces concepts, methods and tools, and provides many examples using a variety of software tools, such as ArcGIS, to clarify the concepts discussed. It aims to be comprehensive (but not necessarily exhaustive) in terms of concepts and techniques, representative and independent in terms of software tools, and above all practical in terms of application and implementation.

Online services

Bloomberg Professional

A subscription service that makes available financial information, news, reports, data and analysis. It contains near real-time and historical financial information on individual equities, stock market indices, fixed-income securities, commodities, currencies, and futures for both international and domestic markets. Data can be downloaded into Excel. The service features an integrated set of in-depth tutorials, which first-time users should work through.

Access

EIKON

A financial market intelligence database and a set of financial analysis tools that replaces Thomson Reuters’ previous products ‘Datastream’ and ‘Thomson One’. It provides market, index, company and economic information, together with historical financial data, drawing on trusted, accurate and up-to-the-minute content for more than 5 million securities worldwide. Coverage includes pricing data, research, fundamentals, financial estimates, news, and charts.

Access

  • Via a dedicated PC in the Social Science Library; open to current University staff and researchers (blue card holders) only.

GESIS: MISSY (Microdata Information System)

Part of the service infrastructure of the German Microdata Lab, MISSY is an online service platform that provides structured metadata for official statistics. It includes metadata at the study and variable level as well as reports and tools for data handling and analysis. All documentation in MISSY refers to EU and national (German microcensus) microdata available for scientific purposes.

For EU-LFS microdata users, MISSY offers SPSS and Stata routines that convert the EU-LFS 1999-2016 ad hoc CSV files into SPSS/Stata data files. The latest updates are available here.

Social Data Science Lab

An ESRC Data Investment and part of the Big Data Network for the social sciences, the Lab brings together crime, social, computer, and statistical scientists to study the empirical, methodological, theoretical and technical dimensions of New and Emerging Forms of Data in social, policy and business contexts. This empirical social data science programme is complemented by a focus on ethics and the development of new methodological tools and technical solutions for the UK academic, public and private sectors.

The Lab develops and supports the COSMOS Open Data Analytics software, which provides ethical access to social media data for social science researchers.

UKDS.Stat

A browser-based tool recently developed by the UK Data Service for exploring a number of its key macro data collections. It aims to integrate analysis and visualisation with the point of data access.

Access

Guides

Data analysis & visualisation tools

ArcGIS

A geographic information system (GIS) that helps to explore highly accurate geospatial data; you can create maps, analyze data for land use studies and other reports, and prepare data for use in an application or database.

Training

Blender

This free and open source 3D creation suite supports the entire 3D pipeline: modelling, rigging, animation, simulation, rendering, compositing and motion tracking. It is listed here for its use in the context of research data in particular.

Access

  • The suite is free to download from the website.

Training

ITLC offers either

Datawrapper

An online data-visualization tool for making interactive charts which are responsive and embeddable in a website.

DocNow

A tool and a community developed around supporting the ethical collection, use, and preservation of social media content.

QGIS

A cross-platform, free and open-source desktop geographic information system (GIS) application.

Training

  • An online course through Lynda.com.

R and Shiny

R is a tool used for data analysis and visualisation.
Using the free Shiny package, these analyses and visualisations can be published as interactive webpages just using R.

Training

Social Explorer

A suite of online tools and data that allows users to visually explore hundreds of thousands of data indicators across demography, economy, health, religion, crime and more. Users can visualise and interact with data, create reports and download data for offline processing.
Demographic Profiles is a new tool designed to give users an overview of the most popular demographic and socio-economic topics for a given geographical and/or administrative area within the United States. It helps users to explore census data and find the right facts, to analyse socio-economic data and discover trends, and to visualise the data and groups with charts by topic.

Tableau Public

An easy to use, free and powerful tool for creating interactive dashboards and data visualisations that can be shared publicly and embedded in your personal site.

Training

  • Check out a face-to-face course offered by the ITLC.

Visio 2016 (MS)

A diagramming tool that can be used to create diagrams, timelines, org charts, and more.

Training

  • Learn about Visio online on Lynda.com.

Open Source Data Tools

Blender

See information above.

Geospatial Analysis online

See information above.

KNIME

An open source platform for data analysis: a toolbox containing over two thousand modules, hundreds of ready-to-run examples, a comprehensive range of integrated tools, and a choice of advanced algorithms.

Access

Training

OpenRefine (formerly Google Refine)

A tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.
Google no longer actively supports this project; development, documentation and promotion are now carried out entirely by volunteers.

Access

Training

Orange

Open source machine learning and data visualization for novice and expert. Interactive data analysis workflows with a large toolbox.

Access

Training

R

See information above.

To find out more about freely available tools for data analysis, including open source data tools, data visualisation tools, sentiment tools, data extraction tools and databases, go to the Octoparse blog.
The blog also describes online courses on various MOOC platforms for beginners who want to learn data science fundamentals, key data science tools, and programming languages widely used in big data analytics.

Data mining

Rapid Miner

Ready-made, open source, 'no coding required' software for advanced analytics. It incorporates multifaceted data mining functions such as data pre-processing, visualisation and predictive analysis.

Access

Training

WEKA

A collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualisation.

Access

Training

Orange

A powerful, Python-based, open source tool for both novices and experts. It has components for text mining, visual programming and machine learning, and add-ons for bioinformatics and data analytics.

Access

Training

R

A free software environment for statistical computing and graphics, R is one of the leading tools used for data mining tasks. It is packaged with hundreds of libraries built specifically for data mining, and comes with community support.

Access

Training

KNIME

An open source data analytics, reporting and integration platform that covers all three main components of data pre-processing: extraction, transformation and loading. Its GUI allows nodes for data processing to be assembled, and it integrates various components for machine learning and data mining.
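For readers unfamiliar with the extraction, transformation and loading steps, the sketch below illustrates the general idea in plain Python using only the standard library. It is not KNIME's interface or workflow format; the sample data, column names and function names are hypothetical and chosen purely for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw survey extract with a missing value and inconsistent case.
RAW_CSV = """respondent,age,region
1,34,north
2,,south
3,51,North
"""


def extract(text):
    """Extraction: read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transformation: drop rows with no age and normalise region names."""
    cleaned = []
    for row in rows:
        if not row["age"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["respondent"]), int(row["age"]), row["region"].strip().lower())
        )
    return cleaned


def load(records):
    """Loading: store the cleaned records in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE survey (respondent INTEGER, age INTEGER, region TEXT)")
    conn.executemany("INSERT INTO survey VALUES (?, ?, ?)", records)
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = load(transform(extract(RAW_CSV)))
    # Count the cleaned records per region.
    print(conn.execute("SELECT region, COUNT(*) FROM survey GROUP BY region").fetchall())
```

In a visual tool, each of these three functions would roughly correspond to one or more nodes connected in sequence.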

Access

Training

Rattle

Rattle (short for ‘R Analytical Tool To Learn Easily’) has been developed using the R statistical programming language. The software runs on Linux, Mac OS and Windows, and offers statistics, clustering, modelling and visualisation backed by the computing power of R. Rattle is currently used in business and commercial enterprises, and for teaching in Australian and American universities.

Access

Training

Tanagra

Free, open source data mining software for academic and research purposes. It offers several data mining methods drawn from exploratory data analysis, statistical learning, machine learning and databases. It contains supervised learning methods as well as other paradigms such as clustering, factorial analysis, parametric and non-parametric statistics, association rules, and feature selection and construction algorithms. The main purpose of the Tanagra project is to give researchers and students easy-to-use data mining software that conforms to current norms of software development in this domain (especially in the design of its GUI and the way it is used) and allows the analysis of either real or synthetic data.

Access

Training

XLMiner

A comprehensive data mining add-in for Excel, with neural nets, classification and regression trees, logistic regression, linear regression, Bayes classifier, K-nearest neighbours, discriminant analysis, association rules, clustering, principal components, and more. XLMiner lets you sample data from many sources (PowerPivot, Microsoft/IBM/Oracle databases, or spreadsheets); explore and visualise your data with multiple linked charts; preprocess and ‘clean’ your data; fit data mining models; and evaluate your models’ predictive power. The drawback of XLMiner is that it is a paid add-in for Excel, although a 15-day free trial is available. Its integration with Excel makes it convenient to use.

Access

Training

Data reference management

You can use reference management software to organise data citations:

EndNote

In EndNote, use the "Dataset" reference type.

Mendeley

In Mendeley, use one of the more generic reference type templates and fill in the essential details for your dataset.

Zotero

In Zotero, enter the citation as a "Document" item. Depending on whether and how the data producer provides a recommended citation, do one of the following:
  • Export an RIS file and import this file into Zotero
  • Copy and paste the information from the recommended citation into a new Zotero item with the type "Document"
  • Otherwise, use the "Document" item type and add the components of the citation manually.
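As an illustration of what an RIS file for a dataset might contain, here is a minimal Python sketch that builds a single RIS record. The tags used (TY, AU, TI, PY, PB, DO, UR, ER) are standard RIS tags and DATA is the RIS type for a data file, but the function name and all field values are hypothetical placeholders. How a reference manager maps the DATA type to its own item types may vary by product and version, so check the imported record.

```python
def dataset_to_ris(title, authors, year, publisher, url, doi=None):
    """Build one RIS record describing a dataset and return it as text."""
    lines = ["TY  - DATA"]  # record type: data file
    lines += [f"AU  - {author}" for author in authors]  # one AU line per author
    lines.append(f"TI  - {title}")
    lines.append(f"PY  - {year}")
    lines.append(f"PB  - {publisher}")
    if doi:
        lines.append(f"DO  - {doi}")
    lines.append(f"UR  - {url}")
    lines.append("ER  - ")  # end-of-record marker
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values only; replace with the data producer's recommended citation.
    record = dataset_to_ris(
        title="Example Survey Dataset",
        authors=["Surname, A.", "Surname, B."],
        year=2020,
        publisher="Example Data Archive",
        url="https://example.org/dataset",
    )
    print(record)
```

Saving the returned text to a file with a .ris extension gives a file that can be imported into a reference manager.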