Text and data mining software

Rapid Miner

Rapid Miner is ready-made, open source, 'no-coding required' software that provides advanced analytics. It incorporates a range of data mining functions, such as data pre-processing, visualisation and predictive analysis.

Access

Free to download (you will need to register for a new account).

Training

Online video tutorials are available on the Rapid Miner website.

Weka

Weka is a free-to-download collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualisation.

Access

Weka is free to download.

Training

View free online courses on data mining with machine learning techniques.

Orange

Orange is a powerful, open source, Python-based tool for both novices and experts. It offers visual programming and components for machine learning, text mining and data analytics, plus add-ons for bioinformatics.

Access

Orange is free to download.

Training

View the YouTube training video.

R

R, a free software environment for statistical computing and graphics, is one of the leading tools for data mining tasks. It is packaged with hundreds of libraries built specifically for data mining, and comes with community support.

Access

Free to download.

Training

Training resources available:

KNIME

KNIME is an open-source data analytics, reporting and integration platform. It covers all three main components of data pre-processing: extraction, transformation and loading. Its GUI lets you assemble nodes for data processing, and it integrates various components for machine learning and data mining.

Access

Free to download (register for help and updates).

Training

Online training resources are available on the KNIME website.

Rattle

Rattle, ‘R Analytical Tool To Learn Easily’, has been developed using the R statistical programming language. The software runs on Linux, Mac OS and Windows, and offers statistics, clustering, modelling and visualisation backed by the computing power of R. Rattle is currently used in business and commercial enterprises, and for teaching in Australian and American universities.

Access

Free to download.

Training


Tanagra

Tanagra is free, open-source data mining software for academic and research purposes. It offers data mining methods drawn from exploratory data analysis, statistical learning, machine learning and databases. Alongside supervised learning, it covers other paradigms such as clustering, factorial analysis, parametric and non-parametric statistics, association rules, and feature selection and construction algorithms.

The main purpose of the Tanagra project is to give researchers and students easy-to-use data mining software that follows current norms for software development in this domain (especially in the design of its GUI and the way it is used) and that allows the analysis of either real or synthetic data.

Access

Free to download.

Training

Guidance is available on the tutorial blog.

XLMiner

XLMiner is a comprehensive data mining add-in for Excel, with neural nets, classification and regression trees, logistic regression, linear regression, Bayes classifier, k-nearest neighbours, discriminant analysis, association rules, clustering, principal components, and more. XLMiner provides everything you need to sample data from many sources: PowerPivot, Microsoft/IBM/Oracle databases, or spreadsheets. You can explore and visualise your data with multiple linked charts, pre-process and ‘clean’ your data, fit data mining models, and evaluate your models’ predictive power.

The drawback of XLMiner is that it is a paid add-in for Excel, although a 15-day free trial is available. Its main strengths are its range of features and its integration with Excel, which lets you work within a familiar spreadsheet.

Access

Download a 15-day free trial.

Training