Guest blog post by Data Science Girl
Fantastic resource created by Andrea Motosi. I’ve only included the 5 categories that are the most relevant to our audience, though it has 31 categories total, including a few on distributed systems and Hadoop. Click here to view all 31 categories. You might also want to check out our internal resources (the first section below).
Source: Machine Learning and Face Recognition Papers
Data Science Central – Resources
- Data sets
- General resources
- Research Articles
- Data Science Cheat Sheet
- Data Science Projects
- Data Science Dictionary
- Our Data Science Book
Machine Learning
- Apache Mahout: machine learning library for Hadoop
- Ayasdi Core: tool for topological data analysis
- brain: Neural networks in JavaScript
- Cloudera Oryx: real-time large-scale machine learning
- Concurrent Pattern: machine learning library for Cascading
- convnetjs: Deep Learning in JavaScript. Train Convolutional Neural Networks (or ordinary ones) in your browser
- Decider: Flexible and Extensible Machine Learning in Ruby
- etcML: text classification with machine learning
- Etsy Conjecture: scalable Machine Learning in Scalding
- Google Sibyl: System for Large Scale Machine Learning at Google
- H2O: statistical, machine learning and math runtime for Hadoop
- IBM Watson: cognitive computing system
- MLbase: distributed machine learning libraries for the BDAS stack
- MLPNeuralNet: Fast multilayer perceptron neural network library for iOS and Mac OS X
- nupic: Numenta Platform for Intelligent Computing: a brain-inspired machine intelligence platform, and biologically accurate neural network based on cortical learning algorithms
- PredictionIO: machine learning server built on Hadoop, Mahout and Cascading
- scikit-learn: machine learning in Python
- Spark MLlib: a Spark implementation of some common machine learning (ML) functionality
- Sparkling Water: combine H2O's Machine Learning capabilities with the power of the Spark platform
- Vahara: Machine learning and natural language processing with Apache Pig
- Viv: global platform that enables developers to plug into and create an intelligent, conversational interface to anything
- Vowpal Wabbit: learning system sponsored by Microsoft and Yahoo!
- WEKA: suite of machine learning software
- Wit: Natural Language for the Internet of Things
- Wolfram Alpha: computational knowledge engine
Visualization
- Arbor: graph visualization library using web workers and jQuery
- CartoDB: open-source or freemium hosting for geospatial databases with powerful front-end editing capabilities and a robust API
- Chart.js: open source HTML5 Charts visualizations
- Crossfilter: JavaScript library for exploring large multivariate datasets in the browser. Works well with dc.js and d3.js
- Cubism: JavaScript library for time series visualization
- Cytoscape: JavaScript library for visualizing complex networks
- D3: JavaScript library for manipulating documents
- DC.js: Dimensional charting built to work natively with crossfilter rendered using d3.js. Excellent for connecting charts/additional metadata to hover events in D3
- Envisionjs: dynamic HTML5 visualization
- Freeboard: open source real-time dashboard builder for IoT and other web mashups
- Gephi: An award-winning open-source platform for visualizing and manipulating large graphs and network connections
- Google Charts: simple charting API
- Grafana: graphite dashboard frontend, editor and graph composer
- Graphite: scalable Realtime Graphing
- Highcharts: simple and flexible charting API
- IPython: provides a rich architecture for interactive computing
- Keylines: toolkit for visualizing the networks in your data
- Matplotlib: plotting with Python
- NVD3: chart components for d3.js
- Peity: Progressive SVG bar, line and pie charts
- Plot.ly: Easy-to-use web service that allows for rapid creation of complex charts, from heatmaps to histograms. Upload data to create and style charts with Plotly’s online spreadsheet. Fork others’ plots.
- Recline: simple but powerful library for building data applications in pure JavaScript and HTML
- Redash: open-source platform to query and visualize data
- Sigma.js: JavaScript library dedicated to graph drawing
- Vega: a visualization grammar
Graph Databases
- Apache Giraph: implementation of Pregel, based on Hadoop
- Apache Spark Bagel: implementation of Pregel, part of Spark
- ArangoDB: multi-model distributed database
- Facebook TAO: the distributed data store that is widely used at Facebook to store and serve the social graph
- Faunus: Hadoop-based graph analytics engine for analyzing graphs represented across a multi-machine compute cluster
- Google Cayley: open-source graph database
- Google Pregel: graph processing framework
- GraphLab PowerGraph: a core C++ GraphLab API and a collection of high-performance machine learning and data mining toolkits built on top of the GraphLab API
- GraphX: resilient Distributed Graph System on Spark
- Gremlin: graph traversal Language
- InfiniteGraph: distributed graph database
- Infovore: RDF-centric Map/Reduce framework
- Intel GraphBuilder: tools to construct large-scale graphs on top of Hadoop
- MapGraph: Massively Parallel Graph processing on GPUs
- Neo4j: graph database written entirely in Java
- OrientDB: document and graph database
- Phoebus: framework for large scale graph processing
- Sparksee: scalable high-performance graph database
- Titan: distributed graph database, built over Cassandra
- Twitter FlockDB: distributed graph database
NewSQL
- Actian Ingres: commercially supported, open-source SQL relational database management system
- BayesDB: statistics-oriented SQL database
- Cockroach: Scalable, Geo-Replicated, Transactional Datastore
- Datomic: distributed database designed to enable scalable, flexible and intelligent applications
- FoundationDB: distributed database, inspired by F1
- Google F1: distributed SQL database built on Spanner
- Google Spanner: globally distributed semi-relational database
- H-Store: experimental main-memory, parallel database management system that is optimized for on-line transaction processing (OLTP) applications
- HandlerSocket: NoSQL plugin for MySQL/MariaDB
- IBM DB2: object-relational database management system
- InfiniSQL: infinitely scalable RDBMS
- MemSQL: in-memory SQL database with optimized columnar storage on flash
- NuoDB: SQL/ACID compliant distributed database
- Oracle Database: object-relational database management system
- Oracle TimesTen in-Memory Database: in-memory, relational database management system with persistence and recoverability
- Pivotal GemFire XD: Low-latency, in-memory, distributed SQL data store. Provides SQL interface to in-memory table data, persistable in HDFS
- SAP HANA: in-memory, column-oriented, relational database management system
- SenseiDB: distributed, realtime, semi-structured database
- Sky: database used for flexible, high performance analysis of behavioral data
- SymmetricDS: open source software for both file and database synchronization
- Teradata Database: complete relational database management system
- VoltDB: in-memory NewSQL database
Other
- Big Data Ecosystem Table
- Hadoop Ecosystem Table by Javi Roman
- Awesome Big Data by Onur Akpolat
- Awesome Hadoop by Youngwoo Kim
Related articles (Internal to DataScienceCentral)
- Data Science Cheat Sheet
- Data science apprenticeship
- Data science certification
- Previous digests
- Data science resources
- Competitions and Challenges
- Salary surveys
- Training
- Data science books
- How to detect spurious correlations, and how to find the real ones
- Data science job ads that do not attract candidates, versus those that do
- Data Science and Analytics Jobs
- Hadoop resources
- 17 short tutorials all data scientists should read (and practice)
- 10 types of data scientists
- 66 job interview questions for data scientists
- Our Wiley Book on Data Science
- Data Science Top Articles
- Our Data Science Weekly Newsletter
- Practical illustration of Map-Reduce (Hadoop-style), on real data
- What makes up data science?
- DSC webinar series