No matter what you do, you can’t avoid excel. So, may as well dive into it & tame the beast. Here are 5 excel Add Ins that every data scientist should install.
-
Microsoft Power Query for Excel, is a new add-in that provides a seamless experience for data discovery, data transformation and enrichment for Information Workers, BI professionals and other Excel users.
-
With Power Query you can:
-
Identify the data you care about from the sources you work with (e.g. relational databases, Excel, text and XML files, OData feeds, web pages, Hadoop HDFS, etc.).
-
Discover relevant data from inside(*) and outside your organization using the search capabilities within Excel.
-
Combine data from multiple, disparate data sources and shape it in order to prepare the data for further analysis in tools like Excel and Power Pivot, or visualization in tools like Power View and Power Map.
-
Share the queries that you created with others within your organization, so they can be easily found via Search. (*)
-
Top features of Power Pivot for Excel
-
Import millions of rows of data from multiple data sources With Power Pivot for Excel, you can import millions of rows of data from multiple data sources into a single Excel workbook, create relationships between heterogeneous data, create calculated columns and measures using formulas, build PivotTables and PivotCharts, and then further analyze the data so that you can make timely business decisions all without requiring IT assistance.
-
Enjoy fast calculations and analysis Process millions of rows in about the same time as thousands, and make the most of multi-core processors and gigabytes of memory for fastest processing of calculations. Overcomes existing limitations for massive data analysis on the desktop with efficient compression algorithms to load even the biggest data sets into memory.
-
Virtually Unlimited Support of Data Sources Provides the foundation to import and combine source data from any location for massive data analysis on the desktop, including relational databases, multidimensional sources, cloud services, data feeds, Excel files, text files, and data from the Web.
-
Security and Management Power Pivot Management Dashboard enables IT administrators to monitor and manage your shared applications to ensure security, high availability, and performance.
-
Data Analysis Expressions (DAX) DAX is a new formula language that extends the data manipulation capabilities of Excel to enable more sophisticated and complex grouping, calculation, and analysis. The syntax of DAX formulas is very similar to that of Excel formulas.
Solver is a what-if analysis tool for optimization. It is an add-in licensed from Frontline Systems that has shipped with Excel for many years. Solver helps to find an optimal value in one cell, called the target cell, on your worksheet. It has three solving methods for solving spreadsheet optimization problems.
-
Simplex Method The Simplex method is used for solving linear problems. The Simplex solving method has several performance enhancements in Excel 2010 resulting in greatly improved performance for some problem types.
-
GRG Nonlinear The GRG solver is used for solving smooth nonlinear problems. There is a new Multi-start search setting which when used in conjunction with the GRG solver results in better solutions, escaping locally optimal solutions in favor of globally optimal ones.
-
Evolutionary Solver The new Evolution solver accepts Solver models defined in exactly the same way as the Simplex and GRG Solvers, but uses genetic algorithms to find its solutions. While the Simplex and GRG solvers are used for linear and smooth nonlinear problems, the Evolutionary Solver can be used for any Excel formulas or functions, even when they are not linear or smooth nonlinear. Spreadsheet functions such as IF and VLOOKUP fall into this category.
There is a new type of integer constraint known as AllDifferent that form a permutation of integers, making it easy to define models with ordering or sequencing. The well-known travelling salesman problem is hard to define in the current solver, but it can be defined with just an objective and one AllDifferent constraint. Microsoft has added a number of new report types that provide additional detail about the problem being solved.
-
Linearity Report If you try to solve a model that is not linear, Solver will indicate that linearity conditions were not satisfied.
-
Feasibility Report If you try to solve a model that has no feasible solution, Solver will display the message “Solver could not find a feasible solution” in the Solver Results dialog.
-
Population Report Where the Simplex and GRG Solvers find a single solution, the Evolutionary Solver creates a population of candidate solutions. When this Solver Engine stops, the best solution from the population is displayed on the worksheet, but a new report is available in the Solver Results dialog.
If you need to develop complex statistical or engineering analyses, you can save steps and time by using the Analysis ToolPak. You provide the data and parameters for each analysis, and the tool uses the appropriate statistical or engineering macro functions to calculate and display the results in an output table. Some tools generate charts in addition to output tables.
The Analysis ToolPak includes the various tools some of them are described below.
-
Anova The Anova analysis tools provide different types of variance analysis. The tool that you should use depends on the number of factors and the number of samples that you have from the populations that you want to test.
-
Descriptive Statistics The Descriptive Statistics analysis tool generates a report of univariate statistics for data in the input range, providing information about the central tendency and variability of your data.
-
Fourier Analysis The Fourier Analysis tool solves problems in linear systems and analyzes periodic data by using the Fast Fourier Transform (FFT) method to transform data. This tool also supports inverse transformations, in which the inverse of transformed data returns the original data.
The Fuzzy Lookup Add-In for Excel was developed by Microsoft Research and performs fuzzy matching of textual data in Microsoft Excel. It can be used to identify fuzzy duplicate rows within a single table or to fuzzy join similar rows between two different tables. The matching is robust to a wide variety of errors including spelling mistakes, abbreviations, synonyms and added/missing data. For instance, it might detect that the rows “Mr. Andrew Hill”, “Hill, Andrew R.” and “Andy Hill” all refer to the same underlying entity, returning a similarity score along with each match. While the default configuration works well for a wide variety of textual data, such as product names or customer addresses, the matching may also be customized for specific domains or languages.
This list was was made researched by Megter