This article was written by Melissa Bierly.
If you use Excel at work, you’re probably frustratingly familiar with this dialog box:
You’ve probably also worked on a spreadsheet that was painfully slow to edit, where every click was accompanied by 10 seconds of your screen freezing and the spinning wheel of death.
Large files or workbooks full of formulas are usually to blame. Excel can hold just over a million rows per worksheet (1,048,576), but when you’re working with a large dataset or doing heavy-duty analysis (applying formulas to a bunch of cells, linking multiple spreadsheets, or connecting to other workbooks), it slows down well before you hit the row limit.
Excel has another weakness that can lead to inefficiency: its structure is too flexible. That might sound kind of crazy—flexibility is one of the reasons people love Excel. Since each cell is its own entity, you have a lot of freedom to add footnotes, merge cells together, or plot out a needlepoint pattern.
However, if one cell can be manipulated easily, it’s harder to trust the integrity of the spreadsheet as a whole. Excel’s flexibility makes enforcing consistency and accuracy in large datasets nearly impossible. No matter how vigilant you are, no matter how many times you comb through a spreadsheet for typos and broken formulas, you might still miss something.
There’s often a better tool for the job.
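As a rough illustration of the consistency point, here is a minimal sketch of how a relational database can refuse bad values at the moment they are entered, instead of relying on someone spotting them later. It uses Python's built-in sqlite3 module; the table and column names (orders, customer, quantity) are hypothetical and not taken from the original article.

```python
import sqlite3

# In-memory database; the table and column names are hypothetical,
# chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer TEXT NOT NULL,
        quantity INTEGER NOT NULL CHECK (quantity > 0)
    )
    """
)


def try_insert(customer, quantity):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO orders (customer, quantity) VALUES (?, ?)",
                (customer, quantity),
            )
        return True
    except sqlite3.IntegrityError:
        # Raised when a NOT NULL or CHECK constraint is violated.
        return False


accepted_valid = try_insert("Acme", 3)      # satisfies every rule
accepted_negative = try_insert("Acme", -5)  # violates CHECK (quantity > 0)
accepted_missing = try_insert(None, 2)      # violates NOT NULL on customer

# Only rows that passed every constraint are stored.
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

In a spreadsheet, the negative quantity and the missing customer name would simply sit in their cells until someone noticed them. Here the rules live with the table itself, so every row is checked the same way.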
This long article has the following sections:
- Oh, hey, SQL
- Spreadsheet, meet relational database
- The leap from formulas to queries
- Free workbook: The Excel User’s Quick Start Guide to SQL
- Next steps for learning SQL
- Pick tutorials geared toward data analysis
To read the full original article with code samples and illustrations, click here. For more SQL-related articles on DSC, click here.