A quick demonstration of polling confidence interval calculations using simulation
At dinner with friends last Sunday, the topic of conversation fixated on — what else — the upcoming presidential election. That morning, a poll had…
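The excerpt does not show the post's actual method. To illustrate the general idea of checking a poll's confidence interval by simulation, here is a minimal sketch. It assumes simple random sampling, a hypothetical true support of p = 0.52 and a hypothetical sample size of n = 1000. It draws many simulated polls, takes the central 95% of the simulated proportions, and compares that range with the textbook normal-approximation margin 1.96·sqrt(p(1−p)/n).

```python
import math
import random


def simulate_poll_proportions(p, n, trials, seed=0):
    """Draw `trials` simulated polls of size n from a population with true support p."""
    rng = random.Random(seed)
    props = []
    for _ in range(trials):
        # Each respondent independently supports the candidate with probability p.
        yes = sum(1 for _ in range(n) if rng.random() < p)
        props.append(yes / n)
    return props


def percentile_interval(values, level=0.95):
    """Empirical central interval (e.g. 2.5th to 97.5th percentile) of the values."""
    s = sorted(values)
    lo_idx = round((1 - level) / 2 * len(s))
    hi_idx = round((1 + level) / 2 * len(s)) - 1
    return s[lo_idx], s[hi_idx]


def normal_margin(p, n, z=1.96):
    """Normal-approximation margin of error for a sample proportion."""
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical inputs, chosen only for illustration.
p, n = 0.52, 1000
props = simulate_poll_proportions(p, n, trials=2000)
lo, hi = percentile_interval(props)
margin = normal_margin(p, n)
print(f"simulated 95% interval: [{lo:.3f}, {hi:.3f}]")
print(f"normal approximation:   [{p - margin:.3f}, {p + margin:.3f}]")
```

With these inputs the formula gives a margin of roughly ±0.031 (about 3 points). The simulated interval should land close to that, which is the kind of agreement a simulation like this is meant to show.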
Johns Hopkins Covid-19 Data and R, Part II, data.table functions and graphics, plus R-Naught.
Summary: This blog is part II of a series showcasing management and analytics of the daily U.S. Covid-19 case/death data published by the Center for Systems…
Johns Hopkins Covid-19 Data and R, Part I — data.table handling.
Summary: This blog showcases the handling of daily data of cases/deaths from Covid-19 in the U.S. published by the Center for Systems Science and Engineering at Johns…
Multi-Dimensional Frequencies with R data.table.
A few years ago, in a Q&A session following a presentation I gave on data analysis (DA) to a group of college recruits for my…
Dataframe Storage Efficiency in Python-Pandas
Summary: It’s no secret that Python-Pandas is central to data management for analytics and data science today. Indeed, what we’re seeing now is Pandas being…
Multi Gigabyte R data.table for Ohio Voter Registration/History
Summary: This blog details R data.table programming to handle multi-gigabyte data. It shows how the data can be efficiently loaded, “normalized”, and counted. Readers can…
Using "record id's" to facilitate processing in Python-Pandas and R-data.table.
Both R and Python-Pandas are array-oriented platforms that support fast filtering through vectors of record-id’s. In Python-Pandas, such vectors are implemented via Pandas’s powerful index…
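The excerpt says Pandas implements vectors of record ids through its index. Here is a minimal sketch of that pattern, not the post's own code. The data frame, its columns, and the `record_id` index name are all hypothetical. The filter condition is computed once as an `Index` of ids, and that id vector is then reused for row selection and combined with other id vectors using set operations.

```python
import pandas as pd

# Hypothetical toy data; column names and ids are illustrative only.
voters = pd.DataFrame(
    {
        "county": ["Adams", "Allen", "Adams", "Butler", "Allen"],
        "age": [34, 58, 41, 29, 63],
    },
    index=pd.Index([101, 102, 103, 104, 105], name="record_id"),
)

# Evaluate a condition once and keep only the matching record ids.
over_40_ids = voters.index[voters["age"] > 40]
allen_ids = voters.index[voters["county"] == "Allen"]

# Reuse an id vector to pull rows by label.
subset = voters.loc[over_40_ids]

# Combine id vectors with Index set operations instead of re-filtering.
both_ids = over_40_ids.intersection(allen_ids)
print(voters.loc[both_ids])
```

Keeping the ids, rather than the filtered copies of the data, lets later steps mix and match conditions cheaply. R's data.table can follow the same idea with integer row-number or key vectors.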
Kicking Chicago with R.
Like most Chicago football fans, I was pretty distraught after the Bears lost last Sunday’s playoff game courtesy of a missed field goal at the…
AWK — a Blast from Wrangling Past.
I recently came across an interesting account by a practical data scientist on how to munge 25 TB of data. What caught my eye at first…