Robin Linacre in Towards Data Science · Why Probabilistic Linkage is More Accurate than Fuzzy Matching or Term Frequency based approaches · How effectively do different approaches to record linkage use information in the records to make predictions? · 4 min read · Oct 26, 2023
Robin Linacre · Why parquet files are my preferred API for bulk open data · They provide a cheap, easy-to-use and performant API for accessing bulk data, and SQL can be used in-browser as a universal API · 7 min read · Jan 10, 2023
Robin Linacre in Towards Data Science · The Intuition Behind the Use of Expectation Maximisation to Train Record Linkage Models · How unsupervised learning is used to estimate model parameters in Splink · 5 min read · Oct 14, 2022
Robin Linacre · Splink 3: Fast, accurate and scalable linkage and deduplication in Python with support for… · Splink 3 now offers support for Python and AWS Athena backends, in addition to Spark. It’s now easier to use, faster and more flexible… · 3 min read · Aug 6, 2022
Robin Linacre · The Downfall of Command and Control Data Leadership · Seemingly every big organisation has a new data platform that’s always just around the corner. Why do they fail to live up to expectations? · 6 min read · Nov 8, 2020
Robin Linacre in Towards Data Science · Demystifying Apache Arrow · In my work as a data scientist, I’ve come across Apache Arrow in a range of seemingly-unrelated circumstances. However, I’ve always… · 7 min read · Oct 22, 2020
Robin Linacre in Towards Data Science · Fuzzy Matching and Deduplicating Hundreds of Millions of Records using Apache Spark · Introducing splink, a PySpark library for record linkage at scale using unsupervised learning · 4 min read · Apr 16, 2020
Robin Linacre · Effective testing of analytical models using automated sense checks · I once worked for a slightly terrifying senior analyst. · 5 min read · Aug 26, 2019
Robin Linacre · Questions Senior Leaders Should Ask Their Data Delivery Teams · How to improve the likelihood of success whilst reducing the governance burden on teams · 8 min read · Mar 14, 2019
Robin Linacre · First impressions of the ONS’s new beta data services · I was inspired by a recent tweet to experiment with some of the ONS’s new beta data functionality. There’s a lot to be excited about… · 6 min read · Feb 10, 2019