Analytics: a brief history
Data analytics is the process of analyzing raw data to help organizations gain a deeper understanding of information. The practice helps uncover patterns, trends, and relationships to inform business decisions.
While data gathering is the lifeblood of modern organizations, that data is only useful if it is expertly analyzed. Discovering patterns in data and drilling down into accessible visual information enables organizations to make strategic business decisions for sustainable growth. It also helps them capitalize on new opportunities and mitigate future risks by staying adaptable to emerging trends in their industry.
People have used data for calculations and record-keeping for thousands of years, but the evolution of modern data analytics stretches back only to the 19th century. In this article, we’ll look at the history of data analytics, how it has changed the way we work and live, and what we might see in the future.
The invention of the Tabulating Machine
German-American inventor Dr. Herman Hollerith created one of the first groundbreaking devices in data analytics and processing: the punch card tabulating machine. This electro-mechanical machine systematically processed data recorded on punch cards. The federal government first used it in 1888 to help compile statistics for the 1890 US census.
A complete tabulation of the data from the 1880 census had taken seven years, but Hollerith’s machine meant all this demographic, economic, and population data could be processed in an efficient, economical, and timely manner.
With this device, the 1890 census was completed in just 18 months. It was the first time the government could act on “current” statistics for effective decision-making, for example when planning and allocating funds for local services. Later models would assist with business applications such as inventory control.
Introducing relational databases
In 1970, IBM researcher Edgar Codd proposed the relational model, and relational databases soon came to the fore. Relational databases organize data into one or more tables, or "relations", that are linked by common fields or attributes. Each table comprises rows and columns, where each row represents a record, and each column represents a field or attribute of that record.
Relational databases use Structured Query Language (SQL) to manage and manipulate data. SQL is a standard programming language used to interact with databases and allows users to create, read, update, and delete data in tables.
To retrieve data from a relational database, users can use SQL queries to filter and sort data based on specific criteria. For example, a query could be used to retrieve all customer records where the customer's last name starts with "Smith."
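As a rough illustration of the ideas above, the sketch below uses Python's built-in sqlite3 module to create a small table, run the four basic operations, and retrieve records whose last name starts with "Smith". The `customers` table, its columns, and the sample rows are assumptions made up for this example.

```python
import sqlite3

# In-memory database; the table, columns, and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, zip_code TEXT)"
)

# Create: each row is one record, each column one attribute of that record.
conn.executemany(
    "INSERT INTO customers (first_name, last_name, zip_code) VALUES (?, ?, ?)",
    [
        ("Ada", "Smith", "10001"),
        ("Ben", "Smithson", "94105"),
        ("Cara", "Jones", "60601"),
    ],
)

# Update and delete round out the create/read/update/delete operations.
conn.execute("UPDATE customers SET zip_code = ? WHERE first_name = ?", ("10002", "Ada"))
conn.execute("DELETE FROM customers WHERE last_name = ?", ("Jones",))

# Read: filter on a last-name prefix and sort the matching records.
smiths = conn.execute(
    "SELECT first_name, last_name, zip_code FROM customers "
    "WHERE last_name LIKE ? ORDER BY last_name",
    ("Smith%",),
).fetchall()
print(smiths)
```

The `LIKE 'Smith%'` pattern matches any last name beginning with "Smith", so both "Smith" and "Smithson" are returned.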
Since the 1970s, relational databases have become the most widely used type of data management system. While they enable businesses to analyze data on demand and easily maintain accurate records, they come with limitations.
Their rigid structure means they are best suited to structured data (e.g., names and zip codes); they scale less easily to very large volumes and find it challenging to handle unstructured data.
Data warehouse technologies
While relational databases and SQL made it possible for organizations to quickly access and amend large pools of records without writing complex commands, data warehouses are optimized for quick response times to queries.
Data warehouses were introduced in the late 1980s following a surge in collected data, partly due to the lower costs of hard disk drives. In a data warehouse, data is often stored using a timestamp, allowing businesses to run queries based on these timestamps – for example, if they want to compare sales trends for each month.
Operations such as DELETE or UPDATE are used less frequently in a data warehouse. The concept, introduced by William H. Inmon, was developed to help transform data from operational systems into decision-support systems. Data warehouses were traditionally hosted on-premises (usually on a mainframe computer) but are now commonly hosted in the cloud or on a dedicated appliance.
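To make the timestamp-based querying described above more concrete, here is a minimal sketch that compares monthly sales totals. It uses SQLite through Python's sqlite3 module as a stand-in for a real data warehouse, and the `sales` table, its columns, and the figures are assumptions for illustration.

```python
import sqlite3

# SQLite stands in for a warehouse here; table and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sold_at TEXT, amount REAL)")

# Rows are appended with a timestamp and rarely updated or deleted afterwards.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("2023-01-05 09:30:00", 120.0),
        ("2023-01-20 14:10:00", 80.0),
        ("2023-02-03 11:00:00", 150.0),
        ("2023-02-17 16:45:00", 50.0),
        ("2023-02-28 08:15:00", 25.0),
    ],
)

# Group by the year-month part of the timestamp to compare sales per month.
monthly_totals = conn.execute(
    "SELECT strftime('%Y-%m', sold_at) AS month, SUM(amount) AS total "
    "FROM sales GROUP BY month ORDER BY month"
).fetchall()

for month, total in monthly_totals:
    print(month, total)
```

Production warehouses typically add partitioning or columnar storage to keep queries like this fast over far larger tables, but the query shape is similar.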
The history of data mining
Data mining is a data analysis method introduced in 1990 that involves discovering anomalies, correlations, and patterns within large data sets to predict outcomes. This computational process came about directly from the evolution of database and data warehouse technologies.
Data mining takes advantage of that greater data storage capacity while scrutinizing information quickly and efficiently. Consequently, businesses started predicting the needs of their customers based on a closer look at their historical purchasing habits.
A pitfall of data mining is that data can be easily misinterpreted. For example, if an individual has purchased two white shirts online, they likely won’t need any more in the short term. Targeting that person with white shirt advertisements is both a waste of time and an annoyance. Data mining is a powerful tool, but with this in mind, data scientists should use it to build models that account for potential misinterpretation.
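One simple way to account for the white shirt problem is to rank products by purchase history but skip anything bought very recently. The sketch below is only an illustration of that idea, not a real recommendation model; the function name, the 30-day cooldown, and the purchase history are all assumptions.

```python
from collections import Counter
from datetime import date, timedelta


def suggest_categories(purchases, today, cooldown_days=30, top_n=2):
    """Rank categories by purchase count, skipping any bought within the cooldown."""
    counts = Counter(category for category, _ in purchases)
    cutoff = today - timedelta(days=cooldown_days)
    recently_bought = {
        category for category, bought_on in purchases if bought_on >= cutoff
    }
    ranked = [c for c, _ in counts.most_common() if c not in recently_bought]
    return ranked[:top_n]


# Hypothetical purchase history: (category, purchase date).
history = [
    ("white shirt", date(2023, 5, 28)),
    ("white shirt", date(2023, 5, 28)),
    ("socks", date(2023, 2, 10)),
    ("jeans", date(2023, 1, 15)),
    ("socks", date(2022, 11, 2)),
]

suggestions = suggest_categories(history, today=date(2023, 6, 1))
print(suggestions)
```

A naive frequency count would put "white shirt" at the top of the list; the cooldown keeps it out of the suggestions until the purchase is no longer recent.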
In steps Google
This search engine powerhouse began as a research project in 1996 and soon developed into a fast, efficient search engine that processed and analyzed big data across distributed computers.
Google gave internet users an automated, scalable, and high-performance discovery system. In just a few seconds, it would analyze a website’s relevance and display the pages it deemed most beneficial to the search query.
The need for non-relational databases
The internet's explosion in popularity during the mid-1990s, with its rich flow of information and variety of data types, left relational databases behind. The need for faster processing, and for processing unstructured data, gave rise to non-relational databases (also called NoSQL). NoSQL systems can handle many types of data, including storing and processing large amounts of unstructured data.
NoSQL’s flexibility (compared to the rigidity of relational schemas) means it can store and work with data in many different formats. Scalability also becomes much simpler with a non-relational database, and many businesses use this approach to leverage big data for analysis and reporting.
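The flexibility described above can be sketched with plain Python dictionaries standing in for documents in a document-style NoSQL store. This is not the API of any particular database; the `find` helper and the sample documents are hypothetical.

```python
import json

# Documents in the same collection do not have to share a schema.
events = [
    {"type": "page_view", "url": "/pricing", "user": {"id": 1, "plan": "free"}},
    {"type": "purchase", "user": {"id": 2}, "items": [{"sku": "A1", "qty": 2}]},
    {"type": "review", "user": {"id": 1}, "text": "Great product", "tags": ["praise"]},
]


def find(collection, **criteria):
    """Return documents whose top-level fields match every given criterion."""
    return [
        doc
        for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


purchases = find(events, type="purchase")
print(purchases)

# Documents serialize naturally to JSON, a common interchange format.
restored = json.loads(json.dumps(events))
```

In a relational table, the nested `items` list and the optional `tags` field would each need extra tables or nullable columns; here each document simply carries the fields it has.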
Big steps for big data
The term “big data” is often credited to Roger Magoulas, who used it in 2005 to describe data sets that seemed too large or complex for traditional data-processing application software to handle.
The open-source big data framework Hadoop was developed in the same year. It could process large, rapidly generated collections of structured and unstructured data streaming in from all kinds of digital sources.
Big data is used in many industries, including finance, real estate, and travel, to help improve B2B operations and facilitate better decision-making.
Data analytics in the cloud
Our journey through the history of analytics next takes us to cloud-based data analytics platforms, which provide centralized data access for every user so they can contribute to business decisions. Using cloud-based services and tools to analyze data was groundbreaking, enabling businesses with limited resources to act quickly on demand and reduce their infrastructure costs.
This is because, with cloud computing, you only pay for what you use. Other key benefits of cloud data analytics include the following:
- Its efficiency in allocating resources where they are needed most.
- Its speed of delivery.
- Increased communication and information access for seamless collaboration across workforces.
In the early 2010s, cloud-based data warehouses such as Google BigQuery, Snowflake, and Amazon Redshift were released. Because you pay only for the resources you use, data analytics in the cloud can be more cost-effective than on-premises solutions. As in-house expertise isn’t needed to maintain servers and software, it also brings notable IT cost savings.
Cloud analytics allows companies of all sizes to access big data technologies, so they can make speedy, agile, and data-driven business intelligence decisions.
The importance of GA4
Google Analytics 4 (GA4) is a free analytics service that lets analytics teams monitor traffic and engagement across their websites and apps. Compared to its predecessor, Universal Analytics, it offers improved customer journey tracking and path analysis, with users' interactions captured solely as events.
This allows data analysts to better understand users' website flow and formulate a strategy to optimize the steps leading them to the final conversion point.
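To show why an event-based model lends itself to path analysis, the sketch below rebuilds each user's path from a flat list of events and counts how many users reach each step of a funnel. It does not use the GA4 API; the event records, field names, and helper functions are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical event stream: every interaction is one event record.
events = [
    {"user": "u1", "ts": 1, "name": "page_view"},
    {"user": "u2", "ts": 2, "name": "page_view"},
    {"user": "u1", "ts": 3, "name": "add_to_cart"},
    {"user": "u1", "ts": 4, "name": "purchase"},
    {"user": "u2", "ts": 5, "name": "add_to_cart"},
    {"user": "u3", "ts": 6, "name": "page_view"},
]


def paths_by_user(events):
    """Group events per user, ordered by timestamp, into a list of event names."""
    paths = defaultdict(list)
    for event in sorted(events, key=lambda e: e["ts"]):
        paths[event["user"]].append(event["name"])
    return dict(paths)


def reached(path, steps):
    """True if the steps occur in the path in this order (gaps allowed)."""
    remaining = iter(path)
    return all(step in remaining for step in steps)


def funnel_counts(paths, steps):
    """Count users who completed the first 1, 2, ... steps of the funnel."""
    return [
        sum(1 for path in paths.values() if reached(path, steps[:i]))
        for i in range(1, len(steps) + 1)
    ]


paths = paths_by_user(events)
counts = funnel_counts(paths, ["page_view", "add_to_cart", "purchase"])
print(counts)
```

The drop between consecutive counts shows where users leave the flow before the final conversion point.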
While GA4 is easy to implement through Google Tag Manager (still the recommended practice), it has its limitations. Standard Universal Analytics properties stopped processing data in July 2023, and migrating from them is complex. Without data or tag migration, you won’t be able to import your historical data to the new platform. This challenge only grows with the organization’s size: you can have hundreds of tags to transfer.
Another drawback of GA4 is its limited reporting on individual users. Standard reports show only overall visitor data, so you get little insight into how particular visitors engage and convert. For example, if two visitors land on your site at the same time and engage in entirely different ways, the report gives you an average of the two sets of data.
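A quick calculation shows how much an average can hide. The two visitors and their metrics below are made-up figures used only to illustrate the point.

```python
# Two hypothetical visitors who behave in entirely different ways.
visitors = [
    {"id": "A", "pages_viewed": 1, "seconds_on_site": 10, "converted": False},
    {"id": "B", "pages_viewed": 9, "seconds_on_site": 590, "converted": True},
]

count = len(visitors)

# The aggregate view: one averaged row for both visitors.
aggregate = {
    "avg_pages_viewed": sum(v["pages_viewed"] for v in visitors) / count,
    "avg_seconds_on_site": sum(v["seconds_on_site"] for v in visitors) / count,
    "conversion_rate": sum(1 for v in visitors if v["converted"]) / count,
}
print(aggregate)
```

The averaged row (5 pages, 300 seconds, 50% conversion) describes neither visitor: one bounced almost immediately, while the other browsed at length and converted.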
If you think this analytics service isn’t suitable for your business, explore different Google Analytics alternatives. Matomo, with its customizable reports and privacy-compliant tracking, is an excellent option. It also allows you to get detailed insights into website traffic.
The expected evolution of data analytics
New legislation, policies, and technologies mean data processing, analysis, storage, and retrieval constantly evolve. As data becomes more complex and detailed, expect a greater emphasis on data quality.
Analysts will likely need to utilize machine learning (ML) and artificial intelligence (AI) to detect errors and improve reporting so companies can confidently make decisions based on high-quality data.
As an exercise, we also asked ChatGPT about its predictions for data analytics and future AI use. It stressed that “the future of data and AI is extremely promising, with continued technological advancements and an increasing demand for data-driven decision-making across industries.”
It continued by highlighting some “potential developments in this field.” These include:
- Increased automation
- More personalized insights
- Greater efficiency and accuracy
- Integration with other technologies
Real-time data visualization will undoubtedly be at the heart of data analytics in the future. Accessing, analyzing, and exploring higher quantities of quality data at speed will require more intuitive tools.
As its fascinating history suggests, more exciting and game-changing discoveries lie ahead, and they will give us even greater insight from data analytics.