PT Notes
Are your PHA worksheets data lakes or data swamps?
PT Notes is a series of topical technical notes on process safety provided periodically by Primatech for your benefit. Please feel free to provide feedback.
Companies now use data warehouses and data lakes to store and manage big data. Because the two serve different needs, a company may require both.
A data warehouse is a database optimized to analyze relational data and designed to allow fast queries, the results of which are typically used for operational reporting and analysis.
A data lake is a centralized repository of data stored in its natural (raw) form. The structure of the data is not defined when the data are captured, so data can be stored without careful up-front design and without knowing what questions might need answers in the future. Different types of analytics can be run on data lakes to support dashboards, visualizations, big data processing, and machine learning, all with the aim of discovering insights and guiding better decisions. PHA studies can be stored in data lakes.
In contrast, a data swamp is a database that is poorly designed, badly organized, inadequately documented, and/or poorly maintained. Data swamps have inadequate or no curation, and little or no active management throughout the data life cycle. As a result, they are frustrating and difficult to work with, and users cannot analyze and exploit their data effectively.
Data lakes can deteriorate and become data swamps if not properly managed and curated. Whether your PHA worksheets are data lakes or data swamps depends largely on how you maintain them.
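As a rough illustration of what curation can mean for PHA worksheets, the sketch below flags rows with blank required fields and rows that duplicate an earlier one. It is a minimal example under stated assumptions, not a description of any particular PHA software: the field names (node, deviation, cause, consequence, safeguards), the function name audit_rows, and the sample rows are all hypothetical.

```python
# Hypothetical required fields for one PHA worksheet row. Real worksheets
# and PHA software will use their own column names.
REQUIRED_FIELDS = ("node", "deviation", "cause", "consequence", "safeguards")


def audit_rows(rows):
    """Return (row_index, problem) tuples for rows that look uncurated."""
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        # Missing or blank required fields make a row hard to query later.
        for name in REQUIRED_FIELDS:
            value = row.get(name)
            if value is None or not str(value).strip():
                problems.append((i, f"missing {name}"))
        # Rows that match an earlier row after trimming whitespace and
        # lowercasing are treated as duplicates.
        key = tuple(str(row.get(name) or "").strip().lower()
                    for name in REQUIRED_FIELDS)
        if key in seen:
            problems.append((i, "duplicate row"))
        seen.add(key)
    return problems


# Made-up sample rows, for illustration only.
rows = [
    {"node": "1", "deviation": "High pressure", "cause": "Blocked outlet",
     "consequence": "Vessel rupture", "safeguards": "PSV-101"},
    {"node": "1", "deviation": "high pressure ", "cause": "Blocked outlet",
     "consequence": "Vessel rupture", "safeguards": "PSV-101"},
    {"node": "2", "deviation": "No flow", "cause": "",
     "consequence": "Pump damage", "safeguards": "Low-flow alarm"},
]

for index, problem in audit_rows(rows):
    print(index, problem)
```

Running a check of this kind whenever worksheets are added or revised is one simple way to catch the gaps and inconsistencies that let a data lake drift toward a data swamp.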
Companies that use data lakes enjoy a competitive advantage over peers that do not. They are able to make better-informed decisions on many aspects of business operations, in particular the risks they face.
Copyright © 2021, Primatech Inc. All rights reserved.