Information Production Process: Data Governance in Action
14 September 2015, Data Governance
Personal background combines technical, human and healthcare perspectives: almost 15 years of IS research and development.
University of Turku, Finland: information systems; empirical field studies in healthcare settings.
Turku University Hospital, Finland: healthcare data warehousing; project management, system and service design.
Aalto University, Helsinki, Finland: usability research; healthcare data and information quality research.
Siili Solutions, Helsinki, Finland: information management consulting.
Sami Laine, Data Science Architect, Siili Solutions Oyj, Finland
Data Governance from the perspective of football players: Positions = Roles & Responsibilities, Tactics = Processes, Rules = Policies, Standards. Positions: 4-4-2 formation, 4-3-3 formation, 4-3-2-1 formation (Football Bible)
Data Governance from the perspective of football players: Tactics = Processes. Counter-attacking football, long-ball/direct football, wide play and alternating wingers (Talk Football)
Data Governance from the perspective of football players: Rules = Policies, Standards (FIFA, The Laws of the Game).
Field surface: Matches may be played on natural or artificial surfaces, according to the rules of the competition. The colour of artificial surfaces must be green.
Dimensions: The length of the touch line must be greater than the length of the goal line. Length (touch line): minimum 90 m (100 yds), maximum 120 m (130 yds). Width (goal line): minimum 45 m (50 yds), maximum 90 m (100 yds).
Changing the goalkeeper: Any of the other players may change places with the goalkeeper, provided that the referee is informed before the change is made and the change is made during a stoppage in the match.
But then
the Game begins
Information Production Process is Data Governance in Action
The Information Production Process (IPP) consists of three process phases based on Total Quality Management: DATA SUPPLY → DATA MANUFACTURING → DATA CONSUMPTION.
Human perspective: enters data for the primary purpose; builds data sets for other uses; analyses and reports data; interprets data and makes decisions for secondary purposes.
Technical perspective: DATA SOURCES A, B and C feed processing blocks, which write to STORAGE A and STORAGE B and produce INFORMATION PRODUCTS A, B, C and D.
In the 1990s, researchers at MIT developed the Total Data Quality Management (TDQM) framework.
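TDQM is a management framework rather than an algorithm, but the technical perspective above is essentially a directed graph from data sources to information products. A minimal sketch, assuming a hypothetical topology (the diagram does not fully specify which processing block feeds which storage), shows how such a graph lets you trace any information product back to the data sources it depends on with a breadth-first search:

```python
from collections import deque

# Hypothetical IPP graph: each node maps to the nodes it feeds.
# The exact topology is an assumption loosely based on the diagram above.
FEEDS = {
    "DATA SOURCE A": ["Processing Block 1"],
    "Processing Block 1": ["INFORMATION PRODUCT A"],
    "DATA SOURCE B": ["Processing Block 2"],
    "Processing Block 2": ["STORAGE A"],
    "STORAGE A": ["INFORMATION PRODUCT B", "INFORMATION PRODUCT C", "Processing Block 4"],
    "DATA SOURCE C": ["Processing Block 3"],
    "Processing Block 3": ["Processing Block 4"],
    "Processing Block 4": ["STORAGE B"],
    "STORAGE B": ["INFORMATION PRODUCT D"],
}


def upstream_sources(product: str) -> list[str]:
    """Trace an information product back to the data sources it depends on."""
    # Invert the edges so we can walk from consumers towards producers.
    fed_by: dict[str, list[str]] = {}
    for node, targets in FEEDS.items():
        for target in targets:
            fed_by.setdefault(target, []).append(node)

    sources, seen, queue = set(), {product}, deque([product])
    while queue:
        node = queue.popleft()
        parents = fed_by.get(node, [])
        if not parents and node != product:
            sources.add(node)  # nothing feeds this node: it is a data source
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(sources)


print(upstream_sources("INFORMATION PRODUCT D"))  # ['DATA SOURCE B', 'DATA SOURCE C']
print(upstream_sources("INFORMATION PRODUCT A"))  # ['DATA SOURCE A']
```

Note that the trace stops at the data sources: the clicks that created the data are not visible in a graph like this at all.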
Technical perspective from beginning to end.
Technical beginning: Data sources store data sets. Database management tools are used to manage data: data sets, data models, database security, etc.
Technical result: Information products provide facts. Business intelligence tools are used to produce facts: statistics, visualizations, user rights, etc.
DATA SOURCE A → INFORMATION PRODUCT A
In reality, the Information Production Process is a complex system: DATA SOURCES → TDQM - DATA MANUFACTURING (Data Integration, Data Warehousing, Business Intelligence) → INFORMATION PRODUCTS. Between the data sources and the reports there are many different roles, skills and tools.
Information Production Process in a real business environment
Source systems: patient administration information systems, human resources information systems, financial management information systems; user management; service bus.
Data integration: SQL Server EAI (Data Integrator access servers): real time, messages. Data transfer system ETL (Data Integrator job servers): batch runs, extractions, transformations, loads. Data staging area SA (SQL Server): working area. ACCESS RIGHTS: SQL Server.
Data warehouse DW (SQL Server, Data Vault): storage, historical data. Data warehouse administration and design systems: administration (Central Management Console, SS Management Studio), design (Data Integrator Designer). USER ACCOUNTS: Active Directory.
Data repositories: OLAP (SQL Server SSAS): cubes, summaries. DM (SQL Server): relational tables, summaries. ACCESS RIGHTS: SAP BO DS & BI.
Reporting systems (functionality): scorecard and dashboard (Xcelsius, Dashboard builder, Web/Desktop Intelligence); standard and ad hoc reports (WebIntelligence, Crystal Reports); forecasts and analyses (Voyager/Pioneer, Explorer). Reporting administration and design systems: design (Universe Designer), administration (Central Management Console).
Publishing system (user interfaces): portal (Infoview, MOSS); Office (SAP BO Live Office, Excel, Word, PowerPoint); other user interfaces (Crystal Reports, Desktop Intelligence, Voyager, Explorer, Mobile).
Operating models: service production management, stakeholder communication, national-level management, research.
Code sets and identifiers: CODE SETS.
PS. Information Production Process is a complex system!
Information Production Process is a complex system. TDQM - DATA MANUFACTURING: Data Integration, Data Warehousing, Business Intelligence.
Data integration tool: tools for ICT professionals, e.g. data integrators.
Database management tool: tools for ICT professionals, e.g. DBAs and data integrators.
Dashboard design tool: tools for business analysts and software developers.
Publishing portal: tools for business professionals and end users.
The maturity of these software tools varies significantly. Who can make modifications to information production processes: a regular business user, a power user or an ICT expert? What kinds of modifications can each user make: enter a new data source, change calculation logic or modify visualizations?
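One way to make the answers to these questions explicit is a permission table that maps each user type to the kinds of modifications it may make. The user types and modifications come from the slide; which user type gets which right is a hypothetical policy chosen for illustration, and `may_modify` is not part of any tool mentioned here:

```python
from enum import Enum


class Modification(Enum):
    NEW_DATA_SOURCE = "enter new data source"
    CALCULATION_LOGIC = "change calculation logic"
    VISUALIZATION = "modify visualization"


# Hypothetical governance policy: which user type may make which modification.
ALLOWED = {
    "regular business user": {Modification.VISUALIZATION},
    "power user": {Modification.VISUALIZATION, Modification.CALCULATION_LOGIC},
    "ICT expert": set(Modification),  # all modifications
}


def may_modify(user_type: str, modification: Modification) -> bool:
    """Return True if the policy allows this user type to make the modification."""
    # Unknown user types get no rights by default.
    return modification in ALLOWED.get(user_type, set())


print(may_modify("power user", Modification.CALCULATION_LOGIC))           # True
print(may_modify("regular business user", Modification.NEW_DATA_SOURCE))  # False
```

Defaulting unknown user types to no rights keeps the policy closed unless someone explicitly grants access.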
Application architecture of an Information Production Process
During the course you will see many different derivatives of the Information Production Process, and most likely they will be described from the technical perspective: data sets, software platforms, software tools, graphical reports, etc. Most likely they will also focus on the *data manufacturing* phase of the entire Information Production Process.
[Diagram: the same real-world application architecture as in the previous example, framed as DATA SOURCES → TDQM - DATA MANUFACTURING → INFORMATION PRODUCTS]
The Aim of the Game
The Aim of the Game: The aim of a football game is to score GOALS. The purpose is not to run around, kick the ball or follow the rules. The aim of the Information Production Process is to enable valid DECISIONS. The purpose is not huge reports, flashy visualizations or massive calculations.
THE TIP OF THE DAY: Regardless of the actual purpose (DECISIONS), during the course you will see a lot of side effects: numbers, graphics and marketing promises!
The Real Aim of the Game
It all starts with a click and ends with an ahaa! Click → ahaa! (valid conclusions leading to informed DECISIONS)
WHAT? click ahaa!
It all starts with a click and ends with an ahaa.
DATA SUPPLY → DATA MANUFACTURING → DATA CONSUMPTION
[Diagram: the same real-world application architecture as in the previous examples]
MISSING: How was the data actually CREATED? (the click)
MISSING: What kind of DECISIONS will be made? (the ahaa)
Click → Source Systems → Data Warehouses → BI Reports → Ahaa
DATA SOURCE A → INFORMATION PRODUCT A
It all starts with a click and ends with an ahaa. THE TIP OF THE DAY: Too often in Business Intelligence the focus is only on transforming source data into aggregated figures (e.g. graphs, statistics). Click (MISSING!) → Source Systems → Data Warehouses → BI Reports → Ahaa (MISSING!). DATA SOURCE A → INFORMATION PRODUCT A
Why do clicks and ahaas matter?
Average length of ambulatory hospital visits: Hospital A vs. Hospital B. Which one of the hospitals is better?
Average length of ambulatory hospital visits: Hospital A vs. Hospital B. Of course, the basics of statistics matter: populations should match, measurements should match, etc. Of course, the basics of the business domain matter: patients and treatments should be similar, data should be standardized, etc. What if all of these look similar? Can you trust the reports? Can you make decisions?
Average length of ambulatory hospital visits: LENGTH = EndTime - StartTime. EASY! But you still need to know and decide what exactly is being measured: the length of the hospital visit, or the length of the surgery?
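The formula itself is trivial; the hard part is choosing the two events. A sketch with made-up timestamps, assuming a hypothetical event log that records arrival, surgery start, surgery end and departure for each visit, shows how the same `EndTime - StartTime` rule gives very different averages depending on which pair of events is chosen:

```python
from datetime import datetime
from statistics import mean

FMT = "%Y-%m-%d %H:%M"

# Hypothetical event log for two ambulatory visits (made-up timestamps).
visits = [
    {"arrival": "2015-03-02 08:00", "surgery_start": "2015-03-02 09:30",
     "surgery_end": "2015-03-02 10:15", "departure": "2015-03-02 13:00"},
    {"arrival": "2015-03-02 10:00", "surgery_start": "2015-03-02 11:00",
     "surgery_end": "2015-03-02 12:15", "departure": "2015-03-02 15:00"},
]


def length_hours(visit: dict, start_event: str, end_event: str) -> float:
    """LENGTH = EndTime - StartTime for the chosen pair of events, in hours."""
    start = datetime.strptime(visit[start_event], FMT)
    end = datetime.strptime(visit[end_event], FMT)
    return (end - start).total_seconds() / 3600


visit_length = mean(length_hours(v, "arrival", "departure") for v in visits)
surgery_length = mean(length_hours(v, "surgery_start", "surgery_end") for v in visits)
print(visit_length, surgery_length)  # 5.0 1.0
```

Both numbers are an "average length", yet they answer different questions.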
However, data-driven process analytics can be affected by subtle contextual and human factors. What does "Ward period starts" really mean? Could it be the patient registration time? How was each timestamp actually created in reality? UNDERSTAND THE CLICKS! What kinds of errors might there be in the data sets?
Data quality research question: What does this really mean? 08:53
Laine, Sami; Lee, Carol; Nieminen, Marko (2015). Transparent Data Supply for Open Information Production Processes. In Proceedings of the European Conference on Information Systems (ECIS), Münster, Germany.
How exactly is a registration timestamp value such as 08:53 created in hospital processes? Arrival → Registration → Treatments → Discharge → Departure
How exactly is a registration timestamp value such as 08:53 created in hospital processes?
MEANING | USER | TASK | TOOL | ENVIRONMENT
arrival at location | Patient | Self-registration | Barcode card | Current unit
available service at reception | Secretary (current user) | Registration | EPR & key press | Current unit
midnight at previous day | Secretary (current user) | Registration | EPR & manual adjustment | Current unit
will leave at this time | Secretary (at previous unit) | Discharge | EPR & manual adjustment | Previous unit
will be picked up at this time | Secretary (at previous unit) | Discharge | EPR & manual adjustment | Previous unit
is leaving unit now | Secretary (at previous unit) | Discharge | EPR & key press | Previous unit
The same data value can mean completely different things, yet all of them look identical at the data layer! You must understand the clicks! "Registration at 08:53" can come from six different clicks: Click 1: arrival at location; Click 2: available service at reception; Click 3: midnight at previous day; Click 4: will leave at this time; Click 5: will be picked up at this time; Click 6: is leaving unit now. Even a simple data element can carry complex information!
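One way to keep the click visible at the data layer is to store the value together with its context. A minimal sketch, assuming a hypothetical `Timestamp` record whose fields mirror the MEANING, USER, TASK, TOOL and ENVIRONMENT columns of the table above:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timestamp:
    """A timestamp value plus the context of the click that created it (hypothetical)."""
    value: str        # what the data layer stores, e.g. "08:53"
    meaning: str      # what the click actually recorded
    user: str
    task: str
    tool: str
    environment: str


# Two rows from the table, captured together with their click context.
self_registration = Timestamp("08:53", "arrival at location", "Patient",
                              "Self-registration", "Barcode card", "Current unit")
discharge_click = Timestamp("08:53", "is leaving unit now", "Secretary (at previous unit)",
                            "Discharge", "EPR & key press", "Previous unit")

print(self_registration.value == discharge_click.value)  # True: identical at the data layer
print(self_registration == discharge_click)              # False: different clicks
```

A frozen dataclass compares all fields, so two records are only equal when both the value and the click context match.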
It all starts with a click and ends with an ahaa.
THE TIP OF THE DAY: If you rely on data and do not understand the original clicks, you end up making bad decisions.
Click → Source Systems → Data Warehouses → BI Reports → Ahaa
Six different clicks (arrival at location, available service at reception, midnight at previous day, will leave at this time, will be picked up at this time, is leaving unit now) all end up as "Registration at 08:53", and the report concludes: average lead time is 4.5 hours?
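A small made-up example shows how mixing click meanings distorts an average. Assume hypothetical rows where each registration timestamp carries the meaning of the click that created it, and lead time is measured from registration to the start of treatment:

```python
from datetime import datetime
from statistics import mean

FMT = "%Y-%m-%d %H:%M"

# Made-up rows: (registration timestamp, meaning of the registration click, treatment start)
rows = [
    ("2015-03-02 08:00", "arrival at location", "2015-03-02 09:30"),
    ("2015-03-02 00:00", "midnight at previous day", "2015-03-02 09:00"),
    ("2015-03-02 07:30", "arrival at location", "2015-03-02 10:00"),
]


def lead_time_hours(registration: str, treatment: str) -> float:
    """Hours from registration to treatment start."""
    delta = datetime.strptime(treatment, FMT) - datetime.strptime(registration, FMT)
    return delta.total_seconds() / 3600


# Naive: every registration timestamp treated as if it meant the same thing.
naive = mean(lead_time_hours(r, t) for r, _, t in rows)
# Context-aware: only registrations that really record arrival at location.
arrivals_only = mean(lead_time_hours(r, t) for r, m, t in rows if m == "arrival at location")
print(round(naive, 2), round(arrivals_only, 2))  # 4.33 2.0
```

The manually adjusted midnight registration inflates the naive average; filtering on the click meaning is only possible if that meaning was captured in the first place.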
Why do clicks matter?
Inconsistent figures were produced about the same issue at the same time: ambulatory procedures in administrative reports vs. ambulatory procedures in operation room reports. Which one is correct?
Information Production Process (IPP): DATA SUPPLY → DATA MANUFACTURING → DATA CONSUMPTION
Human perspective: enters data for the primary purpose; builds data sets for other uses; analyses and reports data; interprets data and makes decisions for secondary purposes.
Technical perspective:
Electronic Patient Record System → Processing Block → Statistical Data Warehouse → Processing Block → Administrative Reports
Radiology System → Processing Block
Operation Room System → Processing Block → OLAP Cube → Operation Room Reports
Information Production Process (IPP): DATA SUPPLY → DATA MANUFACTURING → DATA CONSUMPTION (AP = ambulatory procedure)
Human perspective, data supply: Will not be updated if the patient stays overnight! Does not copy codes manually from system to system! Does not mark all visits as ambulatory! Remaining phases: builds data sets for other uses; analyses and reports data; interprets data and makes decisions for secondary purposes.
Technical perspective:
Electronic Patient Record System → Processing Block (negative error rate; billed from patient as AP) → Statistical Data Warehouse → Processing Block (negative error rate) → Administrative Reports: Ambulatory Procedures = 10726
Radiology System: no integration! Not used.
Operation Room System → Processing Block (positive error rate; planned as AP) → OLAP Cube → Operation Room Reports: Ambulatory Procedures = 15687
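Comparing only the totals (10726 vs. 15687) cannot reveal why the reports disagree. A sketch of record-level reconciliation with set operations, assuming a hypothetical encounter identifier shared by both sources (the slide notes that not all systems are integrated, so such an identifier may not exist in practice) and made-up IDs:

```python
def reconcile(a: set[str], b: set[str]) -> dict[str, set[str]]:
    """Split two sets of record IDs into shared and source-specific parts."""
    return {"both": a & b, "only_a": a - b, "only_b": b - a}


# Made-up encounter IDs counted as ambulatory procedures (AP) by each report.
administrative = {"E1", "E2", "E3"}        # EPR -> statistical DW: billed from patient as AP
operation_room = {"E2", "E3", "E4", "E5"}  # OR system -> OLAP cube: planned as AP

result = reconcile(administrative, operation_room)
print(len(administrative), len(operation_room))           # 3 4
print(sorted(result["only_a"]), sorted(result["only_b"]))  # ['E1'] ['E4', 'E5']
```

Each bucket turns into a concrete question about the clicks: why was E1 billed as AP without being planned as one, and why were E4 and E5 planned as AP but never billed as such?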
It all starts with a click and ends with an ahaa.
THE TIP OF THE DAY: If you rely on data and do not understand the original clicks, you end up making very bad decisions! You must understand the original clicks to draw valid conclusions!
Click → Source Systems → Data Warehouses → BI Tools → Ahaa
Clicks: EPR encounter clicks, EPR episode clicks, ORS timestamp clicks.
Data about ambulatory procedures (yes/no): billed from patient as AP; manually duplicated codes; planned as AP.
Reports: 10726 vs. 15687
Summary
Technical data manufacturing is not enough!
Technical start: Data sources store data sets. Not always correct data!
Technical end: Information products provide facts. Not always correct facts!
Clicks! (original user interfaces) → Source Systems → Data Warehouses → BI Reports → Ahaa! (managers' decisions)
DATA SOURCE A → INFORMATION PRODUCT A
It all starts with a click and ends with an ahaa.
THE LESSON OF THE DAY: If you rely on data and do not understand the original clicks, you end up making very bad decisions! You must understand the original clicks to draw valid conclusions!
Click → Source Systems → Data Warehouses → BI Tools → Report → Ahaa
Make sure that your organization has people who understand both Clicks and Ahaas and can follow the trace between them!
QUESTIONS?
Sami Laine
Data Science Architect, Siili Solutions Oyj, Finland
Aalto University, Department of Computer Science and Engineering, Finland
sami.laine@siili.fi
sami.k.laine@aalto.fi
https://www.researchgate.net/profile/sami_laine/
https://www.linkedin.com/pub/sami-laine/2/a61/970
Appendix
"All definitions should start at the lowest, most atomic grain and should describe the physical process that collects the data. Thus, in our dimensional modeling classes, when we start with the familiar example of retail sales, I ask the students what is the grain? After listening to a number of careful replies listing various retail dimensions such as product, customer, store and time, I stop and ask the students to visualize the physical process. The salesperson or checkout clerk scans the retail item and the register goes BEEP. The grain of the fact table is BEEP!"
http://www.kimballgroup.com/2007/07/keep-to-the-grain-indimensional-modeling/
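The quoted point can be illustrated with a fact table whose rows are the BEEPs themselves. A minimal sketch with made-up scans and hypothetical field names: every coarser summary can be derived from the atomic rows, while the reverse is impossible once the detail has been aggregated away:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ScanFact:
    """One fact row per BEEP: a single item scanned at a register."""
    store: str
    product: str
    customer: str
    time: str
    amount_cents: int


# Made-up BEEPs.
beeps = [
    ScanFact("Store 1", "milk", "C1", "2015-03-02 08:53", 120),
    ScanFact("Store 1", "bread", "C1", "2015-03-02 08:53", 250),
    ScanFact("Store 2", "milk", "C2", "2015-03-02 09:10", 120),
]


def roll_up(facts: list[ScanFact], key: Callable[[ScanFact], str]) -> dict[str, int]:
    """Aggregate atomic facts to a coarser grain chosen by `key`."""
    totals: dict[str, int] = defaultdict(int)
    for fact in facts:
        totals[key(fact)] += fact.amount_cents
    return dict(totals)


print(roll_up(beeps, lambda f: f.store))    # {'Store 1': 370, 'Store 2': 120}
print(roll_up(beeps, lambda f: f.product))  # {'milk': 240, 'bread': 250}
```

Had only sales per store been stored, the per-product question could no longer be answered; amounts are kept in integer cents to avoid floating-point rounding in the sums.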