Data Objects and System of Record

Kevin Campbell
Posts: 40
Joined: 13 Sep 2010, 20:26

We're in the process of populating the new classes provided by the Information Pack (for which many thanks).

Because of the 1:1 relationship between a Data Object and an Application Provider as the System of Record, I'm assuming that we will actually have a number of Data Objects associated with each Data Subject until such time as we collapse our application landscape. Within the Data Subject "Customer", each of the objects "Customer Ship To" and "Customer Bill To" would actually occur as many times as we have Application Provider instances that are considered Systems of Record within their domain - correct? There'd conceivably be "Customer Ship To - EMEA ERP", "Customer Ship To - ROW ERP" and so forth?

Kevin
jonathan.carter
Posts: 1087
Joined: 04 Feb 2009, 15:44

Hi Kevin,

Just to be clear on the semantics of the System of Record: this identifies the particular application that is the master system where instances of a Data Object are created or deleted. We made this single cardinality on the basis that good Data Management practice requires only a single System of Record for a Data Object. Of course, a particular application could be the System of Record for multiple Data Objects.
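
To make the original single-cardinality rule concrete, here is a minimal Python sketch. `DataObject`, `system_of_record` and `objects_mastered_by` are hypothetical names for illustration, not Essential's actual meta-model classes or slots. Each Data Object holds at most one SoR, while one application can still be the SoR for several objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataObject:
    """Hypothetical stand-in for a logical Data Object (not Essential's real class)."""
    name: str
    # Single cardinality, as originally designed: at most one System of Record.
    system_of_record: Optional[str] = None


def objects_mastered_by(app: str, objects: list[DataObject]) -> list[str]:
    """Names of the Data Objects for which `app` is the System of Record."""
    return [o.name for o in objects if o.system_of_record == app]


objects = [
    DataObject("Customer Master", system_of_record="ERP"),
    DataObject("Product Item", system_of_record="ERP"),
    DataObject("Customer Hierarchy", system_of_record="CRM"),
]
print(objects_mastered_by("ERP", objects))  # ['Customer Master', 'Product Item']
```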

What we can show in the Views (e.g. the Data Provider View), by querying the model, is where data is being provided, and check whether that data is being provided by a System of Record. If not, then we have potential Data Management issues; that is, we don't have a single source of the truth for that particular Data Object.
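
The kind of query described above can be sketched roughly as follows. This assumes a simplified, hypothetical model in which each "provision" records that an application supplies data for a Data Object, and SoRs are a plain dictionary. It is not how the Data Provider View is actually implemented, only an illustration of the check: flag every provision that does not come from the object's System of Record, including objects with no SoR at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provision:
    """Hypothetical fact: application `provider` supplies data for `data_object`."""
    data_object: str
    provider: str


# Hypothetical single-valued SoR assignments (Data Object -> Application Provider).
system_of_record = {"Customer Master": "ERP", "Product Item": "ERP"}

provisions = [
    Provision("Customer Master", "ERP"),
    Provision("Customer Master", "CRM"),   # provided by a non-SoR system
    Provision("Product Item", "ERP"),
    Provision("Supplier", "Procurement"),  # no SoR defined at all
]


def non_sor_provisions(provisions, sor):
    """Provisions that do not come from the Data Object's System of Record."""
    return [p for p in provisions if sor.get(p.data_object) != p.provider]


for p in non_sor_provisions(provisions, system_of_record):
    print(f"{p.data_object} provided by {p.provider}: not the SoR")
```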

Data Subjects are conceptual-level things that allow us to group Data Objects into 'types', which gives us a semantic grounding for an object. That is, we can say that this Data Object (e.g. Customer Master) contains data about a particular Data Subject (e.g. Customer).

At the logical level (design), we define logical Data Objects that describe how data is (or needs to be) used in the enterprise, including the attributes. In this way we can describe what the Customer Master object looks like in terms of attributes and also describe what the Customer Hierarchy object looks like - in this case, we'd expect rather different objects here.
The Data Objects are still at what we call the 'pure design' level and it's at the Data Representation meta class that we describe how these objects are represented in our systems ('implementation design').

Using your example, we would define a Data Subject, Customer, and then define Customer Ship To and Customer Bill To. Then to capture the representation of these in the EMEA ERP Application Provider, we would create Data Representations 'Customer Ship To in EMEA ERP' and 'Customer Bill To in EMEA ERP'.
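
The three levels (conceptual Data Subject, logical Data Object, implementation-level Data Representation) can be pictured with a small Python sketch of the example above. The class names mirror the meta-class names used in this thread, but the fields and attribute lists are hypothetical simplifications, not Essential's real slot definitions.

```python
from dataclasses import dataclass, field


@dataclass
class DataSubject:
    """Conceptual level: what the data is about."""
    name: str


@dataclass
class DataObject:
    """Logical ('pure design') level: how the data is used, with its attributes."""
    name: str
    subject: DataSubject
    attributes: list[str] = field(default_factory=list)


@dataclass
class DataRepresentation:
    """Implementation design: a Data Object as represented in one Application Provider."""
    name: str
    data_object: DataObject
    application_provider: str


customer = DataSubject("Customer")
# Attribute lists are illustrative only.
ship_to = DataObject("Customer Ship To", customer, ["address", "delivery contact"])
bill_to = DataObject("Customer Bill To", customer, ["address", "payment terms"])

representations = [
    DataRepresentation("Customer Ship To in EMEA ERP", ship_to, "EMEA ERP"),
    DataRepresentation("Customer Bill To in EMEA ERP", bill_to, "EMEA ERP"),
]

# Walk each representation back up to its conceptual Data Subject.
for rep in representations:
    print(rep.name, "->", rep.data_object.subject.name)
```

Adding the ROW ERP later would mean two more Data Representations against the same two Data Objects, not new Data Objects.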

Note, by the way, that we can capture the organisational scope of data (that is, whose Customer data it is, for example) at the physical data level in Physical Data Objects. This has to be at the physical level because, when we talk about which instances of Customer Ship To in EMEA ERP are held in an Information Store, we are describing the data values themselves. We don't capture the actual data values, though, of course!

However, as I think you've spotted, if we want to say (best-practice Data Management aside, or perhaps for regional reasons) that two ERP systems are the System of Record for Customer Ship To, we can't do that with the meta model as it stands.

Thanks for pointing this out - we'll put together a very small update script to make the SoR slot multiple cardinality.
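
The update script itself is a meta-model change whose contents aren't shown here, so the following is only a data-level sketch of what moving the SoR slot to multiple cardinality means. Every name in it is hypothetical. Each existing single value becomes a one-element set, an unset slot becomes an empty set, and a second ERP can then be recorded alongside the first.

```python
from typing import Optional

# Hypothetical snapshot of SoR assignments under the original single-cardinality slot.
single_sor: dict[str, Optional[str]] = {
    "Customer Ship To": "EMEA ERP",
    "Customer Bill To": "EMEA ERP",
    "Supplier": None,  # no SoR recorded yet
}


def to_multiple_cardinality(single: dict[str, Optional[str]]) -> dict[str, set[str]]:
    """Wrap each single SoR value in a set; an unset slot becomes an empty set."""
    return {obj: ({sor} if sor is not None else set()) for obj, sor in single.items()}


multi_sor = to_multiple_cardinality(single_sor)
# With multiple cardinality, a second ERP can now be recorded as SoR.
multi_sor["Customer Ship To"].add("ROW ERP")
print(sorted(multi_sor["Customer Ship To"]))  # ['EMEA ERP', 'ROW ERP']
```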

I'll post back here when it's ready.

Jonathan
Essential Project Team
Kevin Campbell

Say that we have two implementations of an ERP, but their architectures differ - one is deployed on DB2 and AIX, the other on Oracle and Sun. We've created two Application Providers because the deployments and technology products are different in their two architectures. They are each the SoR for a given Data Object, but only within the context of specific information stores.

This is not necessarily a violation of MDM principles, as each could be supporting a distinct legal entity. I think the answer may be to create multiple Data Representations, and perhaps this would be the level at which specifying a single SoR would be appropriate?

Kevin
jonathan.carter

Interesting scenario.

We discussed your last post in the team and came to the conclusion that it is necessary to allow multiple systems to be defined as the SoR for a Data Object. We need to be able to capture the fact that currently we have multiple SoRs for Product Item data, for example, as we might be in the process of improving our Data Management - or as in your scenario, we have separate systems which do not share data (e.g. the EMEA business is totally separate from the ROW, and Product Items are managed separately).

So, I think we at least need the ability to have multiple SoRs defined against a Data Object - and we've seen this with global and multi-national organisations.
Why at this level of abstraction?
Data Representations are capturing how the Data Objects are actually put together in the systems that we use - so defining the SoR at this level would add little value and doesn't really reflect what we're trying to manage here. Rather, we are saying "ERP_1 is the system where Product Item data should be mastered", i.e. ERP_1 is the SoR. This means that even when we distribute Product Item data to another system which would have a different Data Representation (even if the structure is the same, the technology used to represent it etc. can be different), we are still talking about Product Item data.

The idea of defining the SoR is separate from the actual facts of where data is actually created / mastered and distributed. The point of the Data Provider View is to highlight where we have data being mastered by systems that have not been identified as the System of Record. There might be good reasons why we can't always master certain data items in the SoR but the important thing is to understand what's going on.

OK, so to your specific question:
You're right: those two ERPs (with their different architectures) would be captured as two different Application Providers.
Whether they are SoRs is down to the specifics of your enterprise and reflects your Data Management policies / standards. As I mentioned above, if your DM policy states that both of these are SoR for Product Item (let's say), then we need to capture that. However, I think there are some subtleties to consider here with respect to the Information Stores.

In my last post, I introduced the concept that it is only at the Information Store level that we can understand the organisational scope of the data in that store. That is, whose data is in this Information Store - and of course, there could be multiple organisations' data (e.g. a global ERP system's database) or it could be that different divisions of the organisation have their own data stored separately. These days, with large shared systems, we're often looking at pretty coarse-grained organisation units for this sort of separation rather than at the local team level, but the meta model allows for all scenarios.
In addition to the organisation scope, it's only at the physical data object level (in Information Stores) that we can assign quality attributes (accuracy, timeliness, etc.) because we are describing the actual data values in that Information Store.
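
A rough Python sketch of the physical level follows. It assumes a hypothetical `PhysicalDataObject` that ties a Data Representation to one Information Store and carries its organisational scope and quality attributes. The store names, organisation units and quality values are made up for illustration. Because both scope and quality describe the actual contents of a store, they sit here rather than on the logical Data Object.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalDataObject:
    """Hypothetical: a Data Representation as actually held in one Information Store."""
    representation: str       # e.g. "Product Item in EMEA ERP"
    information_store: str    # e.g. a specific database
    organisational_scope: set[str] = field(default_factory=set)
    # Quality attributes describe actual data values, so they live at this level.
    quality: dict[str, str] = field(default_factory=dict)


emea = PhysicalDataObject(
    "Product Item in EMEA ERP", "EMEA ERP DB",
    organisational_scope={"EMEA Division"},
    quality={"accuracy": "High", "timeliness": "Daily"},
)
shared = PhysicalDataObject(
    "Customer Master in Global ERP", "Global ERP DB",
    organisational_scope={"EMEA Division", "ROW Division"},
)


def stores_holding_data_for(org: str, pdos: list[PhysicalDataObject]) -> list[str]:
    """Which Information Stores hold data belonging to organisation `org`?"""
    return [p.information_store for p in pdos if org in p.organisational_scope]


print(stores_holding_data_for("EMEA Division", [emea, shared]))
```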

Your post describes a scenario where the SoR decision is defined by the context of Information Stores. So, to paraphrase you, EMEA ERP is SoR for Product Item in the EMEA Information Store and ROW ERP is SoR for Product Item in the ROW Information Store - which I think is trying to say that the SoR for EMEA Product Item is EMEA ERP and for ROW Product Item is ROW ERP. However, I think the second statement is not quite the same as the first. The statement about which system is the SoR isn't really about which Information Store (e.g. a specific database) the data is stored in, but rather about which system is used to master it.

From our Data Management perspective considering our SoR statement, we can (when we've patched the meta model!) say that the System of Record for Product Item is EMEA ERP and also ROW ERP. We might use the fact that these systems operate separately, supporting separate business units and therefore manage the data in separate Information Stores to make this decision but the intention of this attribute is to capture the statement about the SoR (which comes from our DM strategy/policy) not to derive the SoR from the contents of the model, which I would argue we can't do.
What we can do is compute the divergence / gap between the statement about the SoR (the way we would like things to be) and the reality of where data is being mastered.
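
Once the SoR slot allows multiple values, that gap can be sketched as a set comparison per Data Object. This is an illustrative sketch, not the Data Provider View's actual logic. It assumes hypothetical inputs: the stated SoRs come from DM policy, and the observed masters come from what the model records about where data is actually created. "PIM" stands in for any non-SoR system that is mastering data.

```python
# Stated SoRs (from DM policy), using multiple cardinality per the proposed change.
stated_sor: dict[str, set[str]] = {"Product Item": {"EMEA ERP", "ROW ERP"}}

# Observed: which systems actually master each Data Object (hypothetical facts).
actually_mastered_by: dict[str, set[str]] = {"Product Item": {"EMEA ERP", "ROW ERP", "PIM"}}


def sor_gap(stated: dict[str, set[str]], actual: dict[str, set[str]]) -> dict:
    """Per Data Object: masters that aren't SoRs, and SoRs that aren't mastering."""
    gaps = {}
    for obj in stated.keys() | actual.keys():
        s, a = stated.get(obj, set()), actual.get(obj, set())
        outside, idle = a - s, s - a
        if outside or idle:
            gaps[obj] = {"mastered_outside_sor": outside, "sor_not_mastering": idle}
    return gaps


print(sor_gap(stated_sor, actually_mastered_by))
```

The SoR statement stays an input here; nothing in the sketch tries to derive it from the model.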

You're correct in your last paragraph - when we are looking at the data that the Application Providers are using, we are talking about Data Representations and we normally use a naming convention for Data Representations that is derived from the Data Object that is being represented and the Application Provider that it's represented in (in fact, off the top of my head, I think this is automated). So as I mentioned above, in many scenarios, a single SoR would be defined and what we do in the Data Provider View is analyse this against what is going on at the Data Representation level and this means that we can even understand where multiple representations of a Data Object managed by a single Application Provider are being mastered / distributed.
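
The exact naming convention isn't spelled out here, so this sketch assumes the "<Data Object> in <Application Provider>" pattern from the earlier "Customer Ship To in EMEA ERP" example. It also uses a hypothetical qualifier to tell apart two representations of the same Data Object inside one Application Provider, which is the case that grouping by (Data Object, Application Provider) then surfaces.

```python
from collections import defaultdict


def representation_name(data_object: str, app_provider: str) -> str:
    """Assumed convention, following the 'Customer Ship To in EMEA ERP' example."""
    return f"{data_object} in {app_provider}"


# Hypothetical (Data Object, Application Provider, qualifier) triples; the qualifier
# distinguishes two representations of the same object in one provider.
reps = [
    ("Product Item", "EMEA ERP", ""),
    ("Product Item", "EMEA ERP", "Sales Catalogue"),
    ("Product Item", "ROW ERP", ""),
]

by_pair = defaultdict(list)
for obj, app, qualifier in reps:
    name = representation_name(obj, app)
    by_pair[(obj, app)].append(f"{name} ({qualifier})" if qualifier else name)

# Application Providers holding more than one representation of the same Data Object.
multiple = {pair: names for pair, names in by_pair.items() if len(names) > 1}
print(multiple)
```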

I think the key point is that making the decision about the SoR is not something that we can 'calculate' or derive. We might use something like the Data Provider View to identify the best SoR for our specific needs, but there are many aspects that will affect that decision. In a scenario where we need to manage the data of two distinct legal entities separately, we would actually want to identify two separate SoRs to reflect that. Unless one of those systems were to master the data and distribute it to the other legal entity (perhaps violating some legal requirements), we would need to be able to master that data in two separate systems, which means two SoRs.

Apologies if this has turned into a bit of an essay but hopefully it provides some insights and background into what we are trying to support and how it works.
We pretty quickly came to the conclusion on reading your first post that there could be perfectly valid scenarios that require multiple SoRs. Fortunately, there seems to be plenty of space on the Form for a quick tweak of the meta model to make that a multiple instance slot!

Thanks again!

Jonathan
Essential Project Team