1. Introduction
A new evolutionary phase in corporate IT policy is spreading across North
America. After years of developing IT systems in response to urgent
operational needs, many companies have ended up with a set of applications
that interact with each other to a greater or lesser degree. A close analysis
reveals overlaps in application functionality or, even worse, important
business processes that are not covered by any of the existing applications.
As some of the old systems become functionally or technologically obsolete,
companies are forced to replace them. Most of them are trying to apply an
enterprise view to the new systems, to avoid ending up in the same state they
started in.
This correlates with the growing trend of developing Business Intelligence
applications to better support corporate strategic needs, over and above the
operational needs of the business. This class of OLAP systems, which includes
Data Warehouses, Data Marts, Decision Support Systems and so on, requires an
enterprise view of the business objectives, business processes, supporting
data and enabling applications.
There are many models and methodologies used by industry experts to perform
enterprise-wide analysis, architecture and deployment. In the Business
Intelligence area, Open Data Systems Inc. applies John Zachman's Enterprise
Architecture Framework to W.H. Inmon's Corporate Information Factory concept
to define enterprise-wide solutions for our clients.
2. The Corporate Information Factory
Over the last 4 years, Open Data Systems Inc. has developed and refined its
own version of a recommended Corporate Information Factory. It is based on
W.H. Inmon's concept, and our practice on client projects has allowed us to
customize the generic model to reflect the particular business environment.
Our concept is organized around the enterprise data flow, rather than around
system types as in Bill Inmon's concept. The complete model shows all the
potential combinations a company can implement, but based on specific business
criteria parts of it can easily be eliminated without affecting the
architecture of the other components.
The basic architecture presented in Fig. 1 also displays an enterprise-wide
metadata layer. We actively promote a global metadata repository, as opposed
to locally managed metadata layers, for the increased consistency of business
data that it provides. Our practice has shown that implementing an
enterprise-wide metadata management system usually costs no more than 10% over
the cost of local metadata systems, while the benefits for the strategic
planning of the enterprise are far greater. Moreover, the additional cost pays
for itself through reduced systems integration effort, making global metadata
much more cost-effective in the long term than its alternative.

Fig.1 – Generic Corporate Information Factory
The main components of the generic CIF model are:
1. Source Systems, mainly represented by the OLTP
applications supporting the operational business activities of the enterprise,
but also by existing data archives or external data feeds.
2. The Operational Data Store represents the bridge between the OLTP and OLAP
environments. The data structure is similar to that of the transactional
systems, but basic cleansing and integration processes are performed on the
loaded data. For example, master data objects are loaded from the system of
reference (the most credible source), and any additional data element is
appended to the structure defined by the system of reference.
To summarize, the Operational Data Store is:
- integrated: data from disparate operational systems is consolidated into a
consistent view of the enterprise;
- subject oriented: data is stored grouped by business subject area, rather
than organized for optimal transaction processing;
- volatile: data is continually added, updated and deleted, to provide a
snapshot of the current business environment;
- current valued: there is no long-term history in an ODS; it usually stores
one day, week or month worth of data;
- detail oriented: data in an ODS is at the same level of granularity as in
the operational systems, with no additional aggregates or summaries.
3. The Enterprise Data Warehouse is the most important concept in the CIF
architecture. It is specially designed to contain integrated, enterprise-wide
detailed and summarized data, including a history long enough to provide both
strategic and executive management perspectives of the enterprise.
The Enterprise Data Warehouse is characterized as:
- integrated: data is stored in an enterprise consolidated view (universal
naming conventions, measurements, classifications and so on), even if the
source systems are not consistent;
- subject-oriented: all relevant data regarding a business subject area is
grouped together;
- non-volatile: once the data has been loaded it can only be read; users are
not allowed to perform any update, delete or insert operations, so the
warehouse can provide a consistent history;
- time-variant: data is stored for long periods, measured in years; it is not
unusual for detailed data to be stored for 5-10 years, and summary data for up
to 25 years.
4. The Client Systems are represented by departmental Data Marts,
enterprise-wide information systems (e.g. web portals) or local reporting
systems, as well as other OLAP components like Decision Support
Systems or MOLAP cubes. The client systems can be fed from the Operational Data
Store, from the Data Warehouse or from both of them simultaneously. The data
source for the client systems is defined by the integration or detail level
required for that particular application. For example, a Corporate Web Portal
will most likely receive data from the ODS for current information, but from
the Data Warehouse for long-term statistical displays.
5. The Metadata Management System is the logical and semantic layer for
understanding and interpreting the information stored by the various systems.
The information describing the whole environment is usually structured as:
- business metadata,
such as subject area definitions, business process descriptions, definition of
entities, attributes and relationships, technical implementations of business
information, enterprise wide aliases and their departmental equivalents of
business data elements, and so forth.
- technical metadata, describing the physical implementation of the business
metadata. It is, in turn, organized into:
- static metadata, describing objects that change very rarely over time, such
as table descriptions and structure, attribute descriptions and physical
definitions, unique identifiers of data elements, indexes defined for faster
data access, entity relationships and the corresponding foreign keys, and so
on.
- dynamic metadata, also known as data metrics, concerning data load volume
and quality quantifiers, overall data statistics, data flows, data usage
patterns and other information about the usage of the static structures.
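
As an illustration only, the three kinds of metadata listed above could be
held in records like the ones below. This is a minimal sketch in Python; the
class names, field names and the reject-rate metric are assumptions made for
the example and do not describe any particular metadata product.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BusinessMetadata:
    """Business view of an entity (hypothetical layout)."""
    entity: str
    definition: str
    subject_area: str
    # Enterprise-wide name is `entity`; departments may use their own alias.
    aliases: Dict[str, str] = field(default_factory=dict)  # department -> alias


@dataclass
class StaticMetadata:
    """Physical structure that rarely changes."""
    table: str
    columns: Dict[str, str]  # column name -> physical type
    primary_key: List[str]
    foreign_keys: Dict[str, str] = field(default_factory=dict)  # column -> "table.column"


@dataclass
class DynamicMetadata:
    """Data metrics collected for one load of one table."""
    table: str
    rows_loaded: int
    rows_rejected: int

    @property
    def reject_rate(self) -> float:
        # Share of incoming rows that failed the load (assumed quality metric).
        total = self.rows_loaded + self.rows_rejected
        return self.rows_rejected / total if total else 0.0


customer = BusinessMetadata(
    entity="Customer",
    definition="A party that has bought at least one product",
    subject_area="Sales",
    aliases={"Finance": "Account Holder"},
)
last_load = DynamicMetadata(table="CUSTOMER", rows_loaded=990, rows_rejected=10)
```

Keeping the business record separate from the technical records mirrors the
split described above: the business definition can stay stable while the
physical tables and the load metrics change underneath it.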
3. The Zachman Framework
Enforcing order in an enterprise-wide effort is a huge task in itself. Too
many of today's data integration and consolidation problems are related to a
lack of enterprise perspective when the systems were built. For companies
building a new IT structure as well as for those rebuilding an existing one,
the need for a systematic approach has become obvious.
Open Data Systems Inc. has been using the Zachman Framework for a number of
years to fit the pieces of the puzzle into an organized model. Introduced by
John A. Zachman in 1987, his Enterprise Architecture Framework provides a
template to organize the information and study the interactions between the
various components.

Fig.2 - Enterprise Architectural Concepts
The generic classification scheme shown in Fig.2 is based on analyzing the
contributing factors at each level of abstraction, for each of the major
activity layers. As this paper is not intended as an exhaustive presentation of
the framework, we will only briefly describe the rows and columns defined in
the matrix. Later in this paper we will show specific uses of this tool to
implement the Corporate Information Factory.
The framework rows represent the abstraction levels used to perform the
system analysis:
- Scope is the highest abstraction layer, usually represented by fuzzy ideas
or idealistic concepts.
- Enterprise Model represents the conceptual level, where an initial modeling
attempt is made to define the business concepts that implement the Scope.
- System Model is the level where conceptual objects receive a logical
structure.
- Technology Model defines the physical objects that will represent the
logical structures.
- Detailed Representation layer is composed of the fully specified physical
implementations of each category.
The main enterprise activity layers are represented by the framework columns:
- Data layer reflects information representation.
- Function column is concerned with the actions performed on the data.
- Hardware layer is an encompassing column for all the computers, networking
and other supporting equipment.
- People column shows the actors involved in the process.
- Time represents the scale associated with the time elements at each
abstraction level.
- Motivation is the engine for getting things done.
Each of the cells defined by the intersection of an abstraction level with an
enterprise activity layer has a different meaning and content, depending on
the subject the framework is applied to. In the following sections we will
explore the application of the framework to two of the major Corporate
Information Factory components: the Operational Data Store
and The Enterprise Data Warehouse.
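
The matrix described above can be pictured as a simple grid keyed by row and
column. The sketch below is only an illustration: the row and column names are
the ones used in this paper, while the functions `empty_framework` and
`open_cells` and the sample note are assumptions made for the example.

```python
from itertools import product
from typing import Dict, List, Tuple

ROWS = [
    "Scope",
    "Enterprise Model",
    "System Model",
    "Technology Model",
    "Detailed Representation",
]
COLUMNS = ["Data", "Function", "Hardware", "People", "Time", "Motivation"]

Cell = Tuple[str, str]


def empty_framework() -> Dict[Cell, List[str]]:
    """Create one empty list of notes per (row, column) cell."""
    return {(row, col): [] for row, col in product(ROWS, COLUMNS)}


def open_cells(framework: Dict[Cell, List[str]]) -> List[Cell]:
    """Return the cells that have no analysis notes yet."""
    return [cell for cell, notes in framework.items() if not notes]


framework = empty_framework()
# Hypothetical note for one cell; real content depends on the subject analyzed.
framework[("Scope", "Data")].append("List the business subject areas in scope")
```

Listing the cells that are still empty gives a quick view of which parts of
the analysis have not been addressed for a given subject.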
4. Defining The ODS
The first recommended step in building a Corporate Information Factory is the
Operational Data Store. It is usually less costly than the Data Warehouse,
while providing a 'good enough' data consolidation across the enterprise. Even
if its structure still mostly reflects the design of the transactional
systems, it provides some enterprise-wide
reporting capabilities missing from the individual applications.
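
The consolidation rule given for the ODS in Section 2 (master data comes from
the system of reference, and other systems only append additional data
elements) can be sketched as follows. This is a minimal illustration under
stated assumptions, not our production code: the record layout, the sample
sources `crm` and `billing`, and the function `consolidate_master` are all
hypothetical.

```python
from typing import Dict, List

Record = Dict[str, str]


def consolidate_master(
    reference: List[Record],
    others: List[List[Record]],
    key: str,
) -> Dict[str, Record]:
    """Build ODS master records keyed by `key`.

    The system of reference defines each record; other systems may only
    append attributes that the reference record does not already carry.
    """
    ods: Dict[str, Record] = {rec[key]: dict(rec) for rec in reference}
    for source in others:
        for rec in source:
            target = ods.get(rec[key])
            if target is None:
                # Unknown to the system of reference: skipped in this sketch.
                continue
            for name, value in rec.items():
                # Never overwrite what the system of reference supplied.
                target.setdefault(name, value)
    return ods


# Hypothetical sample data: the CRM is the system of reference for customers.
crm = [{"customer_id": "C1", "name": "Acme Corp", "city": "Toronto"}]
billing = [{"customer_id": "C1", "name": "ACME", "credit_limit": "5000"}]

customers = consolidate_master(crm, [billing], key="customer_id")
```

In this example the customer name stays as the system of reference supplied
it, and the credit limit from the billing system is appended as an additional
data element.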
Unfortunately, the most common approach to defining an ODS is to use the
largest system as the architectural reference and add the missing pieces of
information from the other systems available across the enterprise. Even if
(or especially because) this method allows the development team to start
working very soon after the project's approval, it is the most costly and
lengthy method. The initial design will be changed several times during the
project, almost every time important new pieces of data come into play. The
changes made to the data structures usually have a considerable effect on the
processes already designed and/or built, extending the project duration even
more.
The better approach is to perform an enterprise-level analysis from the
beginning of the project. Even if some executives might get upset by the time
spent before the first line of code is written, this approach defines the
layout where each piece of the puzzle will fit. Because each system is
analyzed from an enterprise view, the order in which the existing applications
are incorporated makes almost no difference, as long as
master data is incorporated before transactional data.
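
The one ordering constraint mentioned above, master data before transactional
data, can be expressed as a stable sort over the feeds to be incorporated. The
sketch below is illustrative only; the feed records, the `kind` field and the
function `load_order` are assumptions made for the example.

```python
from typing import Dict, List

Feed = Dict[str, str]


def load_order(feeds: List[Feed]) -> List[Feed]:
    """Put master-data feeds first; otherwise keep the given order."""
    # sorted() is stable, and False sorts before True, so master feeds move
    # to the front without reshuffling anything else.
    return sorted(feeds, key=lambda feed: feed["kind"] != "master")


# Hypothetical feeds, listed in the order the systems happened to be analyzed.
feeds = [
    {"system": "billing", "object": "invoices", "kind": "transactional"},
    {"system": "crm", "object": "customers", "kind": "master"},
    {"system": "orders", "object": "order_lines", "kind": "transactional"},
    {"system": "erp", "object": "products", "kind": "master"},
]

ordered = load_order(feeds)
```

Apart from moving master data to the front, the sort leaves the sequence of
systems untouched, which reflects the point that the order of incorporation is
otherwise free.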
Fig. 3 represents a sample Zachman framework used by Open Data Systems
consultants to define an Operational Data Store from a corporate perspective.
While the rows and columns have the same meaning as in the basic Architectural
Concepts template, the cells reflect the actions to be performed and some
samples of what to look for. The template can easily be applied at the
corporate level and for each system to be included in the ODS analysis.

Fig.3 – ODS Definition Framework
5. Defining The Data Warehouse
The most important piece in a
Corporate Information Factory is the Enterprise Data Warehouse. It is the main
data repository and the most important source of trustworthy pre-packaged
information that can be used directly (with specialized tools) or off-loaded
into specialized information processing environments.
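
Part of what makes this repository trustworthy is the pair of properties
listed in Section 2: the warehouse is non-volatile and time-variant. The
sketch below illustrates both with an append-only table that can answer "what
was the value on a given date". It is a simplified illustration, not our DW
build methodology; the class `HistoryTable`, its methods and the sample data
are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Version:
    key: str
    value: str
    valid_from: date


class HistoryTable:
    """Append-only store: each load adds versions, none is changed or removed."""

    def __init__(self) -> None:
        self._rows: List[Version] = []

    def load(self, key: str, value: str, valid_from: date) -> None:
        """Called by the load process only; there is no update or delete."""
        self._rows.append(Version(key, value, valid_from))

    def as_of(self, key: str, when: date) -> Optional[str]:
        """Return the value that was in effect for `key` on `when`."""
        candidates = [
            row for row in self._rows
            if row.key == key and row.valid_from <= when
        ]
        if not candidates:
            return None
        return max(candidates, key=lambda row: row.valid_from).value

    def rows(self) -> Tuple[Version, ...]:
        """Read-only view of the full history."""
        return tuple(self._rows)


# Hypothetical history of one customer's city.
dw = HistoryTable()
dw.load("C1", "Toronto", date(2001, 1, 1))
dw.load("C1", "Montreal", date(2003, 6, 1))

city_in_2002 = dw.as_of("C1", date(2002, 5, 5))
city_in_2004 = dw.as_of("C1", date(2004, 1, 1))
```

Because earlier versions are never overwritten, a report run for a past date
returns the same answer no matter when it is executed.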
There are multiple methodologies for designing and implementing a Data
Warehouse. Open Data Systems has adapted the best theoretical principles to
practical implementation issues in its proprietary DW build methodology. In a
nutshell, we promote a global high-level analysis to formalize the 'big
picture', followed by an iterative design-build-implement cycle. This allows
our clients to achieve the most critical results in a short timeframe (6-12
months) while still having the complete DW framework defined upfront. Each
cycle can then build on top of the previous one, without having to redo much
of what was already built (as is usually the case with iterative DW builds).
The most important step in enabling our reusability rate of over 90% is the
full-scale high-level analysis. During this phase the complete Data Warehouse
scope is analyzed, as well as all the corporate structures and all the
existing information systems. A conceptual data model is usually created and
the major systems of reference are identified, to ensure the consistency of
the future build cycles.

Fig.4 - Data Warehouse Definition Framework
Fig. 4 presents a sample Zachman Framework used for the Data Warehouse
definition. While the rows and columns have the same meaning as in the basic
Architectural Concepts template, the cells reflect the actions to be performed
and some samples of what to look for. The DW template can easily be applied
both to the initial overall analysis and to the detailed definition of each
build cycle.
6. Summary
This paper was not intended as an exhaustive presentation of either the
Corporate Information Factory concept or the Zachman Enterprise Architecture
Framework. We only presented a method of applying the order defined by
Zachman's Framework to build the most complex components of the Corporate
Information Factory. The same method can be applied to the structured
definition of other systems that can be part of the CIF, such as Data Marts,
the Metadata Repository, MOLAP cubes and so on.
There are, of course, other methods and methodologies for defining a Corporate
Information Factory. But the Zachman Framework implementation has been
successfully tested by Open Data Systems' consultants and has proved to reduce
project time and costs by more than 10% compared with other methodologies.
For more information on the
Corporate Information Factory concept please refer to Bill Inmon's and Claudia
Imhoff's papers on the topic. Most of them can be found at www.billinmon.com. For details on
Zachman's Framework please visit www.zifa.com.