Data Modeling
Identifying and Clarifying The Details of Your Data

Data modeling is one of the most difficult and simultaneously most important activities in the systems development process because in the data model we define the language in which everything else in the problem domain is described; data modeling captures the relationships between data elements and data structures. If you get the data model "wrong", your application might not do what users need, it might be unreliable, and it might fill up the database with "garbage".
It is through the data model that all of the “things” discussed within the problem domain come to have meaning – acquire and then enforce their semantics. Therefore, data models have a much wider relevance that just in the development of databases; data models are required to define the “things referenced” across all of the other models: organizational, process, state/event, location, etc.
There are many forms of data models: hierarchical, network, relational, functional, object-relational, etc., and they may be defined at different levels of abstraction: conceptual, logical, physical, etc. Yet all of these models have three aspects: a structure, constraints, and operations. The formal mix of these three aspects determines which of the “forms of data model” the particular model in question is.
While on the subject of abstraction, data models have two dimensions of abstraction: generalization and aggregation. Of the two, generalization being the “abstraction of ‘thing’” gets the most attention. Nevertheless, aggregation – the abstraction of constrained relationship – should be more thoroughly addressed and is so within the Aviamedia approach to data modeling.
At Aviamedia, we use most of the forms of data models mentioned above, where appropriate to the problem domain and where relevant to the type of our engagement (domain analysis, application requirements, system design, etc.)
We develop data models of the business domain using the concepts of Entity-Relationship Model but capturing the representation in UML Class Diagrams and if the constraint semantics are too complex, the Object-Role Model – a semantic net model. In design, we use UML Class Diagrams as well as Logical and Physical Database Models and hierarchical data models when designing hierarchical interfaces or repositories such as XML.
Many practitioners only discuss data models in the context of data persistence – how data will be stored and accessed. At Aviamedia, we understand data models in a larger context, as mentioned above; data models define the semantics of our data. How our data becomes information. How data is acted upon by processes and transformed into other data. How data is constrained and the role other data plays in understanding those constraints. All of this information – the definitions of the “WHATs” within our domain are captured in Data Models. Data Models, in all their variety captures this meaning – as it is persisted, as it exists transiently within an actively executing process, and as it exists and defines the interface structures between systems and processes, and on and on.
When data modeling, you are identifying the following characteristics:
- What are the names of the data elements?
- How large can each element be?
- What type or kind of information can each element contain?
- With what entity/“thing” in the domain is an element attributed?
- What elements in an entity may be left blank?
- Which elements are constrained to a fixed range of values?
- Whether and how the various entities are associated?
- Whether and how the various entities are classified?
- How entities are to be conceptualized – as tables, as nodes, etc.
- How relationships are conceptualized – as foreign keys, as facts, as functions, etc.
Benefits of Data Modeling
It should not come as a surprise that different communities recognize different value from data models; Operational Database Designers value different aspects of data models than do Data Warehouse Designers.
Operational Databases
The operational databases of an enterprise capture the business transactions – moment by moment, business event by business event, as the processes of the business are conducted. This transactional perspective on data is heavily emphasized in operational databases. As in all database operations, the critical tradeoff is between update performance and retrieval performance. Therefore, operational databases tend to be optimized around their update performance and ability to respond to and capture business transactions as they occur.
Early in the development lifecycle, the data models from which operational databases are developed tend to be highly specific in there needs to elucidate the business rules, business constraints, and other true requirements of the enterprise. As development moves into the design phase, however, good data base design focuses on increasing the level of abstraction of the database in order to simplify and generalize the effort of interacting with that design. A related factor is that a well abstracted database design implementation will reduce future maintenance costs.
Of course, we must not so abstract the design to the point that the meaning and semantics of the persisted data is lost or obscured; this semantic clarity is often referred to as “data transparency”. Database design can be seen as a balancing act in which the experience of our analysts working with the business expertise and knowledge of your team works together to develop an optimum solution. What is optimum depends upon the true needs and requirements of your business; there is no “right” answer, but only an array of solutions among which the team must decide and pick the most “effective.”
Ironically, the correctness of the team’s choice of abstraction, transparency, and effectiveness can only be known after the fact when circumstances have forced some change in your business requirements. It is only within the activity of database maintenance – its ease or difficulty – that the wisdom of your design decisions and trade-offs can be truly known.
Data Warehouses
Designing data warehouses correctly by using a data model helps meet many of today’s data integrity challenges. Key benefits include:
Optimizing Database Performance for Business-Centric Reporting
Query performance is crucial in data warehousing. Data volumes and transactional performance is sacrificed to ensure the highest level of performance for our queries, but the queries only perform optimally if the database design is correct.
Large data volumes are involved, so the use a trial-and-error approach to data warehouse design is inadvisable. Therefore, it is beneficial to use data modeling products to automate this process and make it easy to manage all of the metadata associated with data warehouses and the data that is consumed by business intelligence (BI) systems. After the dimension entities and fact table design, we can define the relationships between tables. The wider BI team then reviews, assesses, and incorporates any changes in the design. After the logical design is approved, the work on the physical design can begin.
Ensuring that business requirements and reports are accurate and meaningful
The logical model captures business requirements. It uses naming conventions that closely match the business terms that an organization uses and is the design that is presented to the outside world. Developers of other systems use this design to create interfaces into the data warehouse. We can create several logical models to match the needs of data consumers, while underneath the physical model remains the same.
Documenting source and target systems correctly
When modeling an Extract, Transform, and Load system, it is essential to verify the logical and physical models of the source system as well as the models of the destination. It is often necessary to create an intermediate model for a staging area because many operations cannot be performed in one step. Also, often, extract, transform, and load operations cannot happen at the same time due to the requirements of the source and destination systems.