Data Warehouse – Too Big a Headache for Small and Medium-Sized Enterprises (SMEs)

The concept of data warehousing (DWH) was developed by Inmon [1] and Kimball [2] in the 1990s with the objective of storing a company’s own data centrally and in high quality. The DWH should be the only basis (Single Source of Truth) for a company’s tactical and strategic decisions. These concepts remain relevant today, and the emergence of new technologies has led to various adaptations along the way. Until now, however, it has barely been studied how Big Data technologies can contribute to the customization of data models. What does a DWH for SMEs look like today? What can Big Data contribute to it? This is what we want to discuss.

The establishment and operation of a DWH, even with predefined data models and reports, can rapidly become too expensive for an SME. This is of particular concern because a DWH has multiple data layers with multiple data models. Does this mean that SMEs wanting to build up a database for decision making must also swallow the bitter pill of a complex data warehouse? The answer is no, and it is also very simple if you know how it is done.

But first, let us look at the fundamental question: why would SMEs ever need a DWH? The definition given by Zeh [3] provides a good and brief introduction: “A Data-Warehouse is a physical database that allows an integrated view of the underlying data sources.” The purpose of a DWH can be simply formulated as:

  • Maintaining data histories for dimensional analysis, such as time series.
  • Merging data from multiple data sources to obtain coherent findings.
  • Aggregating data so that the information is easily visible and high-performance analyses can be carried out.
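
As a minimal sketch of these three purposes, the following Python example uses the standard-library sqlite3 module as a stand-in for a warehouse database. The table name, the two source extracts (an ERP export and a web-shop export) and all figures are hypothetical and serve only as illustration; they are not taken from a real system.

```python
import sqlite3

# Hypothetical extracts from two source systems: (sale_date, product, amount).
erp_sales = [("2023-01-15", "P1", 100.0), ("2023-02-10", "P1", 150.0)]
shop_sales = [("2023-01-20", "P1", 40.0), ("2023-02-05", "P2", 60.0)]


def load_warehouse(conn):
    """Merge both sources into one table and keep every dated row (history)."""
    conn.execute(
        "CREATE TABLE sales (sale_date TEXT, product TEXT, amount REAL, source TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, 'erp')", erp_sales)
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, 'shop')", shop_sales)


def monthly_revenue(conn):
    """Aggregate the merged history into a small time series per month."""
    return conn.execute(
        "SELECT substr(sale_date, 1, 7) AS month, SUM(amount) "
        "FROM sales GROUP BY month ORDER BY month"
    ).fetchall()


conn = sqlite3.connect(":memory:")
load_warehouse(conn)
result = monthly_revenue(conn)
print(result)
```

With the sample rows above, the result holds one row per month: 140.0 for 2023-01 and 210.0 for 2023-02, each combining both sources.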

Parallel to operational reporting, where data may be evaluated directly in the data sources, the data is stored physically a second time in the DWH. Practice shows that data integration and data modeling account for about 80% of a typical Business Intelligence (BI) project. The actual front end, such as standard and ad hoc reports and dashboards, takes up only 20% of the expense. With the complex data modeling of a conventional DWH architecture, a DWH very quickly becomes cumbersome, is opaque to the business and is not consistent with agile and constantly changing business requirements, and even less so with our approach at LeanBI. We therefore asked ourselves how we can significantly simplify a DWH for SMEs so that it is optimally utilized at low cost and at the same time remains a long-term, high-quality solution for the customer.

If we now consider the concepts of Inmon and Kimball, the Kimball approach is more suitable for our agile world. According to Kimball, subject-specific data marts are created with facts and dimensions, where the dimensions are centrally maintained as “Conformed Dimensions”. The dimensions include all master data such as time, product, region, customer and many more. The DWH layer underlying the data marts brings the data into a consistent form, from which the data marts are transformed and aggregated. Therefore, according to Kimball, a data warehouse is primarily the sum of its data marts.

Inmon focuses on consistently introducing a DWH layer in third normal form, which forms the basis of all data marts. Its advantage is that the data is available in very fine-grained form in the DWH, so that new, future requirements can be satisfied directly from the data pool layer of the DWH. In theory, both concepts make sense.
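
The Kimball-style structure described above can be sketched in a few lines. The example below again uses Python's standard-library sqlite3 module; the table names (dim_product, fact_sales, fact_budget), the columns and all figures are hypothetical and chosen only for illustration. It shows two small data marts that share one conformed product dimension, which is what makes it possible to compare their facts along the same attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One conformed dimension, maintained centrally and shared by both marts.
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY, name TEXT, category TEXT
    );
    -- Two subject-specific fact tables (hypothetical sales and planning marts).
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product, month TEXT, revenue REAL
    );
    CREATE TABLE fact_budget (
        product_id INTEGER REFERENCES dim_product, month TEXT, planned REAL
    );
    INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Manual', 'Print');
    INSERT INTO fact_sales VALUES
        (1, '2023-01', 120.0), (2, '2023-01', 30.0), (1, '2023-02', 80.0);
    INSERT INTO fact_budget VALUES
        (1, '2023-01', 100.0), (2, '2023-01', 50.0), (1, '2023-02', 100.0);
    """
)


def actual_vs_plan(conn):
    """Aggregate each mart by category, then join the two results."""
    return conn.execute(
        """
        SELECT s.category, s.revenue, b.planned
        FROM (SELECT d.category AS category, SUM(f.revenue) AS revenue
              FROM fact_sales f JOIN dim_product d USING (product_id)
              GROUP BY d.category) AS s
        JOIN (SELECT d.category AS category, SUM(f.planned) AS planned
              FROM fact_budget f JOIN dim_product d USING (product_id)
              GROUP BY d.category) AS b
          ON b.category = s.category
        ORDER BY s.category
        """
    ).fetchall()


comparison = actual_vs_plan(conn)
print(comparison)
```

Each fact table is aggregated separately before the join so that rows are not multiplied. Because both marts reference the same dimension rows, the category values line up without any additional mapping logic.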

In practice, extremely complex DWHs built on both approaches have emerged in many companies over the last twenty years. These are hard to replace and carry a long “legacy”, and there have of course been demands for simplification. One obvious concept has found wide application: when the right hardware technology is available (MPP, Massively Parallel Processing), the data can be brought into a single, large relational data model. This also eliminates the data marts, which always bring high project and operating costs with them and restrict the breadth of information. The layer model is thereby reduced (although 2 to 3 layers still remain, depending on the application). Through a combination of hardware and software technology, query performance is not compromised; as a result the data is available everywhere and is not confined to silos, and at the same time data modeling is relatively simple.

This advantage has a price, however. The data model requires strong technical and business oversight to prevent modeling errors. The infrastructure requirements are very high and expensive, changes to most large relational models are complex, and implementations are also expensive. A strong dependence on IT persists.

In recent years, in-memory technology has become much better known, partly because large software companies have committed afresh to this technology, and partly because of the sharp drop in the price of physical memory (RAM). Not only the large software companies but also other companies are struggling with the integration of these in-memory solutions. Thus, “side by side” solutions continue to emerge (for example, a HANA solution not under but beside an SAP BW). Instead of a reduction in TCO (Total Cost of Ownership), the system boundaries continue to expand.

We cannot and do not want to throw all the experience of the BI world overboard. We do, however, have a fresh approach, with new and maturing technologies that are slowly making it possible to rethink.

What we will introduce here is a combination of architecture specifications and the application of new DB and BI technologies. First, we massively simplify data integration and data management on the raw layer. We then want to offer a DWH with simple, agile and business-oriented solutions. In this way we will scale down the share of data integration and modeling from the 80% discussed above to 40%, and some of the data modeling should be done directly by the business so that costs can be reduced significantly once more. Many TCO promises remain unfulfilled. After reading our blogs, you will understand how significantly we can really reduce the TCO of a DWH solution to bring it within reach of an SME.

We also want to offer a high level of integration of the various BI applications on the BI layer. Many BI vendors and consultants do not fulfill this simple expectation. For example, the planning part is usually handled by separate tools. We want planning, dashboards, reporting and analysis all from one integrated tool, on any device, from a single source, without reinventing the wheel.

Interested? Stay tuned …

Next Blogs

  • A BI tool from a single source: We introduce our new technology partner
  • Data modeling for mid-sized businesses
  • Planning and Budgeting, fast and easy for small and mid-sized businesses.
  • An ocean of data for Small and Mid-Sized Businesses. How do we use Big Data technology with small amounts of data?
  • Source connections made easy!
  • How can BI requirements be rapidly and easily compiled?

[1] Inmon, W.H. Building the Data Warehouse (Third Edition), New York: John Wiley & Sons, 2002.

[2] Kimball, R. and M. Ross. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling (Second Edition), New York: John Wiley & Sons, 2002.

[3] Zeh, T. Data Warehousing als Organisationskonzept des Datenmanagements: Eine kritische Betrachtung der Data-Warehouse-Definition von Inmon. Informatik – Forschung und Entwicklung, 18(1), 2003.