A Guide to Successful Big Data Implementation

July 24, 2016

Posted by Mich Talebzadeh in Big Data.

As a consultant I spend most of my time on a number of Big Data Projects (BDP). Big Data refers to the vast amount of data being generated and stored, and the advanced analytics processes being developed to help make sense of it. Some of these projects are greenfield and some are at an advanced stage. Not surprisingly, a number of these projects are kicked off without a definitive business use case and simply follow the trend in the market, in layman’s terms what I call the Vogue of the Month.

As a consequence, a number of these projects fail, and regrettably the proportion of failures is greater than for other IT projects. In my opinion, insufficient scoping of the project has by far been responsible for the majority of BDP failures. In some cases the goal posts changed too often, project objectives were not well defined in advance, or too much was expected from the project.

There is another dimension to add to this. The technology landscape of Big Data is very fluid, and in some cases relying on immature technology, much of which is open source with little proven production use, has resulted in costly failures.

Like any other IT project, if a Big Data Project cannot deliver business value and was created as a purely technology concern, then one should not be surprised by its lack of success.

Whilst there is general agreement on the value of Big Data, a Big Data project should work under the same constraints that other IT projects work with: well-defined objectives and a roadmap.

A BDP by definition is riskier than other, established IT projects because of:

  1. An immature technology stack in some cases
  2. A lack of trained Big Data resources to deliver that technology. Companies are still struggling to bring adequate resources in.

So how do you make a Big Data Project a success? While how you manage your Big Data project can vary depending on your specific use case and your business space, there are a number of key steps to successfully implementing a BDP; for the sake of brevity I have combined some of them. These are as follows:

  1. Your Business Use Case
  2. The Project Plan and Scope
  3. The Constraints and the Critical Success Factors
  4. Assumptions
  5. Reporting
  6. Deliverables
  7. Establish the Technical Requirements and Technology Stack
  8. Create a Business Value Case

Your Business Use Case

Outline the tangible benefits that a Big Data solution will bring. If you are a technical professional, you will need to:

  • Outline what Big Data can do for the business concerned, in terms that someone with a business background can buy into. In other words, how the business users are going to interact with the Big Data solution to achieve a specific business goal.
  • Identify your business stakeholders and their roles. Business stakeholders normally have their own agendas based on differing needs, and these needs may create competing demands between stakeholders.
  • If needed, create a Force Field Analysis of stakeholders who are in favor of the changes proposed by Big Data and those who show little or no interest in embracing this change. From my experience, people do not normally like change and are content with the status quo. You need to take them on board. This is part of the classic Critical Success Factors.
  • Prioritize Big Data use cases, taking into account the different interests discussed earlier.
  • Identify the obstacles to getting there in terms of business and technology stack. Case in point: can a technology stack like Spark Streaming provide an adequate business solution for Algorithmic Trading in place of proprietary software? (See the sketch after this list.)
  • Manage expectations. In some cases, Big Data may not be the answer to the problem and the solution may lie somewhere else.
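
To make the Spark Streaming question above concrete, here is a minimal sketch, not production code, of a windowed moving average over a tick feed. The socket source, port and message format are hypothetical; the point is to gauge whether micro-batch latency fits the business need:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object TickAverage {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("TickAverage").setMaster("local[2]")
        // Micro-batch interval; Spark Streaming struggles below roughly 0.5 seconds
        val ssc = new StreamingContext(conf, Seconds(1))
        // Hypothetical feed: each line is "SYMBOL,price"
        val ticks = ssc.socketTextStream("localhost", 9999)
        val avgBySymbol = ticks
          .map { line =>
            val Array(sym, px) = line.split(",")
            (sym, (px.toDouble, 1))
          }
          // 10-second sliding window, recomputed on every batch
          .reduceByKeyAndWindow(
            (a: (Double, Int), b: (Double, Int)) => (a._1 + b._1, a._2 + b._2),
            Seconds(10), Seconds(1))
          .mapValues { case (sum, n) => sum / n }
        avgBySymbol.print()
        ssc.start()
        ssc.awaitTermination()
      }
    }

If the end-to-end latency of such a pipeline cannot meet the trading desk's sub-second budget, that is an early and cheap answer to the question above.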

I cannot emphasize enough the importance of a well-defined Business Use Case. You will need the Big Data goals outlined as clearly and as early as possible to avoid project failure.

The Project Plan and Scope

You need to define a clear project plan and objectives here. Examples of what to provide: Credit Risk needs, Insurance Risk Models, Fraud Detection, Real Time Data Analytics for a vendor’s tool, Sentiment Analysis and others.

  • Specify expected goals in measurable business terms. The goal here is a statement of specifics that quantify an objective.
  • Identify all business criteria as precisely as you can determine.
  • More often than not, you may need to phase the project in line with the available resources and absorption of Big Data Technology.
  • Outline the need for Proof of Concept or Pilot Projects in advance.
  • Factor in any dependency on external consultants.
  • Create and maintain a project plan.

See also the Deliverables section below.

The Constraints and the Critical Success Factors

  • Allocation of adequate time and resources by the Stakeholders Team in order for the BDP to succeed
  • Provision of an adequate level of access to various resources, including in-house staff, external consultants if any, and hardware, such that an informed judgment can be made at any point in the project
  • Agreement on the timescales
  • Resolution of all the technical and access matters.

Assumptions

A BDP is normally a high-profile assignment which may have long-term consequences for the way the enterprise conducts its business. The following assumptions are made:

  • Resources will be available from all departments at key points during the lifetime of this assignment. These include all resource types.
  • The relevant managers are required to make these resources available and remain committed to this assignment.
  • This assignment will address whether it is possible to create a satisfactory Big Data solution for the enterprise’s needs. To this effect we expect the BDP to take a high profile.
  • Senior Management is committed to seeing this assignment through.

Reporting

As a consultant, I provide a comprehensive report upon completion of the assignment. This report forms the bulk of the deliverables (see below). Additionally, a bi-weekly progress report needs to be provided to the stakeholders throughout the work undertaken. I find this adequate.

Deliverables

The deliverables have to be tightly coupled with the Project Scope so that the BDP does not suffer from the Moving Target Syndrome, so to speak. As a minimum:

  • What is in scope in detail
  • Equally important, what is not going to be delivered, i.e. what is out of scope
  • If the project is phased, the clear deliverables for each phase and the interfaces between these phases
  • A comprehensive report highlighting the areas covered under the heading “Project Plan and Scope”
  • The milestones that impact the deliverables
  • The items that are on the critical path. For example, relying on a Big Data Vendor to deliver a given solution
  • Show stoppers if any

Establishing the Technical Requirements and Technology Stack

This section is more important in the case of Big Data than for other, more established technology stacks. It covers the stack from the Ingestion Layer up to and including the Presentation Layer (I assume that the reader is familiar with these Big Data layers; otherwise please see my other articles). In short, how you are going to collect data and how you are going to present that data as a service.

  • The current tools available for Big Data ingestion, including tools already in place
  • The existing architecture for interconnecting individual silos
  • The volume of ETL and ELT involved
  • The existence of a Master Data Model (MDM) if any
  • External feeds etc.
  • The usual three Vs of Big Data namely; Volume, Velocity and Variety
  • These are shown in Figure 1 below


Figure 1: Big Data Ingestion and Storage Layer
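
As an illustration of a simple ingestion flow into the storage layer, here is a minimal sketch using the Spark 2.x DataFrame API. The HDFS paths and the trade_date column are hypothetical placeholders for your own feed:

    import org.apache.spark.sql.SparkSession

    object IngestFeed {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("IngestFeed").getOrCreate()
        // Hypothetical external feed dropped by an upstream process into a landing zone
        val feed = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("hdfs://namenode:8020/landing/trades/*.csv")
        // ELT style: land the raw data, partitioned for downstream query performance
        feed.write
          .mode("overwrite")
          .partitionBy("trade_date")
          .parquet("hdfs://namenode:8020/warehouse/trades")
        spark.stop()
      }
    }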

Data in Motion versus Data at Rest. If Data in Motion is important, for example Real Time Fraud Detection (anything below 0.5 seconds), it may be worth using some form of Data Fabric for timely action and alerts, as shown in Figure 2 below:


Figure 2: Data Grid Offering for Real Time Action
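
As a sketch of the data grid idea, the snippet below uses Hazelcast purely as one example of an in-memory data fabric (Oracle Coherence, Apache Ignite and others would do equally well). A hypothetical fraud score produced by the streaming layer is published into a shared map so that consumers can react within the sub-second budget, without waiting on the storage layer:

    import com.hazelcast.core.Hazelcast

    object FraudAlertGrid {
      def main(args: Array[String]): Unit = {
        val hz = Hazelcast.newHazelcastInstance()
        // A distributed map shared across the grid; name and entries are hypothetical
        val alerts = hz.getMap[String, Double]("fraud-alerts")
        // Score produced upstream by the streaming layer
        alerts.put("card-1234", 0.97)
        // A consumer elsewhere on the grid reads (or listens) and acts immediately
        println(s"alert score: ${alerts.get("card-1234")}")
        hz.shutdown()
      }
    }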

Your technical requirements will also have to cater for the Big Data Analysis layer, including the Analytics Engines and Model Management, in order to provide business solutions as defined by the stakeholders, as shown in Figure 3 below. For those interested in the details, please see my other articles here on Big Data.


Figure 3: Big Data Analysis Layer
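
By way of example, here is a minimal sketch of training and persisting a fraud-scoring model with the Spark MLlib pipeline API. The input path, feature columns and 0/1 label column are hypothetical, and persisting the fitted pipeline stands in for a fuller Model Management process:

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.feature.VectorAssembler
    import org.apache.spark.sql.SparkSession

    object FraudModel {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("FraudModel").getOrCreate()
        // Hypothetical labelled transactions already landed in the storage layer
        val df = spark.read.parquet("hdfs://namenode:8020/warehouse/transactions")
        // Assemble hypothetical numeric columns into a single feature vector
        val assembler = new VectorAssembler()
          .setInputCols(Array("amount", "merchant_risk", "velocity_1h"))
          .setOutputCol("features")
        val lr = new LogisticRegression().setLabelCol("label")
        val model = new Pipeline().setStages(Array(assembler, lr)).fit(df)
        // Versioned persistence of the fitted pipeline: a rudimentary model registry
        model.write.overwrite().save("hdfs://namenode:8020/models/fraud/v1")
        spark.stop()
      }
    }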

The Consumption Layer is the client-facing layer that has to satisfy the stakeholders’ needs. This is a very important layer as it offers data in various formats as a service to the client. This is shown in Figure 4 below:


Figure 4: Big Data Consumption Layer

This is a layer in which some investment has already been made, and most stakeholders are familiar with it. In reality most medium and large enterprises already have impressive proprietary tools such as Oracle BI or Tableau for this purpose that hook into various Data Warehouses or Data Marts. From my experience a two-tier solution may work. This will allow the existing BI users to keep their existing visualisation tools while, crucially, under the bonnet you may decide to move away from the proprietary Data Warehouse to Big Data. These considerations should be taken into account:

  • Performance. Whether you keep using the existing Data Warehouse or move to a Big Data storage layer such as HDFS, you should provide comparable performance.
  • Comparable Open Source tools. For example, you may deploy a simple Open Source tool like SQuirreL SQL that allows SQL access to a variety of JDBC data sources including Big Data (see the JDBC sketch after this list).
  • Consider taking advantage of in-memory offerings such as Alluxio or the new kid on the block, Hive LLAP.
  • Your technical users may want to delve into data above and beyond SQL-type queries. In that case you may consider a notebook such as Apache Zeppelin.
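
To illustrate the two-tier idea, the sketch below uses plain JDBC against a HiveServer2 or Spark Thrift Server endpoint, which is how SQL tools such as SQuirreL SQL connect. The hostname, credentials and table are hypothetical, and the Hive JDBC driver must be on the classpath:

    import java.sql.DriverManager

    object QueryWarehouse {
      def main(args: Array[String]): Unit = {
        // Standard HiveServer2/Spark Thrift Server JDBC URL on its default port 10000
        val conn = DriverManager.getConnection(
          "jdbc:hive2://bigdata-node:10000/default", "analyst", "")
        val rs = conn.createStatement().executeQuery(
          "SELECT trade_date, SUM(amount) FROM trades GROUP BY trade_date")
        while (rs.next()) println(s"${rs.getString(1)}\t${rs.getDouble(2)}")
        conn.close()
      }
    }

The existing BI tools keep their familiar SQL interface; only the connection underneath changes.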

Create a Business Value Case 

Your Business Value Case needs to be in line with what the firm offers as its main line of business vis-à-vis deploying Big Data. In most cases this boils down to the tangible value that adding a Big Data stack brings. Very often this is not well established and remains fluid, along the lines of “let us see what we can get out of it for our business”. Your customers here are not IT stakeholders, but rather business stakeholders who are the revenue generators and should use Big Data insight for competitive advantage to make higher profits.

Think of it this way: if the Big Data deployment is going to provide a favorable Return on Investment, you will need to quantify it with a Cost/Benefit Analysis. A typical cost/benefit analysis should include the following:

Cost Assumptions

  • Working days in a month = 20
  • A uniform overall cost of resource per day. This varies from enterprise to enterprise but could be somewhere between £700 and £1,200 per day.

Cost Calculation

  • Upfront investment cost
  • Implementation Cost over the period of the project
  • Ongoing maintenance cost
  • Licensing cost including Big Data vendor’s package and support
  • Internal Cloud Storage Cost if any. Most Financial entities operate in this mode
  • External Cloud Storage Cost if any. Small and Medium size companies may opt for Amazon S3 or Microsoft Azure
  • Training cost for new technology
  • New hardware cost
  • Other costs

Benefits

  • Faster and better decision making due to the availability of all resources in one place (Big Data Lake)
  • Cost savings from Open Source adoption
  • Sophisticated analytics can potentially improve decision-making process
  • Avenues for new products and services that were not available before
  • New information driven business model based on much better insight into the customers and competitors

Although some of the above points cannot be converted into business value straight away, they could nonetheless be treated as a long-term investment. Thus you have to consider the so-called Return on Investment (ROI), in other words when deploying Big Data is going to bring tangible profits.
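
As a back-of-envelope illustration using the cost assumptions above (20 working days a month, a uniform day rate), the sketch below puts placeholder figures against the cost items and a quantified annual benefit to arrive at a simple ROI number. Every figure is hypothetical and should be replaced with your own:

    object CostBenefit {
      def main(args: Array[String]): Unit = {
        val dayRate = 1000.0              // GBP per day, within the £700-£1,200 band
        val workingDaysPerMonth = 20
        val teamSize = 5                  // hypothetical blended team
        val projectMonths = 6
        val implementation = dayRate * workingDaysPerMonth * teamSize * projectMonths
        val hardware = 150000.0           // hypothetical cluster spend
        val licensingAndSupport = 60000.0 // vendor package and support
        val training = 20000.0
        val totalCost = implementation + hardware + licensingAndSupport + training
        val annualBenefit = 400000.0      // hypothetical quantified benefit
        val roi = (annualBenefit - totalCost) / totalCost
        // A negative first-year figure is common; hence the long-term investment view
        println(f"Total cost: £$totalCost%,.0f, first-year ROI: ${roi * 100}%.1f%%")
      }
    }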

Scalability

The business model evolves and so should your Big Data solution. You will need to build enough scalability into the Big Data solution to allow painless expansion and support future growth. This is no different from classic IT solutions.

Simplicity

Since my student days I have believed in a well-established scientific axiom, namely “Simple is beautiful”. If a product offers a simple solution to complicated problems, then it must be a well-designed product by definition. Like any other software build, there are two ways of constructing a Big Data solution. One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult to achieve given the current state of the Big Data landscape, which is crowded with often confusing and conflicting offerings.

Thus it is important to keep the technology stack as simple and robust as possible. For example, supporting two overlapping products at the same time is a recipe for fragmentation and wasted resources.

Conformance to Open Standards

Your choice of open source products has to be based on sound decisions, taking into account:

  • Enterprise readiness. Simply put, whether the product is ready for production deployment
  • The support within the community.
  • The stability of the product
  • The ease of maintainability
  • Whether the product is supported by a Big Data vendor
  • The potential risk associated with deploying a promising but little known product
  • The level of in-house expertise available to support the stack and act quickly if there is an issue with any Big Data component

Summary

I trust that I have provided some useful information here. Like anything else, there is no hard and fast rule on how to implement a Big Data Project. Some will prefer to use a structured method, while other mortals like myself prefer a heuristic model based on one’s existing experience and knowledge. As ever, your mileage may vary.

Disclaimer: Great care has been taken to make sure that the technical information presented in this paper is accurate, but any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on its content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
