Three Data Categories Likely Missing in Your Data Warehouse


Guest blog post by David Mould, initially published in the IBM Data Magazine.

Both data warehouse managers and data scientists should jointly evaluate these potential sources of useful predictors

Most companies collect data from several sources, such as past transactions, clicks, shipments, and more. The data scientist’s job is to transform this data into business intelligence and then into actionable results. Most of this data falls into one category of data: internal and easy-to-collect. The three other categories of data—internal and difficult-to-collect, external and easy-to-collect, and external and difficult-to-collect—are usually not in the data warehouse (see Figure 1). And some of the most important predictors are in these three categories.

Figure 1: Data that can provide some of the best predictions is often missing from the data warehouse.

Let’s look at the three missing categories of data and the reasons why they are sometimes missing from your warehouse.

Internal and difficult-to-collect data
This category of data is usually available within your company but is not in the data warehouse because it is generated infrequently or is sensitive in nature.

  • Brutally frank competitor assessment: Everyone has competitors, and most companies have a key performance indicator (KPI) grid that rates or compares each competitor. Usually this information is made available just to the C suite and to the board of directors. But as a data scientist, your predictions and forecasts are affected by competitor actions. So it would be valuable to access this information.
  • Surveys of former customers: Most internally published customer surveys are biased toward the loyal customers that like you. The best surveys are those that survey the customers that dumped you. Only those surveys will reveal what really needs fixing. If you don’t know what is broken from the customer viewpoint, it is more difficult to reconcile your actual results to the predicted results.
  • Unbiased focus group findings: Most focus groups draw from current, loyal customers. To get an accurate assessment of the product or service to be evaluated, insist on including three other types of customers: (1) customers who switched from a competitor to you; (2) customers who switched from you to a competitor; and (3) customers who have always been with a competitor. Listening to their interactions will provide a less-biased assessment.


External and easy-to-collect data
This category of data is external to your company and is usually not in the data warehouse because no one has requested it yet.

  • Government data on business conditions, statistics, and trends: External factors can have a major impact upon your business, so they should be tracked over time. The government collects and posts unemployment rates, business cycles, census demographics, and other data that can be downloaded and added to your data warehouse.
  • Consumer reporting from TransUnion, Equifax, or Experian: Some organizations are finding that their predictive models can be enhanced with consumer credit scores. Since there is a per-score charge, an ROI analysis should be completed to determine if the benefits (score uplift) outweigh the cost.
  • Consumer and business data from Acxiom, Dun & Bradstreet, Harte-Hanks, and others: Marketing firms can offer a wide range of valuable data on your customers. An ROI analysis should be completed to determine if the benefits of using this data (score uplift) outweigh the cost.
  • Consumer and business survey data from Gallup, Forrester Research, and others: Surveys are a great source for forecasting, especially when you compare new survey data with past survey data. This information usually isn’t in the warehouse.
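The ROI analysis suggested above for purchased data (credit scores, marketing data) can be sketched roughly as follows. This is a minimal illustration, not the author's method: the function name, the record count, the per-record price, and the incremental profit attributed to the score uplift are all hypothetical figures.

```python
def data_purchase_roi(records, cost_per_record, incremental_profit):
    """Return ROI as a fraction: (benefit - cost) / cost.

    incremental_profit is the extra profit attributed to the model's
    uplift after adding the purchased data (an assumed input here).
    """
    cost = records * cost_per_record
    return (incremental_profit - cost) / cost


# Hypothetical figures, for illustration only.
roi = data_purchase_roi(records=100_000,
                        cost_per_record=0.25,
                        incremental_profit=40_000)
print(f"ROI: {roi:.0%}")
```

A positive result means the estimated benefit outweighs the purchase cost; a negative one suggests the data is not worth buying at that price.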


External and difficult-to-collect data
This category of data is also external to your company and is usually not in the data warehouse because it is generated infrequently.

  • Expert opinions: Sometimes the best way to make a prediction is to use an expert’s opinion. If an expert’s opinion has been somewhat accurate in the past, go ahead and use that person’s opinion as a dummy variable.
  • Published survey or trend data that needs to be scanned or typed into the database: Some data is only available in hard copy. If it is difficult to enter the data into the warehouse, then it probably won’t be there. You may have to scan or type it in manually. But it could be the missing independent variable that you have been looking for.
  • Recent technology changes: Recent technology changes could have a profound impact on your business in the future and need to be tracked and modeled accordingly.
  • Executive interviews: Trade journals and magazines sometimes include executive interviews. Your internal experts can pick up on key words and phrases to divine direction and trends.
  • Industry expert and supplier feedback: Industry experts and suppliers can provide key information or a viewpoint that you overlooked. Take advantage of their years of experience.


Tapping into valuable data beyond your existing warehouse
Having millions of data records and hundreds of fields is great only if the data is useful. Some of the most useful data—which can provide the best sources of insight—is difficult to collect, external to the organization, or both. Collecting and incorporating this data into your data warehouse will be worth the effort since it can provide new predictors that can boost your accuracy to a new level.

What do you think? Let me know in the comments.


Understand These Three Estimating Concepts

METHOD 123: empowering managers to succeed

Estimate in Phases

One of the most difficult aspects of planning projects is the estimating process. It can be hard to know exactly what work will be needed in the distant future. It can be difficult to define and estimate work that will be done three months from now. It’s harder to estimate six months in the future. Nine months is even harder. There is more and more estimating uncertainty associated with work that is farther and farther out in the future.

A good approach for larger projects is to break the work into a series of smaller projects, each of which can be planned, estimated and managed separately with a much higher likelihood of success. From an estimating perspective, the closest project can be estimated more precisely, with the subsequent projects estimated with a higher level of uncertainty. When one project completes, the next project can be estimated with a higher degree of confidence, with estimates refined for the remaining projects. This technique also provides checkpoints at the end of each project so that the entire initiative can be revalidated based on current estimates to ensure that it is still viable and worth continuing.

Estimate Fixed Costs and Variable Costs

You may hear the terms fixed cost and variable cost when you are estimating the cost of a project. Variable costs are those that change relative to how many units are being used. An obvious variable cost on a project is contract labor. The more hours you use from a contractor, the higher the cost to the project. The cost of contract labor is variable depending on the number of hours worked.

Fixed costs are those that are basically the same for the project regardless of the resources being used. For instance, if you were building a house, the cost of the lot would be fixed and would not change based on the size of the house you built. Similarly, if you outsource part of a project to a third party for a fixed price, it becomes a fixed cost to the project as well. Even if the work takes longer or shorter than estimated, your project cost should still be the agreed upon fixed cost.
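As a rough illustration of the distinction, the sketch below totals a project estimate from fixed and variable components. The function name, the hourly rate, and all amounts are hypothetical and are not taken from the article.

```python
def project_cost(fixed_costs, hourly_rate, contract_hours):
    """Return total cost: fixed items plus hours-driven contract labor."""
    variable_cost = hourly_rate * contract_hours  # grows with hours used
    return sum(fixed_costs) + variable_cost       # fixed part never changes


# Hypothetical fixed items: a lot purchase and a fixed-price outsourced piece.
fixed = [50_000, 12_000]

# Only the variable part moves when the contractor works more hours.
estimate_low = project_cost(fixed, hourly_rate=75, contract_hours=400)
estimate_high = project_cost(fixed, hourly_rate=75, contract_hours=600)
print(estimate_low, estimate_high)
```

The difference between the two estimates comes entirely from the extra contract hours, which is the uncertainty worth tracking when the fixed items are already agreed.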

Estimate Time-Constrained and Resource-Constrained Activities

Activities can be classified as time-constrained or resource-constrained based on whether the duration changes when more resources are applied. An activity is resource-constrained if its duration changes with the number of resources applied. For instance, you might estimate that it will take 80 hours for one person to build a roof on a house. If the person worked 40 hours per week, it would take two weeks to complete the job. If you assigned two people to the job, the total effort would still be 80 hours, but the job would take only one week to complete.

On the other hand, if an activity is time-constrained, the duration remains the same regardless of the number of resources applied. For example, let’s say one person attends a three-day class. If you send two people to the class, the class does not get shorter; it still takes three days. Likewise, the time it takes for concrete to dry, or to mail a letter, is not impacted by the number of people involved. They just take a certain amount of time. If you find that applying resources has no impact on the project duration (or very little impact), then the activity is time-constrained.
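The two kinds of activity can be sketched with the roof and class examples above. This is a simplified model that assumes the work divides evenly among people with no coordination overhead; the function names are hypothetical.

```python
def resource_constrained_weeks(effort_hours, people, hours_per_week=40):
    """Duration shrinks as people are added; total effort stays the same."""
    return effort_hours / (people * hours_per_week)


def time_constrained_days(fixed_days, people):
    """Duration ignores headcount; adding people changes nothing."""
    return fixed_days


# Roof example: 80 effort hours at 40 hours per person per week.
roof_one_person = resource_constrained_weeks(80, people=1)
roof_two_people = resource_constrained_weeks(80, people=2)

# Class example: a three-day class stays three days for any number of attendees.
class_two_people = time_constrained_days(3, people=2)

print(roof_one_person, roof_two_people, class_two_people)
```

In practice, adding people rarely divides the duration this cleanly, so the resource-constrained figure is best read as a lower bound.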

“Free for Life” edition of Lavastorm Analytics Engine




“It almost makes people a little shocked that we can get the information that quickly. Things that used to take us a long time, we can go do very quickly – almost instantaneously at times.” - Lead Analyst, CenturyLink


If you want to use data and analytics to improve your business, I’d like to set up an exploratory discussion about how we can give you instant access to the data that can drive your business. Our Lavastorm Analytics Engine breaks down data silos, giving business users the ability to acquire, integrate, and analyze data 10 times faster than traditional tools such as SQL databases, Excel, Access, and older BI solutions. It frees data previously locked in multiple sources so that you can use it to drive decisions, uncover root causes, and reveal new insights. As a first step, I invite you to read a brief paper, “Breaking Through the Analytics Limitations of Access and SQL,” and try our Free for Life software yourself.

Then, if you want to learn more about gaining additional value out of your data, contact me to set up a 15-minute call/demonstration. We can help you:

  • Respond to information requests more quickly – our users can not only unify data 10x faster than before but also create extremely precise filtering logic, enabling them to answer difficult or unexpected questions very quickly
  • Handle more data volume – organizations rely on our software to analyze business or performance daily, in some cases processing billions of records each day
  • Improve visibility to data – with our visual tools you can get all your data in one environment and instantly see how data was changed or used so you can easily debug complex data manipulation
  • Reduce scripting hassles – you can reduce SQL or other scripting by assembling and configuring analytic logic using our graphical building blocks

We’ve found that people need to see our software to believe it. So, after you have read the paper, please either try the “Free for Life” edition of Lavastorm Analytics Engine or contact me with any questions. I’d be happy to set up a 15-minute exploratory call and explain how you can benefit from the Lavastorm Analytics Engine.

Kind regards,

John Joseph
VP, Product Marketing

Lavastorm Analytics

A new, agile way to analyze, optimize, and control your data and processes.


Transform customer experiences and relationships: Three disruptive forces combine for breakthrough innovation

Sponsored by: Hewlett-Packard Company
By now, nearly all companies have some sort of business intelligence (BI) and information management (IM) strategy in place, and yet few have realised the full potential of either, due to data silos, complex analytics tools, and changing business needs. Especially for customer-focused companies, that is a lot of data going to waste.
However, there is hope – by embracing three disruptive forces of data management, decision support, and agile services, you can finally unlock the value inside the deluge of data and improve customer experience and relationships.

In this helpful resource, explore how you can leverage each of these three concepts to better manage the customer information lifecycle, improve the customer’s sentiment about your organization, and turn customers into advocates.

http://searchdatamanagement.bitpipe.com/data/loadAsset.action?resId=1372050181_505

BI Framework 2020