
Big Data Innovation Summit

The Big Data Innovation Summit taking place at the Westin St. Francis in San Francisco is only 7 weeks away.

This summit is the largest gathering of Fortune 500 Big Data executives to date. Join 1000 of the industry’s top minds for two days of keynote presentations, interactive workshops, panel sessions & networking with pioneers in the data science field.

Across this summit, speakers include:
– Data Science and Analytics, Facebook
– Team Lead, Mobile Data Science, LinkedIn
– Principal Data Scientist, eHarmony
– Senior VP, Big Data, Comcast
– Principal Scientist, NASA
– Chief Technology Officer, Dept. of Health & Hospitals
– Vice President, R&D Information, AstraZeneca
– Chief Data Officer, NYSE
– Division Director, Dept. of Defense
– Vice President, Data, Paypal
– Chief Information Officer, Big Fish
– Chief Medical Officer, GE Healthcare
– Chief of Information, Dept. of Commerce
– Director, Decision Science, Barclaycard
– Vice President, Strategy and Engagement, Salesforce
– Director, Database Development, San Fran Police Dept.
  & many more…
View the full speaker line up.
With places filling fast, please contact Robert Shanley if you would like to secure your pass.
Can’t make the summit? Join the Big Data Executive Breakfast as part of the Big Data Innovation Week in San Francisco, April 10.

What Do Companies Do With Analytical Data?

Posted on February 18, 2013


In a recent chat with 1010data, the company shared some interesting statistics on its customers’ behavior. If you are not familiar with 1010data, it is a SaaS-based big data analytics company. The company cut its teeth by providing big data analytics capability to the financial sector, long before the “Big Data” marketing term was invented – and has since moved on to acquire customers in telecoms, retail, gaming, the health sector and other areas.

When we mention big data in respect of 1010data, you can think in terms of tables with tens of billions of rows of data being manipulated/analyzed as if they were in a spreadsheet, with a wide range of statistical, text and other functions being applied to them. A significant aspect of the company’s capability is that such operations are carried out very quickly and hence the data analyst works in an interactive manner with the data set he or she is interested in.

Here’s what interested us in our conversation with Sandy Steier of 1010data:

1. One of the company’s customers stores and works on a table that has about half a trillion rows of data. That is a massive amount of data to analyze in one go. This may set some kind of record or it may not – who’s to tell? But it certainly qualifies as genuinely big data. This single statistic suggests that there is probably no size of table that a company would not want to analyze interactively if it could. In our view interactive analysis is the ideal for a data analyst (if only those goddam computers weren’t so impossibly slow).

2. The average number of tables that 1010data’s customers actually hold is somewhere in the region of 2000 – 3000. Think about that. Even if some of those tables are discarded tests or no longer active, that’s a really big number. And since 1010data customers pay by usage there’s probably an incentive for customers not to be profligate with the resources they consume. So even if half of those tables could be archived or deleted, that is a large number of tables.

In our view, this speaks to the fact that, among 1010data’s customers at least, the amount of data analysis that is going on is extensive, occupies a good many staff and covers many aspects of the business.

What we are beginning to see emerge can be thought of as “The Evidence-Based Business.” Businesses who operate mainly on the basis of analytical intelligence certainly qualify for that title. Such businesses are not entirely new; insurance companies have operated that way for many years and, more recently, so have trading banks. Google, to some degree, pioneered this mode of operation, and other web-based businesses have imitated it. But now we see this spreading to many other sectors.

A final point that emerged from our briefing with 1010data is that some of its customers are sharing their data in a commercial manner, by renting access to partners. One of 1010data’s customers actually runs its data warehouse – hosted entirely by 1010data – at a profit due to the revenue it receives from its data sharing arrangements.

This is an interesting “straw in the wind.” The direct market for data has existed for a long time and has experienced very little innovation. This kind of operation is, as far as we know, new and could usher in a distinctively different way to profit from data. This may be an opportunity that many companies could seize.


BYOD: Reaching the Peak? Or Just Getting Started?

In the “2013 Mobile Workforce Adoption Trends,” Forrester VP and Principal Analyst Ted Schadler collected and assessed survey responses from 9,766 information workers at SMBs and enterprises in 17 countries, including the U.S., U.K. and Canada.

People using three or more devices – primarily personal computers, tablets, smartphones and laptops – made up 29 percent of the global workforce last year, an increase of 6 percent from 2011, according to Forrester. This group of highly mobile and connected workers is expected to continue strong growth and “top out” at approximately 50 percent by 2017, according to Schadler’s estimates. Driving that growth is the global use of tablets, from a base of 210 million tablets in use last year to more than 900 million worldwide within five years, with many of those tablets used at least in part for business. In essence, Schadler says, “the more IT provisions, the more likely people are to use them in multiple locations.”

“Basically, all the people that can work in multiple locations, will,” says Schadler.

Of the information workers in the survey, one-quarter globally participate in bring your own device work practices in some fashion, according to Forrester. How BYOD will “peak” depends on your definition of employee devices. As Forrester and Schadler put it in the report: “If you define BYOD as employees paying, then we may be reaching the crossover point where your company pays for more smartphones than employees do. But if you define BYOD as employees choosing the devices they need for work, we are just getting going.”

In terms of enterprises paying for devices picked by mobile employees, Schadler expects BYOD to reach its peak within the next few years. Schadler already sees this BYOD trend maxing out in some instances, among leading-edge sales reps in pharmaceuticals, manufacturing and financial services. Trickier is how controls over those devices continue to play out. At the core of the governance and security issues with BYOD, Schadler sees organizations at a crossroads: employees don’t want IT control, while IT requires more control yet is pressured by the number of applications and investments.

Justin Kern is senior editor at Information Management and can be reached at justin.kern@sourcemedia.com. Follow him on Twitter at @IMJustinKern.



Operational Intelligence – Is This the End Game for Big Data?

Posted on February 1, 2013

You could say that for the past 20 years we talked the data analytics talk but didn’t really walk the data analytics walk. It isn’t that companies didn’t do data analytics; they did. Many people were employed as data analysts or used data analyst tools as part of their job. You could find them in pharmaceutical companies, banks, insurance, big retail and, once the web took off, they were attacking web log data with their mathematical tools.

In truth though, it wasn’t until Google that we had a company run by data analytics. And let’s not kid ourselves on this: intelligent business innovation drove much of what Google did, but data analytics drove a good deal of their activity and innovation.

Boiling Down Big Data

Data didn’t suddenly become “big,” it just hadn’t been analyzed before or, in some instances, it hadn’t been analyzed in depth. So the advent of easily deployed public cloud resources or easily manageable private cloud resources, plus the inexpensive Hadoop stack, created an opportunity for data analysts to work on data sets that they hadn’t examined before. As was likely, some of that data unearthed valuable knowledge.

In part, this involved large volumes of data, high velocity data or awkwardly structured data, but the general point was that analytics increased in business importance. While not every piece of knowledge needed to be exploited immediately, some did indeed require quick action.

Thinking of the broad business intelligence (BI) market, we can look at this in terms of the four categorizations of BI that we use here at The Bloor Group: Hindsight, Oversight, Insight and Foresight. The first two, hindsight and oversight, are fairly well exploited by many companies via regular reports, dashboards, OLAP capabilities and varieties of data visualization. The new data sources that companies are now exploiting can be fed into established hindsight and oversight capabilities quite easily.

Most of the big data action lies in the areas of insight and foresight (deep analytics and predictive analytics), and some of the knowledge that is being discovered needs to be actioned swiftly. Speed is a prime factor.

To point to the obvious: the cost of fraudulent activity diminishes the sooner the activity is detected and stopped. The same is true of a network security breach or some risk factor in the financial market. Further, a characteristic of valuable information (intelligence) is that its value tends to degrade over time, either because the information is shared or because competitors also discover the information. So the trick is not just to unearth such information, but to act on it as fast as possible.

The Rising Tide of Operational Intelligence

Operational intelligence is, we believe, beginning to take off. For one thing, we see more and more vendors using this term to describe their technology. Such vendors all have one thing in common, irrespective of what their other capabilities are: they seek to transform business intelligence into business action at real-time or near real-time latencies.

The business intelligence we are talking about here is coming mainly from data analytics or predictive analytics. What we mean by business action is that the intelligence is presented either to a user for immediate action or delivered as a trigger to software which takes action automatically.
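To make the two delivery routes concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Event` fields, the `risk_score` stand-in for a predictive model and the two thresholds are hypothetical and do not describe any vendor's product. It shows one scored event being routed either to a person for immediate action or to an automatic response.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """A hypothetical business event, e.g. a card transaction."""
    account: str
    amount: float
    country_mismatch: bool


def risk_score(event: Event) -> float:
    """Stand-in for a predictive model; returns a score between 0 and 1.

    The weights are invented for illustration only.
    """
    score = min(event.amount / 10_000, 1.0) * 0.6
    if event.country_mismatch:
        score += 0.4
    return score


def decide(event: Event, review_at: float = 0.5, block_at: float = 0.8) -> str:
    """Turn the score into a business action.

    High scores trigger software that acts automatically ("block");
    middling scores are presented to a user ("alert_analyst").
    """
    score = risk_score(event)
    if score >= block_at:
        return "block"
    if score >= review_at:
        return "alert_analyst"
    return "allow"


if __name__ == "__main__":
    for e in (
        Event("a", 100, False),
        Event("b", 5_000, True),
        Event("c", 20_000, True),
    ):
        print(e.account, decide(e))
```

In this sketch the thresholds decide which route an event takes, so they are the point at which a business would trade off speed of automatic action against the cost of a false alarm.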

Arguably, such operational intelligence applications have existed for quite a time. Banks have been automating trades based on smart algorithms for years. But a general set of software capabilities that can feed intelligence directly to the point of business action is fairly new.

In our view, there is a rising trend here that’s very likely to take off this year.

Does your company provide Operational Intelligence software or services? If so, please request a briefing so that we can provide a detailed overview of your offering in our OI Market Roundup, to be published this March.

Component-Based Analytical Development – From ETL to Analytical Workflow

Posted on February 8, 2013

It’s now well over 20 years since the era of data warehousing and business intelligence began. Other than database management systems (DBMSs), the one type of technology that has been there from the very start is data management in the form of data cleansing and data integration. We started out with the 3GL code generator ETL tools and quickly advanced to graphical dynamic workflow-based tools that made it much easier to define how we wanted to transform and integrate data before loading it into target data warehouses. Initially data cleansing tools were separate from data integration tools until the emergence of the service-oriented architecture (SOA) era that made it much easier to combine cleansing and integration services and bring them together as part of the same platform. This resulted in technologies merging and triggered deep integration through the sharing of a common metadata repository for cleansing and transformation policies.

Matt Madden of Alteryx will brief author Mike Ferguson in The Briefing Room on February 12, 2013. Register today.

SOA also allowed BI vendors to re-architect their BI tools, breaking them up into components such as a security service, a query engine service, a charting service, analytical services, rendering and presentation services, etc., such that when these were put back together again, complete service-oriented BI platforms emerged with visual portlets and service-oriented tools. From there the dashboard arose and the ability to mix and match visual reporting and analytical services to build your own dashboard.

Meanwhile in the data mining market, workflow was also alive and well. I can remember working for Integral Solutions Limited in the UK in the mid-1990s. You may remember them as the creators of the Clementine data mining tool, which was bought by SPSS and is now part of IBM’s Business Analytics and Optimization division. From the outset, Clementine had workflow to prepare data for analysis during statistical and predictive model development. SAS also had workflow as did ThinkAnalytics and other analytical model development tools. Once SOA emerged, it was not long before these tools also added support to call web services from within their workflows or indeed to publish whole analytical workflows as services that could be invoked by other applications and tools.

Putting this all together, it is not surprising to see that what has emerged is a component-based world where we not only have the ability to build data cleansing and integration workflows on a single platform, but to build full-blown service-oriented analytical workflows that do everything from data capture, cleansing, integration and analysis to charting, data visualization and rendering. The addition of functionality like team development means that component libraries can be formed with components classified and stored in a self-describing way (e.g., XML) to make them easy to find and re-use. Besides the self-service BI data discovery and visualization tools, it is this kind of component-based rapid development capability that makes it possible to advance to where we are today – the era of self-service BI for business analyst analytical producers and business analyst analytical consumers.

So far, all of this discussion has been based on the assumption that the data being analyzed is structured data, i.e., where database schemas are known and where metadata is shared and exploited by multiple tools.

The emergence of Big Data, however, has caused disruption to this mature analytical world. New data sources and new data types are now high on the priority list of many businesses to be analyzed. However, this new multistructured data has presented us with a problem: the schema of the data is often not known. With data such as text, images, video, etc., it is often the case that batch analysis (often using the MapReduce programming framework) needs to be done to extract or aggregate high value subsets of it that could be mapped into a schema or brought into a data warehouse for further analysis. In addition, we have seen the emergence of new NoSQL analytical data stores such as Hadoop and graph DBMSs to store and in some cases analyze this data. These new platform data stores are not relational DBMSs. Unlike RDBMSs, they have no optimizer, they expose proprietary APIs, and they offer very little in the way of metadata to tell you about data stored in files or other structures in these data stores. Therefore analyzing data in these systems is more complex. Developers have to work out what the structure of this data is in order to parse and analyze it. Furthermore, the data is not available for concurrent query access in the same way it would be in a relational DBMS. Each new query can warrant a new batch program.
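The batch pattern described above can be sketched in a few lines of plain Python. This is not Hadoop code: it only imitates the map, shuffle and reduce phases in a single process, and the log format, field names and output schema are all invented for illustration. The point it tries to show is that the developer has to work out the structure of the raw data inside the map step before a schema-shaped subset can be produced for a warehouse.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical schemaless input: free-text lines, some of them unusable.
RAW_LINES = [
    "2013-02-01 user=alice action=login",
    "malformed line without fields",
    "2013-02-01 user=bob action=purchase amount=30",
    "2013-02-02 user=alice action=purchase amount=12",
]


def map_line(line):
    """Map step: parse one raw line and emit (user, amount) pairs for purchases."""
    fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
    if fields.get("action") == "purchase" and "user" in fields and "amount" in fields:
        yield fields["user"], int(fields["amount"])


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(user, amounts):
    """Reduce step: aggregate one group into a row with a known schema."""
    return {"user": user, "purchases": len(amounts), "total_amount": sum(amounts)}


def run_job(lines):
    """Run the three phases in order and return warehouse-ready rows."""
    pairs = chain.from_iterable(map_line(line) for line in lines)
    grouped = shuffle(pairs)
    return [reduce_group(user, amounts) for user, amounts in sorted(grouped.items())]


if __name__ == "__main__":
    for row in run_job(RAW_LINES):
        print(row)
```

A different question about the same raw lines, say logins per day, would need a different `map_line` and `reduce_group`, which is the sense in which each new query can warrant a new batch program.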

One question that arises is whether or not modern analytical workflow tools, which are so useful on traditional data warehouses and data marts, can be extended for use on Big Data. It’s not all that straightforward when the data source has no schema. You could take advantage of RDBMS technology that allows you to develop and run so-called “polymorphic table functions” that can reach into Hadoop and other NoSQL data stores to run MapReduce programs or exploit SQL-like interfaces such as Hive to generate these kinds of applications. Alternatively, data scientists can develop batch MapReduce applications first to analyze and extract value out of multistructured, schemaless data sources and then make these jobs available for other tools to use. In this latter case, as long as metadata exists to discover what MapReduce jobs are available on Big Data platforms like Hadoop, then analytical workflow platforms may be able to make use of the “appropriate job” to analyze new types of Big Data during workflow execution. Both techniques are emerging. Data management tools have also been extended to move onto platforms like Hadoop to clean and parse multistructured data. Similarly, several workflow-based analytical tools are also running in this environment, e.g., Pervasive RushAnalyzer.

I would therefore predict that it is inevitable that, one way or another, analytical workflows will start to work across multiple analytical platforms to solve more complex analytical problems. Given that these tools can also publish workflows as services, we are now entering an era where on-demand multiplatform analytical workflows will become available to be invoked by front-end tools and dashboards to deliver more timely business insight to new, more challenging business questions.

About the Author: Mike Ferguson is a UK-based independent IT Analyst and Consultant with over 30 years of IT experience specializing in Big Data, BI/Analytics and Data Management. He can be reached at mferguson@intelligentbusiness.biz or at www.intelligentbusiness.biz.


Big Data Analytics: A Wise Career Option?

By SiliconIndia   |   Thursday, 31 January 2013, 02:34 Hrs


Bangalore: It can be argued that IT has seen better years. Times are hard for sales of hardware and the slow economic recovery has resulted in slow growth and investment in all areas, states Shaun Nichols on V3.co.uk. This is reflected in the high unemployment rates, especially for recent graduates.

However, there is a shimmering ray of hope through all the gloom: big data analytics, a field that has many jobs available and is seeking graduates who are capable of handling the requisite platforms. With data being produced and consumed at an alarmingly fast pace, big data is the need of the hour and vendors are creating machines that can process the mountains of data. These mountains of data contain vital information such as interactions on social sites and e-commerce transactions. Executives and strategy makers are grappling with this quantity of data and trying to understand what insights they should glean from it.

With this big quantity of data come big opportunities. Companies can seek insights on a far more detailed scale and gain many more of them. A common database cannot analyze all this data as most of it is unstructured, hence the spread of big data analytics. Big data can now be processed by NoSQL data stores and the open source Apache Hadoop option. These platforms will also necessitate specialized talent and expertise. Opportunities are aplenty for data analysts, and the new generation can make use of the vacancies in this field and receive training to manage big data software. The explosion of products in the market is a testament to the fact that vendors are seizing the opportunity to release big data related software and hardware. However, the education domain has not been able to keep up until recently. Universities are now waking up to the need for big data related curricula. Coursera, which offers free online courses, will hold a course soon on Web Intelligence and Big Data. Teradata has also recently announced free certification for candidates seeking analytic jobs. EMC too has been active in sponsoring certifications for the BI sector. Until recently vendors were directly reaching out to undergraduates and honing the skills required to handle big data. Gartner has also predicted that by 2015, 4.4 million IT jobs will be created globally to support big data.

Careers in BI could be extremely lucrative, and graduate students ought to consider it as a specialization for postgraduate study, as the growth potential seems tremendous.