  • Contact

    Send us press releases, events and new product information for publication.

    Email: nchuppala@outlook.com

Is Big Data Still a Thing? (The 2016 Big Data Landscape)



I’m super excited to be involved in the new open source Apache Arrow community initiative. For Python (and R, too!), it will help enable

  • Substantially improved data access speeds
  • Closer to native performance Python extensions for big data systems like Apache Spark
  • New in-memory analytics functionality for nested / JSON-like data

There are plenty of places where you can learn more about Arrow, but this post is about how it’s specifically relevant to pandas users. See, for example:

Bigdata Landscape 2016


RIOT GAMES – Platfora Case Study

Riot Games is the publisher of the mega-hit gaming phenomenon League of Legends. Since its introduction in 2009, League of Legends has gained a massive and loyal following, with more than 67 million people playing every month. Experiencing massive growth, the company needed an analytics solution that would work well with their push-model data pipeline, one that would put analysis and exploration capability directly into the hands of the data analysts, and one that would provide fast and flexible visualization of all of their data. Platfora Big Data Analytics addresses all of these needs, while providing new and unexpected insights into the world of League of Legends.
Market: Multiplayer Online Games
Industry: Game Publishing, Entertainment
Solution: Platfora Big Data Analytics
The Challenge:
• Unite game data and external web data in one analytics environment
• Provide self-service analysis and exploration
• Enable flexible, on-the-fly changes to analysis
The Solution:
Platfora Big Data Analytics has proved a perfect fit for Riot Games, working seamlessly with their data pipeline and meeting the need for fast and flexible analytics results. Platfora provides a responsive, visual environment that unites player data and external data in new ways to provide insights that were not possible before.
The Results:
• Supports real-time updates to the game environment
• Provides whole new visualizations in an hour
• Enables end-to-end analysis of player activity


Information Visualization Research Projects that Would Benefit Practitioners


In a previous blog post titled “Potential Information Visualization Research Projects,” I announced that I would prepare a list of potential research projects that would address actual problems and needs that are faced by data visualization practitioners. So far I’ve prepared an initial 33-project list to seed an ongoing effort, which I’ll do my best to maintain as new ideas emerge and old ideas are actually addressed by researchers. These projects do not appear in any particular order. My intention is to help practitioners by making researchers aware of ways that they can address real needs. I will keep a regularly updated list of project ideas as a PDF document, but I’ve briefly described the initial list below. The list is currently divided into three sections: 1) Effectiveness and Efficiency Tests, 2) New Solution Designs and Tests, and 3) Taxonomies and Guidelines.

Some of the projects that appear in the Effectiveness and Efficiency Tests section have been the subject matter of past projects. For example, several projects in the past have tested the effectiveness of pie charts versus bar graphs for displaying parts of a whole. In these cases I feel that the research isn’t complete. Apparently, some people feel that the jury is still out on the matter of pie charts versus bar graphs, so it would be useful for new research to more thoroughly establish, more comprehensively address, or perhaps challenge existing knowledge.

Please feel free to respond to this blog post or to me directly at any time with suggestions for additional research projects or with information about any projects on this list that are actually in process or already completed.

Effectiveness and Efficiency Tests

  1. Determine the effects of non-square aspect ratios on the perception of correlation in scatterplots.
  2. Determine the effectiveness of bar graphs compared to dot plots when the quantitative scale starts at zero.
  3. Determine the relative speed and effectiveness of interpreting data when presented in typical dashboard gauges versus bullet graphs (one of my inventions).
  4. Determine the effectiveness of wrapped graphs (one of my inventions) compared to treemaps when the number of values does not exceed what a wrapped graphs display can handle.
  5. Determine the effectiveness of bricks (one of my inventions) as an alternative to bubbles in a geo-spatial display.
  6. Determine the effectiveness of bandlines (one of my inventions) as a way of rapidly seeing magnitude differences among a series of sparklines that do not share a common quantitative scale.
  7. Determine if donut charts are ever the most effective way to display any data for any purpose.
  8. Determine if pie charts are ever the most effective way to display any data for any purpose.
  9. Determine if radar charts are ever the most effective way to display any data for any purpose.
  10. Determine if mosaic charts are ever the most effective way to display any data for any purpose.
  11. Determine if packed bubble charts are ever the most effective way to display any data for any purpose.
  12. Determine if dual-scaled graphs are ever the most effective way to display any data for any purpose.
  13. Determine if graphs with 3-D effects (e.g., 3-D bars) are ever the most effective way to display any data for any purpose.
  14. Determine which is more effective: displaying deviations in relation to zero or 100%. For example, if you wish to display the degree to which actual expenses varied in relation to the expense budget, would it work best to represent variances as positive or negative percentages above or below zero or as percentages less than or greater than 100%?
  15. Determine the effectiveness of various designs for Sankey diagrams in an effort to recommend design guidelines.
  16. Determine the best uses of various network diagram layouts (centralized burst, arc diagrams, radial convergence, etc.).
  17. Determine the effectiveness of word clouds versus horizontal bar graphs (or wrapped graphs).
  18. Determine which shapes are most perceptible and distinguishable for data points in scatterplots.
  19. Determine the effectiveness of large data visualization walls versus smaller, individual workstations.
  20. Determine if the effectiveness of displaying time horizontally from left to right depends on one’s written language or is more fundamentally built into the human brain.
  21. Determine if the typical screen scanning pattern beginning at the upper left depends on one’s written language or is more fundamentally built into the human brain.
  22. Determine the relative speed and effectiveness of interpreting particular patterns in data when displayed as numbers in tables or visually in graphs. For example, compare a table that displays 12 monthly values per row versus a line graph that displays the same values (i.e., twelve monthly values per line) to see how quickly and effectively people can interpret various patterns such as trending upwards, trending downwards, particular cyclical patterns, etc. We know that it is extremely difficult to perceive patterns in tables of numbers, but it would be useful to actually quantify this performance.
  23. Determine the relative speed of finding outliers in tables of numbers versus graphs.
  24. Determine the relative benefits of using a familiar form of display versus one that requires a few seconds of instruction. The argument is sometimes made that a graph must be instantly intuitive because making people learn how to read an unfamiliar form of display is too costly in time and cognitive effort. For example, population pyramids provide a familiar display for people who routinely compare the age distributions of males versus females in a group, yet a frequency polygon, although unfamiliar, might provide a way to see how the distributions differ much more quickly and easily. In cases when people can be taught to read an unfamiliar form of display with little effort, does it make sense to do so rather than continuing to use a form of display that works less effectively?
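
To make item 14 in the list above concrete, here is a minimal Python sketch of the two encodings it compares. The function names and the expense figures are hypothetical, made up purely for illustration; the point is only that both encodings carry the same information and differ by a constant offset of 100 percentage points, so the research question is about perception rather than arithmetic.

```python
def variance_from_zero(actual, budget):
    """Signed deviation from budget, as a percentage centered on zero."""
    return (actual - budget) * 100.0 / budget


def percent_of_budget(actual, budget):
    """The same comparison, as a percentage of budget centered on 100%."""
    return actual * 100.0 / budget


# Hypothetical (actual, budget) pairs, not taken from any real data set.
expenses = {
    "Travel": (11500.0, 10000.0),
    "Software": (8000.0, 10000.0),
}

for category, (actual, budget) in expenses.items():
    around_zero = variance_from_zero(actual, budget)
    around_hundred = percent_of_budget(actual, budget)
    print(f"{category}: {around_zero:+.1f}% vs. budget, {around_hundred:.1f}% of budget")
```

A graph built on the first function would place the budget at a zero baseline with bars extending above and below it, while one built on the second would place the budget at a 100% reference line.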

New Solution Designs and Tests

  1. Develop an effective way to show proportional highlighting, as it pertains to brushing and linking, for portions of the following graphical objects: bars, lines, and boxplots. Various ways to show proportional highlighting have been applied to bar graphs, but not to line graphs and box plots.
  2. Develop a way to automatically attach data labels to the ends of lines in a line graph without overlapping.
  3. Develop a way to temporarily overlay or replace box plots with frequency polygons.
  4. Develop a way to automatically detect the amount of lag between two time series and then align the leading events with the lagging events in a line graph.
  5. Develop potential uses of blindsight to direct a person’s attention to particular sections of a display as needed (e.g., to something on a dashboard that needs attention).
  6. Develop an effective design for waterfall graphs when multiple transactions occur in the same interval of time and some are positive and some are negative.
  7. Develop an algorithm for automatically distributing several sets of time series values uniformly across a 100% scale when they have different starting points, ending points, and durations. For example, this would make it easy to compare the person hours associated with various projects across their lifespans, even when they differ in starting dates, ending dates, and durations.
  8. Develop a full set of interface mechanisms for making formatting changes to charts (turning grid lines on and off, changing the colors of objects, repositioning and orienting objects such as legends, changing the quantitative scale along an axis, etc.) that involves direct access to those objects rather than one that requires the user to wade through lists of formatting commands located elsewhere (e.g., in dialog boxes).
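
Items 4 and 7 in the list above describe computations that can be sketched in a few lines. The Python below is only a rough starting point, not a proposed solution: it assumes evenly spaced samples, estimates lag by picking the shift that maximizes the Pearson correlation of the overlapping values (one of several possible approaches), and maps a series onto a 0% to 100% scale by linear interpolation. The function names and sample data are hypothetical.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def estimate_lag(leading, lagging, max_lag):
    """Return the shift in samples (0..max_lag) at which `lagging` best matches `leading`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        ys = lagging[lag:]                 # drop the first `lag` samples of the lagging series
        n = min(len(leading), len(ys))     # compare only the overlapping part
        if n < 2:
            break
        score = pearson(leading[:n], ys[:n])
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def resample_to_percent_scale(values, points=11):
    """Interpolate `values` at evenly spaced positions from 0% to 100% of its duration."""
    if len(values) < 2 or points < 2:
        raise ValueError("need at least two values and two output points")
    last = len(values) - 1
    resampled = []
    for i in range(points):
        pos = i * last / (points - 1)      # fractional index into the original series
        lo = int(pos)
        hi = min(lo + 1, last)
        resampled.append(values[lo] + (values[hi] - values[lo]) * (pos - lo))
    return resampled


# Hypothetical data: the second series repeats the first, two samples later.
leading = [0, 1, 4, 2, 0, 0, 3, 1, 0, 0]
lagging = [0, 0] + leading[:-2]
lag = estimate_lag(leading, lagging, max_lag=4)
aligned = lagging[lag:]                    # shifted so that events line up with `leading`
print("estimated lag:", lag)
print("percent scale:", resample_to_percent_scale([0, 10, 20], points=5))
```

Once projects of different durations have each been resampled to the same number of points, they can be drawn on a shared 0% to 100% horizontal axis and compared directly.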

Taxonomies and Guidelines

  1. Develop a useful taxonomy or set of guidelines to help people think about the differences in how data visualizations should be designed to support data sensemaking (i.e., data exploration and analysis) versus data communication (i.e., presentation).

Take care,


Leadership in context: Organizational health matters more than you might expect.