Debunked! 9 myths about big data and Hadoop



Myth No. 1: You can get data scientists

Recently, a presales engineer at one of my company’s partners mentioned how much trouble his firm had finding data scientists. I asked about the qualifications his company was seeking. Well, they need to have a doctorate in math, a background in computer science, and what amounts to an MBA, not to mention actual work experience in all of those fields. I asked, “How old is this person, 90?”

Here’s what actually exists:
• Good mathematicians who write crap Python and often need the business stuff spoon-fed to them
• Good computer science people who understand some math
• Good computer science people who understand business after working enough problems
• Business types who understand math
• Subject matter experts
• Leaders who know how to get these people to work together

Because that company could not find this data-scientist unicorn, it had to create a working group with a cross-section of expertise. This is in fact what you have to do.

Myth No. 2: Everything is new

Technologists like to throw away the past, preferring tools that are new for what they claim is a totally new reality or problem set. That’s rarely the case.

For example, the Kafka message broker is portrayed as a big-data-needs-a-new-tool product. But compared to other message brokers, it has a pretty poor feature set and is immature. What’s actually new (meaning different): Kafka is architected for the Hadoop platform and with massive distribution in mind. That could be useful, if you can accept its flaws.

That said, sometimes you need more sophisticated routing and guarantees. Use ActiveMQ or a more robust option for those situations.

Myth No. 3: Machine learning is what you need

I estimate that about 85 percent of what people call machine learning is simple statistics. Most of your problems are probably simple math and analysis. Start there.
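
As a hypothetical illustration of the kind of simple statistics that often suffices, the sketch below flags unusual values by z-score using only Python's standard `statistics` module. The order counts and the cutoff of 2 standard deviations are invented for the example.

```python
import statistics


def flag_outliers(values, threshold=2.0):
    """Return the values whose z-score exceeds `threshold` (a hypothetical cutoff)."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation
    if stdev == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical daily order counts with one obvious spike.
orders = [102, 98, 101, 97, 103, 99, 100, 250]
print(flag_outliers(orders))  # -> [250]
```

A single large outlier also inflates the mean and standard deviation it is measured against. For messier data, a median-based rule may be more robust. The point is that none of this needs a trained model, let alone a cluster.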

Myth No. 4: You are special

As the philosopher Durden once said, “You are not special. You are not a delicate and unique snowflake.” Guess what? About half of the industry is busy writing the same ETL scripts for many of the same data sources and custom-creating the same analysis. Hell, in any sizable company, many departments probably are duplicating this work as well.

Needless to say, it’s a good time to be a big data consultant.

Myth No. 5: Hive is fast

Hive is not fast. It cannot be made to impress you. Yes, the new version is better, but it will still underwhelm you from a performance perspective. It scales well, but you may need several tools in your toolbox to query Hadoop with SQL.

Myth No. 6: You can use clusters with fewer than 12 nodes

Hadoop 2+ barely fits on 12 nodes — anything less and you will wait forever for it to even start. Plus, anything you run will complete in cricket time, if at all. (Well, you can run “hello world” on 12 nodes.) Hadoop 2 runs more processes, which means you need more nodes and more memory.

Spark will do better, apart from the time it takes to load data from HDFS, as long as the data set fits in memory.


Myth No. 7: Virtualization is a solution for your data nodes

Your vendor told you no. Your IT team balked. No, you cannot put data nodes on your SAN. And if you put your management nodes in VMs, you could hit a bottleneck when log and journal writes run into latency, or when you get low IOPS or high latency to the data nodes.

That said, Amazon Web Services and others navigate these issues and still manage reasonable performance and scalability. You can too, but you need to distinguish this from your internal file servers and your external corporate presence site, as well as manage hardware and virtualized resources effectively.

Remember: Throughput and latency are orthogonal. HDFS cares about both in different places.
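
The following sketch makes the distinction concrete. It uses a simplified model with invented numbers: each request costs a fixed latency plus its size divided by throughput. The 128 MB read matches the default HDFS block size in Hadoop 2. The 4 KB record stands in for a small journal or edit-log write. Neither storage profile describes real hardware.

```python
def transfer_time(size_bytes: float, latency_s: float, throughput_bps: float) -> float:
    """Simplified model: a fixed per-request latency plus size divided by throughput."""
    return latency_s + size_bytes / throughput_bps


MB = 1_000_000
# Two hypothetical storage profiles; the numbers are invented for illustration.
PROFILES = {
    "high-throughput": {"latency_s": 0.010, "throughput_bps": 500 * MB},  # 10 ms, 500 MB/s
    "low-latency": {"latency_s": 0.0005, "throughput_bps": 100 * MB},     # 0.5 ms, 100 MB/s
}
BLOCK = 128 * MB        # one large sequential block read
JOURNAL_WRITE = 4_000   # one small journal/edit-log record

for name, profile in PROFILES.items():
    block_s = transfer_time(BLOCK, **profile)
    write_ms = transfer_time(JOURNAL_WRITE, **profile) * 1000
    print(f"{name}: 128 MB block {block_s:.3f} s, 4 KB write {write_ms:.2f} ms")
```

Under these made-up numbers, the high-throughput profile reads the block almost five times faster. The low-latency profile completes the small write more than an order of magnitude sooner. Improving one figure tells you nothing about the other.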

Myth No. 8: Every problem is a big data problem

If you are matching a couple of fields against a couple of conditions across a couple of terabytes, it isn’t really a big data problem. Don’t treat every analytics need as a big data effort.
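
A couple of terabytes is a lot for one disk, but the work itself is a single sequential pass: read a row, test two conditions, keep or discard it. The minimal sketch below uses hypothetical column names and a tiny in-memory sample to show that shape. It assumes the data is CSV.

```python
import csv
import io


def matching_rows(lines, field_a, value_a, field_b, min_b):
    """Stream CSV rows, yielding those where field_a == value_a and field_b >= min_b."""
    for row in csv.DictReader(lines):
        if row[field_a] == value_a and float(row[field_b]) >= min_b:
            yield row


# Hypothetical sample; in practice you would pass an open file handle instead.
sample = io.StringIO("region,amount\nEU,120\nUS,300\nEU,40\nEU,500\n")
hits = list(matching_rows(sample, "region", "EU", "amount", 100))
print([row["amount"] for row in hits])  # -> ['120', '500']
```

Because `csv.DictReader` pulls one row at a time from a file object, memory use stays flat regardless of file size. At terabyte scale, the limit is disk read speed, which may or may not fit your deadline.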

Myth No. 9: You don’t have big data

Although big data is about, well, working on huge sets of data, big data approaches can be quite useful on small data sets, too. So don’t ignore budget approaches when working with small data. You could have mere gigabytes of data and still benefit from Hadoop or other big data technologies, depending on the problem.

You could also have big data that you don’t know about. Companies are accustomed to discarding many data sets that could be useful. Any company with 50 or more employees probably has a big data issue somewhere; even a smaller company will if it manages enough assets (financial or otherwise).

Overcoming the Hadoop Talent Shortage in the UK

February 19, 2015

A recent Computing article discussed the growing need for Hadoop and other “big data” skills in the UK. In 2014, Computing found that the skills gap for Hadoop was one of the biggest in the big data spectrum: 21% of respondents are either considering using the software or already using it, but only 8% of organizations have the required skills in-house to fully exploit it. Meredith Amdur, President and CEO of WANTED, commented, “Hadoop has been the buzzword for big data expertise in managing and crunching large data sets to get to actionable insights for a while. But, that’s led to an incredibly tight market for that all-critical skill set.” Computing used our data to discover there were over 200 companies currently hiring for this talent in the UK, with Barclaycard, Deloitte, Ernst & Young, PwC, Goldman Sachs, Oracle, Accenture, Facebook, and HP having some of the most open Hadoop jobs in the UK. It’s not just information technology companies that are in the hunt: marketing, sciences, and business research occupations in the UK also request those same skills.

Based on data in WANTED Analytics, Manchester may be the best place to look for “big data” talent in the UK. This is thanks to the presence of companies like EMC, Ernst & Young, and Oracle, which have historically hired in this location. There are currently 900 people working in jobs that require Hadoop in Manchester. These and other companies have built a pool of candidates that you can draw on.

[Figure: Recruiting Profile for Hadoop Skills in Manchester, UK]

If you’re hiring for Hadoop in the UK, you may want to consider sourcing candidates in Manchester, since it’s significantly less difficult to find and attract candidates there. Although this would require relocating a candidate, sourcing under these easier, quicker recruiting conditions could actually reduce your time-to-fill and cost-per-hire. Recruiters should research companies in this area to see what salary, benefits, and company culture they provide. That way you know what you may have to offer to lure candidates away from their current employers.

Read the complete Computing article about Hadoop hiring. (Please note, free site registration may be required to access the article.)

How Big Is the Hadoop Skills Gap?

March 26, 2015

According to a recent Forbes article by Ron Bodkin from Think Big, Mike Gualtieri, an analyst at Forrester, predicted that the Hadoop skills gap will disappear in 2015. Bodkin strongly disagreed: “Hadoop requires fundamentally different modes of working with data management, distributed systems, and software engineering that take time to master. There are gaps between the skills people have, the perception of what is needed, and the reality of putting Hadoop to work.” Still, he does expect the Hadoop skills gap to disappear eventually as tools mature, patterns emerge, and more people gain experience. Just how big is the current Hadoop skills gap? Will the current candidate supply be able to support increasing demand?

In January and February, demand for Hadoop in the US was up 12% year-over-year, according to data from WANTED Analytics. Currently, there are about 10 qualified candidates in the labor market for each unique job ad. However, the available candidate supply is subject to change depending on the specific job requirements and location.

[Figure: Jobs Requiring Hadoop Skills, 3.13.15]

As Bodkin mentioned, Hadoop is used in a variety of ways. Hadoop skills are most commonly required in the jobs listed above. Web Developers have the lowest number of qualified candidates with Hadoop knowledge, just 1 per ad. Software Developers (Applications), Computer Systems Engineers, and Information Technology Project Managers each have just 2 candidates per job ad. Recruiters filling these jobs are likely to encounter tougher recruiting conditions, while candidates with this knowledge will have more opportunities to consider.

Bodkin suggests that demand is rising as fast as candidates are learning skills associated with Hadoop, which is preventing the gap from being filled. Others have also taken notice of the growing demand and the lack of adequately skilled Hadoop talent. In response, some Hadoop distributors are offering opportunities for more candidates to learn Hadoop. MapR Technologies is providing a free on-demand training program. Cloudera and Hortonworks also offer Hadoop training.

If you’re recruiting for Hadoop experience, consider candidates with the other skill sets you require. Look for talent that is interested in or somewhat knowledgeable about Hadoop, and enroll employees in Hadoop training courses. Also, ensure that your salary is on par with or above the market rates for the occupation you’re filling combined with Hadoop skills. With a limited candidate supply, many employers may be increasing what they’re willing to pay talent that’s hard to find.

IT Skills That Are Declining in Demand

February 11, 2015

We often discuss skills that are growing in demand – but what about skills that are declining? Today, we looked at the 10 IT skills that are experiencing the greatest year-over-year declines in hiring. While these skills may not become obsolete anytime soon, there may be new technologies and platforms emerging that will take the place of these more seasoned skills.

[Figure: Declining IT skills]

Over the past year, Oracle procedural language (PL/SQL) saw the greatest decline in demand. There were 33% fewer jobs in January 2015 than there were during January 2014. Some of the companies that decreased their hiring the most for these skills were Ageatia Technology Consultancy Services, Computer Sciences Corporation, and Lockheed Martin. Database design declined 32% in demand compared to last year. It’s surprising to see database design decline with the growth of “big data” and data management topics. However, “database design” is a broad term, and many jobs likely reference specific skills instead, like Hadoop, which is increasing in demand. Employers may be getting more particular in the skills they recruit for.

If declining hiring continues to be a trend for these skills, you may find it harder to recruit potential candidates. Colleges and universities may also phase these programs and tools out of their curricula. You may want to consider similar skills that transfer. For example, if you can no longer find candidates for Oracle’s procedural language, consider the comparable procedural SQL dialects used with Sybase ASE, Microsoft SQL Server (both Transact-SQL), and IBM DB2 (SQL PL). You may be able to find candidates with these skills who can quickly pick up your needed language.

35 Open Source Tools for the Internet of Things

By Cynthia Harvey
August 21, 2014


If you’ve been involved with IT in any capacity in recent years, you’ve probably heard the term “Internet of Things,” or IoT. According to Gartner, IoT is at the top of the hype cycle, meaning a lot of people are excited about it, but not much real development is happening yet. While less than a billion devices were connected to the Internet in 2009, Gartner predicts that there will be 26 billion IoT devices installed in 2020, generating $300 billion in revenue for manufacturers and service providers and making a $1.9 trillion impact on the global economy.

In a nutshell, IoT is about using smart devices to collect data that is transmitted via the Internet to other devices. It’s closely related to machine-to-machine (M2M) technology. While the concept had been around for some time, the term “Internet of Things” was first used in 1999 by Kevin Ashton, who was a Procter & Gamble employee at the time.

Since then, the idea has spread rapidly and widely. A survey conducted by ARM found that more than 75 percent of enterprises are either already using IoT in some capacity or exploring ways to do so. And 96 percent of those surveyed expected to be using IoT by 2016.

Part of the reason for the great interest in IoT is the potential it offers. In a 2006 article Ashton explained, “If we had computers that knew everything there was to know about things—using data they gathered without any help from us—we would be able to track and count everything, and greatly reduce waste, loss and cost. We would know when things needed replacing, repairing or recalling, and whether they were fresh or past their best.” He concluded, “The Internet of Things has the potential to change the world, just as the Internet did. Maybe even more so.”

Much of the early work on IoT technology and standards has taken place within the open source community. This month we’re featuring some of the more interesting open source IoT projects currently in active development. While our open source lists generally focus on software, this list also features a number of open source hardware projects, many of which are available for hobbyists to purchase at low prices.

Development Tools

1. Arduino

Arduino is both a hardware specification for interactive electronics and a set of software that includes an IDE and the Arduino programming language. The website explains that Arduino is “a tool for making computers that can sense and control more of the physical world than your desktop computer.” The organization behind it offers a variety of boards, starter kits, robots and related products for sale, and many other groups have used Arduino to build IoT-related hardware and software products of their own.

2. Eclipse IoT Project

Eclipse is sponsoring several different projects surrounding IoT. They include application frameworks and services; open source implementations of IoT protocols, including MQTT, CoAP, OMA-DM and OMA LWM2M; and tools for working with Lua, which Eclipse is promoting as an ideal IoT programming language. Eclipse-related projects include Mihini, Koneki and Paho. The website also includes sandbox environments for experimenting with the tools and a live demo.
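
MQTT is one of the protocols Eclipse Paho implements. It routes messages by hierarchical topic names, and subscribers match those names with wildcards. The function below is a simplified, standalone sketch of those matching rules as described in the MQTT 3.1.1 specification. It is not Paho's API. It also skips validation of malformed filters and the special handling of topics that begin with `$`.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Check whether an MQTT topic matches a subscription filter (simplified).

    '+' matches exactly one level; '#' (valid only as the last level) matches
    the parent level and any number of sub-levels.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True  # '#' swallows all remaining levels, including none
        if i >= len(topic_levels):
            return False  # the filter has more levels than the topic
        if level != "+" and level != topic_levels[i]:
            return False  # literal level mismatch
    return len(filter_levels) == len(topic_levels)


print(topic_matches("home/+/temp", "home/kitchen/temp"))  # -> True
print(topic_matches("sport/#", "sport"))                  # -> True
print(topic_matches("sport/+", "sport"))                  # -> False
```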

3. Kinoma

Owned by Marvell, the Kinoma software platform encompasses three different open source projects. Kinoma Create is a DIY construction kit for prototyping electronic devices. Kinoma Studio is the development environment that works with Create and the Kinoma Platform Runtime. Kinoma Connect is a free iOS and Android app that links smartphones and tablets with IoT devices.

4. M2MLabs Mainspring

Designed for building remote monitoring, fleet management and smart grid applications, Mainspring is an open source framework for developing M2M applications. Its capabilities include flexible modeling of devices, device configuration, communication between devices and applications, validation and normalization of data, long-term data storage, and data retrieval functions. It’s based on Java and the Apache Cassandra NoSQL database.
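
Mainspring's own Java API is not shown here. As a generic, hypothetical sketch of what “validation and normalization of data” typically means in a monitoring application, the function below rejects malformed readings and converts every temperature to one unit. The message layout and the plausible range are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    device_id: str
    celsius: float


def normalize(raw: dict) -> Reading:
    """Validate a raw temperature message and convert it to Celsius.

    The message layout ({'device', 'value', 'unit'}) and the plausible
    range of -50 to 150 C are hypothetical choices.
    """
    unit = raw.get("unit")
    value = float(raw["value"])
    if unit == "F":
        value = (value - 32) * 5 / 9
    elif unit != "C":
        raise ValueError(f"unsupported unit: {unit!r}")
    if not -50.0 <= value <= 150.0:
        raise ValueError(f"implausible temperature: {value:.1f} C")
    return Reading(device_id=str(raw["device"]), celsius=round(value, 2))


print(normalize({"device": "t1", "value": "212", "unit": "F"}))
```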

5. Node-RED

Built on Node.js, Node-RED describes itself as “a visual tool for wiring the Internet of Things.” It allows developers to connect devices, services and APIs together using a browser-based flow editor. It can run on Raspberry Pi, and more than 60,000 modules are available to extend its capabilities.
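
Node-RED flows are built in a browser editor and run on Node.js, so the following is not Node-RED code. It is a Python sketch of the same “wiring” idea. It is loosely modeled on Node-RED's convention of passing a `msg` object with a `payload` from node to node, where a function node that returns nothing stops the message. The node functions and the 30-degree threshold are hypothetical.

```python
from typing import Callable, List, Optional

Message = dict
Node = Callable[[Message], Optional[Message]]


def run_flow(nodes: List[Node], msg: Message) -> Optional[Message]:
    """Pass a message through each node in order; a node returning None ends the flow."""
    for node in nodes:
        msg = node(msg)
        if msg is None:
            return None
    return msg


def parse(msg: Message) -> Message:
    # Hypothetical node: turn the raw sensor string into a number.
    return {**msg, "payload": float(msg["payload"])}


def only_hot(msg: Message) -> Optional[Message]:
    # Hypothetical node: drop readings at or below 30 degrees.
    return msg if msg["payload"] > 30 else None


def to_alert(msg: Message) -> Message:
    # Hypothetical node: format a human-readable alert.
    return {**msg, "payload": f"Too hot: {msg['payload']} C"}


flow = [parse, only_hot, to_alert]
print(run_flow(flow, {"topic": "kitchen/temp", "payload": "31.5"}))
```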


Hardware

6. Arduino Yún

This microcontroller combines the ease of an Arduino-based board with Linux. It includes two processors: the ATmega32u4 (which supports Arduino) and the Atheros AR9331 (which runs Linux). Other features include Wi-Fi, Ethernet support, a USB port, a micro-SD card slot, three reset buttons and more. The boards are available for purchase from the Arduino website.

7. BeagleBoard

BeagleBoard offers credit-card sized computers that can run Android and Linux. Because they have very low power requirements, they’re a good option for IoT devices. Both the hardware designs and the software they run are open source, and BeagleBoard hardware (often sold under the name BeagleBone) is available through a wide variety of distributors.

8. Flutter

Flutter’s claim to fame is its long range. This Arduino-based board has a wireless transmitter that can reach more than half a mile. Plus, you don’t need a router; Flutter boards can communicate with each other directly. It includes 256-bit AES encryption, and it’s easy to use. Both the hardware and the software are completely open source, and the price for a basic board is just $20.

9. Local Motors Connected Car

Local Motors is a car company that manufactures open source car designs on a small scale. They collaborated with IBM on an IoT-connected vehicle that they showed off at a conference last spring. Much of the open source software and design specifications for the prototype are available for download.

10. Microduino

As you might guess from its name, Microduino offers really small boards that are compatible with Arduino. In fact, these boards are about the size of a quarter and can be stacked together to create new things. All the hardware designs are open source, and core modules start at just $8 each. It was funded by a Kickstarter campaign that raised $134,563.

11. OpenPicus

This company offers a line of programmable modules and kits for connecting devices to the cloud and the Internet of Things. Its platform and hardware are open source, but its products can be used to create closed source commercial products. The company also offers its development services for hire.

12. Pinoccio

Arduino-compatible Pinoccio boards (which the company calls “Scouts”) connect to each other in a low-power mesh network. They include a built-in rechargeable battery that can connect to solar panels or any USB power supply. The organization also offers Pinoccio HQ, a GUI for monitoring the activities of the Scouts, and ScoutScript, an easy-to-use scripting language for controlling the devices. A starter kit costs $197.

13. RasWIK

Made by a company called Ciseco, RasWIK is short for the Raspberry Pi Wireless Inventors Kit. It allows anyone with a Raspberry Pi to experiment with building their own Wi-Fi-connected devices. It includes documentation for 29 different projects, or you can come up with one of your own. There is a fee for the devices, but all of the included code is open source, and you can use it to build commercial products if you choose.


14. SADAQ

Short for “Solar-Powered Data Acquisition,” SADAQ offers Arduino-compatible boards with Lego-like plug-in modules. The website includes a number of tutorials, making it suitable for beginners. And the solar panel makes it a good choice for logging environmental data in locations where power and Internet connections might not be available. A basic board starts at $39.

15. Tessel

Tessel aims to make hardware development easier for software developers with this JavaScript-enabled microcontroller that plugs into any USB port. You can also connect it to additional modules to add accelerometer, ambient light and sound, camera, Bluetooth, GPS and/or nine other capabilities. One board and a module starts at $99 with additional modules available for $25. All the software and hardware designs are fully open source.

16. UDOO

This Arduino-compatible board can also run Android or Linux (a distribution called UDOObuntu) from its second processor. It boasts that it is four times as powerful as a Raspberry Pi. Multiple tutorials and projects are available on the website, and it also offers a “Made by UDOOers” section where people can show off their creations. Prices start at $99 for a basic board.

Home Automation Software

17. OpenHAB

OpenHAB lets the smart devices you already have in your home talk to one another. It’s vendor- and hardware-neutral, running on any Java-enabled system. One of its goals is to allow users to add new features to their devices and combine them in new ways. It’s won several awards, and it has a companion cloud computing service called my.openHAB.

18. The Thing System

This project includes both software components and network protocols. It promises to find all the Internet-connected things in your house and bring them together so that you can control them. It supports a long list of devices, including Nest thermostats, Samsung Smart Air Conditioners, Insteon LED Bulbs, Roku, Google Chromecast, Pebble smartwatches, Goji smart locks and much more. It’s written in Node.js and can fit on a Raspberry Pi.


Middleware

19. IoTSyS

This IoT middleware provides a communication stack for smart devices. It supports multiple standards and protocols, including IPv6, oBIX, 6LoWPAN, Constrained Application Protocol and Efficient XML Interchange. Several videos on the website show how it works in action.
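
IoTSyS's own implementation is not reproduced here. To illustrate how compact the Constrained Application Protocol is on the wire, here is a minimal sketch that encodes CoAP's fixed 4-byte header as laid out in RFC 7252: a 2-bit version, 2-bit message type, 4-bit token length, 8-bit code (class.detail) and 16-bit message ID, followed by the token. Options and payload are deliberately left out.

```python
import struct

COAP_VERSION = 1
TYPES = {"CON": 0, "NON": 1, "ACK": 2, "RST": 3}


def coap_header(msg_type: str, code_class: int, code_detail: int,
                message_id: int, token: bytes = b"") -> bytes:
    """Encode the fixed CoAP header (RFC 7252, section 3) followed by the token."""
    if len(token) > 8:
        raise ValueError("token length must be 0-8 bytes")
    first = (COAP_VERSION << 6) | (TYPES[msg_type] << 4) | len(token)
    code = (code_class << 5) | code_detail  # e.g. 0.01 = GET, 2.05 = Content
    return struct.pack("!BBH", first, code, message_id) + token


# A confirmable GET (code 0.01) with message ID 0x1234 and no token.
print(coap_header("CON", 0, 1, 0x1234).hex())  # -> 40011234
```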

20. OpenIoT

The OpenIoT website explains that the project is “an open source middleware for getting information from sensor clouds, without worrying what exact sensors are used.” It aims to enable cloud-based “sensing as a service,” and has developed use cases for smart agriculture, intelligent manufacturing, urban crowdsensing, smart living and smart campuses. Its backers include Athens Information Technology (AIT), École Polytechnique Fédérale de Lausanne (EPFL), the Fraunhofer Institute for Optronics, System Technology and Image Exploitation IOSB, SENSAP Microsystems AE, AcrossLimits, the Commonwealth Scientific and Industrial Research Organisation (CSIRO), the University of Zagreb Faculty of Electrical Engineering and Computing, and the National University of Ireland, Galway.

Operating Systems

21. AllJoyn

Originally created by Qualcomm, this open source operating system for the Internet of Things is now sponsored by one of the most prominent IoT organizations—The AllSeen Alliance, whose members include the Linux Foundation, Microsoft, LG, Qualcomm, Sharp, Panasonic, Cisco, Symantec and many others. It includes a framework and a set of services that will allow manufacturers to create compatible devices. It’s cross-platform with APIs available for Android, iOS, OS X, Linux and Windows 7.

22. Contiki

Contiki describes itself as “the open source OS for the Internet of Things.” It connects low-power microcontrollers to the internet and supports standards like IPv6, 6LoWPAN, RPL and CoAP. Other key features include highly efficient memory allocation, full IP networking, very low power consumption, dynamic module loading and more. Supported hardware platforms include Redwire Econotags, Zolertia z1 motes, ST Microelectronics development kits and Texas Instruments chips and boards. Paid commercial support is available.

23. Raspbian

While the Raspberry Pi was intended as an educational device, many developers have begun using this credit-card-sized computer for IoT projects. The complete hardware specification is not open source, but much of the software and documentation is. Raspbian is a popular Raspberry Pi operating system that is based on the Debian distribution of Linux.

24. RIOT

RIOT bills itself as “the friendly operating system for the Internet of Things.” Forked from the FeuerWhere project, RIOT debuted in 2013. It aims to be both developer- and resource-friendly. It supports multiple architectures, including MSP430, ARM7, Cortex-M0, Cortex-M3, Cortex-M4, and standard x86 PCs.

25. Spark

Spark is a distributed, cloud-based IoT operating system. The same company also offers easy-to-use hardware development kits and related products that start at just $39 (and the hardware designs are also open source). It includes a Web-based IDE, a command-line interface, support for multiple languages, and libraries for working with many different IoT devices. It has a very active user community, and a lot of documentation and online help are available.


26. Freeboard

Freeboard aims to let users create their own dashboards for monitoring IoT deployments. The code is freely available on GitHub, or you can try the service for free if you make your dashboard public. Low-priced plans are also available for those who want to keep their data private. Sample dashboards on the site show how they can be used to track air quality, residential appliances, distillery performance or environmental conditions in a humidor.


27. Exciting Printer

Exciting offers an open source kit for experimenting with IoT printing. It makes it possible to build your own small printer and use that printer to print out information obtained from various IoT devices. For example, it could print out a list of daily reminders, the weather report, etc. And in an interesting twist, if you want to contact the project owners, you can draw a picture that will be printed on the IoT printer in their office.

Platforms and Integration Tools

28. DeviceHive

This project offers a machine-to-machine (M2M) communication framework for connecting devices to the Internet of Things. It includes easy-to-use Web-based management software for creating networks, applying security rules and monitoring devices. The website offers sample projects built with DeviceHive. It also has a “playground” section where users can try DeviceHive online to see how it works.

29. Devicehub.net

Devicehub.net describes itself as “the open source backbone for the Internet of Things.” It’s a cloud-based service that stores IoT-related data, provides visualizations of that data and allows users to control IoT devices from a Web page. Developers have used the service to create apps that track health information, monitor the location of children, automate household appliances, track vehicle data, monitor the weather and more.

30. IoT Toolkit

The group behind this project is working on a variety of tools for integrating multiple IoT-related sensor networks and protocols. The primary project is a Smart Object API, but the group is also working on an HTTP-to-CoAP semantic mapping, an application framework with embedded software agents and more. They also sponsor a meetup group in Silicon Valley for people who are interested in IoT development.

31. Mango

Mango bills itself as “the world’s most popular open source Machine-to-Machine (M2M) software.” Web-based, it supports multiple platforms. Key features include support for multiple protocols and databases, meta points, user-defined events, import/export and more.

32. Nimbits

Nimbits can store and process a specific type of data—data that has been time- or geo-stamped. A public platform as a service is available, or you can download the software and deploy it on Google App Engine, any J2EE server on Amazon EC2 or on a Raspberry Pi. It supports multiple programming languages, including Arduino, JavaScript, HTML or the Nimbits.io Java library.
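
Nimbits' own data model and API are not reproduced here. The hypothetical sketch below illustrates what “time- or geo-stamped data” means in practice. Each value carries a timestamp and optional coordinates, and a query selects readings by time window. The date, coordinates and values are arbitrary.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class StampedValue:
    """A hypothetical time-stamped reading with an optional geo-stamp."""
    value: float
    timestamp: datetime
    lat: Optional[float] = None
    lon: Optional[float] = None


def in_window(points: Iterable[StampedValue], start: datetime, end: datetime) -> List[StampedValue]:
    """Return the points stamped in [start, end), oldest first."""
    return sorted((p for p in points if start <= p.timestamp < end),
                  key=lambda p: p.timestamp)


def utc(hour: int) -> datetime:
    """Helper for the example: a UTC timestamp on a fixed, arbitrary date."""
    return datetime(2014, 8, 21, hour, tzinfo=timezone.utc)


readings = [
    StampedValue(18.2, utc(9), lat=53.48, lon=-2.24),
    StampedValue(17.5, utc(7)),
    StampedValue(19.9, utc(12), lat=53.48, lon=-2.24),
]
print([p.value for p in in_window(readings, utc(6), utc(10))])  # -> [17.5, 18.2]
```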

33. OpenRemote

OpenRemote offers four different integration tools for home-based hobbyists, integrators, distributors, and manufacturers. It supports dozens of different existing protocols, allowing users to create nearly any kind of smart device they can imagine and control it using any device that supports Java. The platform is open source, but the company also sells a wide variety of support, ebooks and other tools to aid in the design and product development process.

34. SiteWhere

This project provides a complete platform for managing IoT devices, gathering data and integrating that data with external systems. SiteWhere releases can be downloaded or used on Amazon’s cloud. It also integrates with multiple big data tools, including MongoDB and Apache HBase.

35. ThingSpeak

ThingSpeak can process HTTP requests and store and process data. Key features of the open data platform include an open API, real-time data collection, geolocation data, data processing and visualizations, device status messages and plugins. It can integrate multiple hardware and software platforms including Arduino, Raspberry Pi, ioBridge/RealTime.io, Electric Imp, mobile and Web applications, social networks and MATLAB data analytics. In addition to the open source version, a hosted service is also available.
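
To show what “process HTTP requests” looks like from a device's side, here is a minimal sketch that only builds a channel-update URL and does not send it. It is based on our understanding of ThingSpeak's write API: a request to `/update` carrying the channel's write API key and values for `field1` through `field8`. Check the current ThingSpeak documentation before relying on it. `YOUR_WRITE_KEY` is a placeholder.

```python
from typing import Dict
from urllib.parse import urlencode

THINGSPEAK_UPDATE_URL = "https://api.thingspeak.com/update"


def build_update_url(write_api_key: str, fields: Dict[int, float]) -> str:
    """Build a ThingSpeak channel-update URL (fields 1-8) without sending it."""
    params = {"api_key": write_api_key}
    for number, value in sorted(fields.items()):
        if not 1 <= number <= 8:
            raise ValueError("ThingSpeak channels have fields 1 through 8")
        params[f"field{number}"] = value
    return f"{THINGSPEAK_UPDATE_URL}?{urlencode(params)}"


print(build_update_url("YOUR_WRITE_KEY", {1: 21.5, 2: 40}))
# -> https://api.thingspeak.com/update?api_key=YOUR_WRITE_KEY&field1=21.5&field2=40
```

A device would then issue an HTTP GET or POST to this URL; keeping URL construction separate makes it easy to test without network access.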

Detailed Table of Analytics

Identifying several analytic techniques that can be applied to your problem is useful, but their names alone will not be much help. The “Detailed Table of Analytics” translates the names into something more meaningful. Once you’ve identified a technique in the Guide to Analytic Selection, find the corresponding row in the table. There you will find a brief description of the technique, tips we’ve learned and a few references we’ve found helpful.

The Field Guide to Data Science : IoT

What You Really Need to Know



The proliferation of IoT devices drastically increases the attack surface and creates attractive, and sometimes easy, targets for attackers. Traditional means of securing networks will no longer suffice as attack risks increase exponentially. We will help you learn how to think about security in an IoT world and new security models.


IoT will fundamentally change how organizations conduct business today. Activities that require significant human effort can become automated [like inspecting electrical meters], capabilities or skillsets may become obsolete while others grow in demand [such as data scientists and privacy officers], and employees will require new skills [such as product managers and project managers]. We will highlight some of the key impacts to your workforce to plan for success up front with your IoT deployment.


IoT implementations typically contain hundreds of sensors embedded in different “things”, connected to gateways and the Cloud, with data flowing back and forth via a communication protocol. If each node within the system “speaks” the same language, then the implementation functions seamlessly. When these nodes don’t talk with each other, however, you’re left with an Internet of one or some things, rather than an Internet of everything.
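
One common way to get nodes to “speak the same language” is a gateway that translates each device's native format into a shared schema before the data moves on. The minimal sketch below handles two vendor formats. Both formats, and the schema itself, are hypothetical.

```python
import json


def to_common(source: str, raw: bytes) -> dict:
    """Translate a vendor-specific payload into one shared schema.

    Both vendor formats are hypothetical:
      vendor_a sends JSON such as {"id": "a-1", "t": 21.5} (degrees Celsius);
      vendor_b sends text such as "b-7;215" (tenths of a degree Celsius).
    """
    if source == "vendor_a":
        data = json.loads(raw)
        return {"device": data["id"], "temperature_c": float(data["t"])}
    if source == "vendor_b":
        device, tenths = raw.decode("ascii").split(";")
        return {"device": device, "temperature_c": int(tenths) / 10}
    raise ValueError(f"no translator registered for {source!r}")


print(to_common("vendor_a", b'{"id": "a-1", "t": 21.5}'))
print(to_common("vendor_b", b"b-7;215"))
```

Downstream consumers then need to understand only the shared schema. Adding a new device type means adding one translator at the gateway, not changing every consumer.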