13 Tools Every Developer Working With Big Data Needs To Know
Whether you are designing a system for analyzing big data or just trying to collect and process data from your mobile applications, you cannot do without high-quality analytics tools. The good news is that many companies are now releasing tools that take developers' needs and skills into account.
Over the past year, I’ve come across many startups, projects, and tools designed to give programmers advanced analytics. In some cases they take the form of simple scripts that add up to quite powerful solutions. In others, they put a more accessible layer over complex technologies, taking on the lion’s share of the dirty work and making everything that follows easier. I think this is a significant trend in this area.
In today’s world of mobile applications, it is easier than ever to build a business on a fairly simple application. Even in large companies, developers are involved in attracting resources, proving that their applications are profitable, or finding better ways to monetize them. Sometimes this even leads to building data processing into the application itself.
In any case, if your work is about writing code rather than handling data streams, you will probably need a little help. Below are 13 tools designed to help with this difficult task. As often happens with such collections, I may have left out some good examples, so I invite you to discuss them in the comments.
1. BitDeli
BitDeli, a startup launched in November, lets programmers measure anything they want with metrics written as Python scripts. Co-founder and CEO Ville Tuulos said that the scripts can be as simple or as complex as needed, up to and including machine learning. Unlike the heavyweight Hadoop, BitDeli positions itself as a lighter-weight solution, comparable to the Ruby on Rails framework, but for analytics.
2. Continuuity
Continuuity is the brainchild of Todd Papaioannou, former chief cloud architect at Yahoo, and Jonathan Gray, a former HBase engineer at Facebook. It was created to help any company work at the same high level as those two firms. The team has built a data layer that adds a new level of abstraction over the complex connections to Hadoop and HBase clusters, and that also includes a full set of development tools. The main goal of the project is to simplify building big data applications that serve both internal and external audiences.
3. Flurry
Flurry, a one-stop shop for mobile application developers, brings in about $100 million a year for its creators because it handles its job well. The company helps developers not only build mobile applications but also analyze all the data those applications produce in order to make them even better. This data can also form the basis of an advertising campaign, bringing advertisers together with their target audience.
4. Google Prediction API
Of all of Google's developer tools, the Google Prediction API has a claim to being the coolest. If you have the right data to train it, the Prediction API can recognize all kinds of patterns and return the right answers to your application. Among the examples Google cites are a spam detection engine, sentiment analysis, and a recommendation engine, and Google also provides step-by-step instructions for building these models.
5. Infochimps
Although Infochimps is trying to reposition itself as an IT company (and move closer to the money), the platform of the same name is still of real value to developers. The centerpiece of its technology for configuring and managing big data is the Wukong framework, which is designed for working with Hadoop and its data streams using Ruby scripts.
6. Keen IO
This project won first place in our Structure 2012 Launchpad competition as the most powerful analytics tool for mobile application developers. With just one line of code that indicates exactly what to track, programmers can monitor anything in their applications that interests them. After that, turning the data into an analyzable form is just a matter of building a user-friendly dashboard.
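As a rough illustration of the "one line per event" idea, here is a minimal in-memory sketch in Python. The names `track`, `count_by`, and `EVENTS` are hypothetical and are not Keen IO's actual client API; a real client would send each event to the hosted service instead of appending it to a list.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical in-memory event store; a hosted service would persist events remotely.
EVENTS = []


def track(collection, properties):
    """Record one event: the single call a developer adds at the point of interest."""
    event = {
        "collection": collection,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        **properties,
    }
    EVENTS.append(event)
    return event


def count_by(collection, key):
    """Count events in a collection grouped by one property, as a dashboard might."""
    return Counter(
        e[key] for e in EVENTS if e["collection"] == collection and key in e
    )


# The "one line" inserted into application code (sample data is made up):
track("purchases", {"item": "sword", "price": 4.99})
track("purchases", {"item": "shield", "price": 2.99})
track("purchases", {"item": "sword", "price": 4.99})
```

Calling `count_by("purchases", "item")` then gives the per-item totals that a dashboard would chart.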
7. Kontagent
Kontagent's core business is its analytics platform for mobile, social, and web applications, which runs on Hadoop and can process truly huge amounts of information. Earlier this year, the company launched a product that lets users query the data from their applications with Hive's SQL-like query language for Hadoop. Instead of being limited to tracking predefined variables, users are free to ask their own questions.
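To give a feel for what an ad hoc SQL-like query over event data looks like, here is a small sketch. It uses Python's built-in `sqlite3` module purely as a local stand-in for Hive; the `events` table, its columns, and the sample rows are all assumptions made up for this illustration, not Kontagent's schema.

```python
import sqlite3

# In-memory SQLite database used purely as a local stand-in for a Hive table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, event TEXT, level INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("u1", "level_complete", 1),
        ("u1", "level_complete", 2),
        ("u2", "level_complete", 1),
        ("u2", "purchase", 1),
        ("u3", "level_complete", 1),
    ],
)

# An ad hoc question nobody defined a metric for in advance:
# how many distinct users completed each level?
query = """
    SELECT level, COUNT(DISTINCT user_id) AS users
    FROM events
    WHERE event = 'level_complete'
    GROUP BY level
    ORDER BY level
"""
rows = conn.execute(query).fetchall()
for level, users in rows:
    print(f"level {level}: {users} users")
```

The point is that the question is phrased at query time, rather than being baked into the application as a predefined counter.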
8. Mortar Data
Mortar Data is Hadoop for developers, plain and simple. Almost a year ago the company launched its cloud service, which replaces MapReduce with a combination of Pig and Python. In November it released the open Mortar framework to build a community for sharing data and experience with Hadoop. Mortar Data currently runs on top of Amazon Web Services and supports Amazon S3 and MongoDB (hosted on Amazon EC2) as data sources.
9. Placed
Placed does away with scripts, APIs, and other hard work and simply gives its users a finished result: detailed information about where and when consumers used a mobile application or website, down to the name of the business. This information can be very useful for attracting advertisers, as well as for informing the design of the application (for example, adding voice notifications if people use the application while driving).
10. Precog
At first glance Precog may seem like an ordinary private company, but there is more to it on closer inspection. The company offers a service called Labcoat, an interactive development environment for building analytical models with Quirrel, an open query language. The IDE includes a tutorial for the language and a number of complex functions, and Precog CEO Jeff Carr said that even people without a technical background can learn the language in a matter of hours.
11. Spring for Apache Hadoop
Although Hadoop is written in Java, that does not mean Java developers find Hadoop easy to work with. That is why SpringSource announced Spring for Apache Hadoop in early 2012. It makes it possible to integrate with other Spring applications and to write scripts in JVM-based languages, and it makes building applications with Hadoop and related technologies such as Hive and HBase much simpler.
12. StatsMix
In the same vein as BitDeli and Keen IO, StatsMix lets developers collect and process large amounts of data from their applications using only the languages they already know. The service tracks some metrics automatically, but the list can be extended considerably through the StatsMix API and its libraries. Results are presented in dashboards that users can customize and share, and a single view can combine several data sources.
13. Hadoop
Hadoop was originally used mainly for storing data and running MapReduce jobs, but today it is a large stack of technologies related in one way or another to processing big data (not only with MapReduce).
The core components of Hadoop are:
- Hadoop Distributed File System (HDFS) – a distributed file system that lets you store an almost unlimited volume of data;
- Hadoop YARN – a framework for cluster resource management and job scheduling, on top of which the MapReduce framework runs.
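For readers who have not seen the MapReduce model before, the following is a minimal single-process sketch of its three steps (map, shuffle, reduce) using the classic word count example. It is plain Python with hypothetical function names, not Hadoop's API; on a real cluster the framework distributes these steps across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data tools", "big data on Hadoop"])
```

Here `counts` maps each lowercased word to the number of times it appears across both input lines.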
There are also many projects that are closely related to Hadoop but are not part of the Hadoop core:
- Hive – a tool for SQL-like queries over big data (it turns SQL queries into a series of MapReduce jobs);
- Pig – a high-level programming language for data analysis. One line of code in this language can turn into a sequence of MapReduce jobs;
- HBase – a column-oriented database that implements the BigTable model;
- Cassandra – a high-performance distributed key-value database;
- ZooKeeper – a service for distributed configuration storage and for synchronizing changes to that configuration;
- Mahout – a machine learning library and engine for big data.
Do you have experience with any of these tools? Share it with us.