Are you considering moving your big data workloads from on-premises to Amazon EMR? Do you already have workloads on EMR but are looking to optimize performance and costs? If you answered “yes” to either of these questions, this blog post is for you.
In Episode 1 of the Amazon EMR Insider Series, Nicholas Walsh, Developer Advocate at AWS, sat down with Roy Hasson, Sr. Analytics Specialist Manager at AWS, and Kunal Agarwal, Co-founder and CEO at Unravel Data, to share best practices and new features that help you optimize big data costs with Amazon EMR and Unravel. This post is adapted from that virtual session.
What is Amazon EMR?
Amazon Elastic MapReduce (EMR) is a fully managed cluster system that allows you to launch 19 different big data frameworks, including Apache Spark, Hadoop, Hive, Presto, and HBase. The image below shows some of the features and benefits of using EMR.
So why would you switch from self-managing (on-premises or on Amazon EC2, for example) to Amazon EMR?
Self-Managing Is Tedious and Expensive
Self-managing is difficult. It’s tedious, expensive, and requires you to have engineers on call. To be more specific, we’ll explore the three biggest challenges that, in Roy’s experience, customers face when self-managing.
OVERPROVISIONED AND UNDERUTILIZED RESOURCES
When self-managing, organizations often size clusters for peak utilization. Buying hardware and configuring your environment for that peak can be very expensive, yet Roy has found that, on average, customers use only 50% to 60% of their capacity. Sizing for occasional spikes means paying for capacity that sits idle most of the time.
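A quick calculation shows why sizing for the peak is costly. The sketch below is a minimal illustration that assumes a made-up hourly demand profile (the numbers are hypothetical, not data from the session) and compares the node-hours you pay for when a cluster is permanently sized for its busiest hour with the node-hours workloads actually use.

```python
from typing import Sequence

# Hypothetical nodes needed in each hour of one day (illustrative only).
HOURLY_DEMAND = [20, 18, 15, 15, 16, 22, 35, 50, 60, 62, 58, 55,
                 54, 56, 60, 64, 100, 70, 45, 35, 30, 25, 22, 20]


def peak_sized_utilization(demand: Sequence[int]) -> float:
    """Fraction of provisioned node-hours actually used when the
    cluster is permanently sized for its single busiest hour."""
    if not demand or max(demand) <= 0:
        raise ValueError("demand profile must contain positive values")
    provisioned = max(demand) * len(demand)  # paid for in every hour
    used = sum(demand)                       # node-hours workloads consumed
    return used / provisioned


if __name__ == "__main__":
    print(f"{peak_sized_utilization(HOURLY_DEMAND):.0%}")  # prints 42%
```

In this invented profile, a single spike to 100 nodes forces you to pay for 2,400 node-hours while workloads use only 1,007, so roughly 58% of the purchased capacity sits idle.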
FALLING BEHIND ON OPEN SOURCE RELEASES
The open source software community moves quickly, releasing new versions, patches, and security fixes regularly. Keeping up with these releases is hard, and upgrading a cluster is difficult when so many workloads run on it. Falling behind on security patches is especially risky.
MISSED SLAS
Missed SLAs can often be attributed to resource contention or seasonality. Resource contention occurs when one workload consumes so many resources that it delays another, such as an important report. Seasonality occurs when an organization needs to run larger processing workloads at certain times, for example, at the beginning of a month or quarter. Buying more hardware just for these high-volume periods is expensive.
In the remainder of this blog, we’ll explore how the combination of Amazon EMR and Unravel can help mitigate these challenges, improve performance, and reduce cost. But first, let’s start with understanding three different approaches to migrating to EMR.
Migrating to EMR: 3 Different Ways
If your organization is considering migrating to Amazon EMR, there are three approaches you can take: Lift & Shift, Rearchitect, and Net New.
LIFT & SHIFT
Lift & Shift is exactly what it sounds like: you lift what you have on-prem and shift it to EMR. In other words, you duplicate your on-prem environment on EMR. This approach requires the least time and effort of the three and is most useful when you face time or budget constraints but still need to move to the cloud.
REARCHITECT
When you rearchitect, you design for the future state of your architecture. Imagine where you want to be and what you’ll need in the future, rather than focusing solely on the problems your organization faces today. EMR experts can work with customers to understand which workloads they’re moving and how to architect those workloads so they run optimally in the future. Rearchitecting lets you gain the most value from the cloud.
NET NEW
Net New means leaving what you have on-prem in place and deploying new workloads in the cloud. Over time, you can move more workloads to the cloud.
So how do you pick the right migration approach?
It’s all about understanding your workloads and answering questions such as “Are we over- or underutilizing resources?” or “What part of the day do we use the most resources?” In Roy’s experience, customers may find answering these questions difficult because they often have hundreds or even thousands of workloads. This is where Unravel comes into play.
Understanding Your Workloads & Migrating with Unravel
Unravel is a performance and cost optimization solution designed for big data workloads. It applies intelligent analysis to help you migrate as quickly as possible: it shows you how the cloud, and EMR in particular, benefits you, selects the workloads best suited for migration, maps your current environment to the right cloud architecture, and takes a data-driven approach so you clearly understand costs before migrating. With this information, Unravel can help you determine what benefits you’d gain by moving to the cloud and which migration approach best suits your workloads.
Through a cluster discovery analysis, Unravel connects to your current environment and instantly surfaces what you need to know about it, such as:
- What services are running
- What hosts are running
- What technologies are being used (e.g., Spark, MapReduce, Presto)
- How many users, and which ones, are running application data pipelines
- What types of jobs these users run and how many resources they consume
This information shows what benefits you’d get from moving to Amazon EMR. For example, you can see whether an on-prem cluster is overallocated, whether a user’s applications keep failing because of resource contention, or whether your workloads are seasonal. These are all good reasons to move to the cloud, where you pay not for provisioned capacity but only for what you use, and where auto-scaling gives you the resources you need without compromising performance or multiplying costs.
If, given this information, you decide that the benefits from EMR make the move worth it, Unravel can also help you determine which workloads you should migrate, and therefore which approach to migration you should use.
Unravel can help you understand, based on usage rather than capacity, which workloads you should migrate to EMR. If you want to migrate all workloads via a Lift & Shift approach, Unravel can map every host, service, and application from on-prem to the appropriate AWS environment and tell you what it would cost. This method, however, may not always be the best choice.
For example, if you’re not using all your capacity on-prem, it may be better to consider the Rearchitect or Net New approaches. To help determine which workloads, if any, to move from on-prem to EMR, Unravel provides fine-grained visibility into which users or types of applications are best suited for the cloud. You might, for instance, move only Spark workloads, or only the marketing team’s workloads, to EMR.
I’m Already on EMR, but How Do I Save Money?
Once you’ve moved the appropriate workloads to EMR, you may be wondering, “What cost-saving features are built into EMR?” To answer that, we’ll touch on three features: Amazon EMR Runtime for Apache Spark, Managed Scaling, and Spot Fleets.
AMAZON EMR RUNTIME FOR APACHE SPARK
EMR Runtime for Apache Spark is an API-compatible performance optimization layer built into the Spark engine. It handles benchmarking and tunes Spark memory, container, and executor settings for you, resulting in better performance and lower costs. Running Spark on EMR with the runtime delivers:
- 2.6x faster performance than Spark on EMR without the runtime
- 1.6x faster performance than third-party managed Spark (with its runtime)
- 1/10th the cost of third-party managed Spark (with its runtime)
MANAGED SCALING
Auto-scaling lets you configure how your cluster scales up or down based on parameters such as memory, CPU, or queue depth. However, auto-scaling requires you to take the reins and decide which parameters and thresholds trigger scaling up or down.
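To make that burden concrete, here is a minimal sketch of the kind of custom automatic scaling policy you design yourself. It assumes an instance-group cluster and boto3’s `put_auto_scaling_policy`, keyed on the `YARNMemoryAvailablePercentage` CloudWatch metric. The thresholds, cooldowns, step sizes, and helper names are hypothetical choices for illustration, not recommended values.

```python
from typing import Any, Dict


def _memory_rule(name: str, operator: str, threshold: float,
                 adjustment: int) -> Dict[str, Any]:
    """One scaling rule triggered by available YARN memory (hypothetical values)."""
    return {
        "Name": name,
        "Action": {
            "SimpleScalingPolicyConfiguration": {
                "AdjustmentType": "CHANGE_IN_CAPACITY",
                "ScalingAdjustment": adjustment,  # +N adds nodes, -N removes them
                "CoolDown": 300,  # seconds before another trigger-based action
            }
        },
        "Trigger": {
            "CloudWatchAlarmDefinition": {
                "ComparisonOperator": operator,
                "EvaluationPeriods": 1,
                "MetricName": "YARNMemoryAvailablePercentage",
                "Namespace": "AWS/ElasticMapReduce",
                "Period": 300,
                "Statistic": "AVERAGE",
                "Threshold": threshold,
                "Unit": "PERCENT",
            }
        },
    }


def build_autoscaling_policy(min_nodes: int, max_nodes: int,
                             scale_out_below: float = 15.0,
                             scale_in_above: float = 75.0) -> Dict[str, Any]:
    """Add a node when free YARN memory is scarce; remove one when most is idle."""
    if not 0 <= min_nodes <= max_nodes:
        raise ValueError("expected 0 <= min_nodes <= max_nodes")
    return {
        "Constraints": {"MinCapacity": min_nodes, "MaxCapacity": max_nodes},
        "Rules": [
            _memory_rule("ScaleOutOnLowMemory", "LESS_THAN", scale_out_below, 1),
            _memory_rule("ScaleInOnIdleMemory", "GREATER_THAN", scale_in_above, -1),
        ],
    }


def apply_autoscaling_policy(cluster_id: str, instance_group_id: str,
                             policy: Dict[str, Any]) -> Dict[str, Any]:
    """Attach the policy to one instance group of an existing cluster."""
    import boto3  # imported lazily so the builder runs without the AWS SDK

    emr = boto3.client("emr")
    return emr.put_auto_scaling_policy(
        ClusterId=cluster_id,
        InstanceGroupId=instance_group_id,
        AutoScalingPolicy=policy,
    )
```

Every number in this policy is a decision you own: which metric to watch, which thresholds to use, how many nodes to add or remove per step, and how long to wait between steps.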
Managed scaling removes that burden by analyzing the parameters for you. All you need to provide is information about the cluster, such as its minimum and maximum size and any other capacity limits. From there, EMR evaluates a number of different metrics in under 10 seconds and scales the cluster up and down to match workload demand as closely as possible. By scaling quickly to match demand, managed scaling not only does the heavy lifting for you but can also cut costs by up to 60%.
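For comparison, a managed scaling setup boils down to a handful of limits. The sketch below assumes boto3 and a running cluster whose ID you supply. `put_managed_scaling_policy` and the `ComputeLimits` fields are part of the EMR API; the helper names, the sanity checks, and any example values are this sketch’s own assumptions.

```python
from typing import Any, Dict, Optional

VALID_UNIT_TYPES = {"Instances", "InstanceFleetUnits", "VCPU"}


def build_managed_scaling_policy(min_units: int, max_units: int,
                                 max_on_demand_units: Optional[int] = None,
                                 unit_type: str = "Instances") -> Dict[str, Any]:
    """Compute limits for EMR managed scaling; EMR chooses the actual
    cluster size between these bounds based on workload metrics."""
    if unit_type not in VALID_UNIT_TYPES:
        raise ValueError(f"unknown unit type: {unit_type}")
    # Basic sanity checks made by this sketch, not EMR's full validation.
    if not 1 <= min_units <= max_units:
        raise ValueError("expected 1 <= min_units <= max_units")
    limits: Dict[str, Any] = {
        "UnitType": unit_type,
        "MinimumCapacityUnits": min_units,
        "MaximumCapacityUnits": max_units,
    }
    if max_on_demand_units is not None:
        if max_on_demand_units > max_units:
            raise ValueError("On-Demand cap cannot exceed the overall maximum")
        # EMR uses this cap to split capacity between On-Demand and Spot.
        limits["MaximumOnDemandCapacityUnits"] = max_on_demand_units
    return {"ComputeLimits": limits}


def apply_managed_scaling(cluster_id: str, policy: Dict[str, Any]) -> None:
    """Attach the managed scaling policy to an existing cluster."""
    import boto3  # imported lazily so the builder runs without the AWS SDK

    boto3.client("emr").put_managed_scaling_policy(
        ClusterId=cluster_id, ManagedScalingPolicy=policy
    )
```

Compared with the auto-scaling rules described earlier, there are no metrics or thresholds to pick here; EMR decides when to scale within the bounds you set.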
SPOT FLEETS
Instance fleets for advanced Spot provisioning, also known as Spot Fleets, give you the ability to choose the instance types you want in your cluster and to mix Spot and On-Demand capacity across them. EMR also monitors the capacity available in the Spot market and can switch between Spot instance types for you. This reduces Spot interruptions, which in turn reduces both job runtime and costs.
These cost-saving features are all included in Amazon EMR. To use them effectively, though, it helps to have visibility into what is happening within your workloads, and Unravel can provide that.
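As a rough illustration of mixing instance types and purchase options, here is a minimal sketch of an EMR instance fleet configuration in the shape boto3’s `run_job_flow` accepts as one entry of `Instances["InstanceFleets"]`. The fleet name, instance types, weights, and capacity targets are hypothetical; `capacity-optimized` is one real Spot allocation strategy for steering toward Spot pools with more available capacity.

```python
from typing import Any, Dict, Mapping


def build_task_fleet(weights: Mapping[str, int], on_demand_units: int,
                     spot_units: int, spot_timeout_minutes: int = 10) -> Dict[str, Any]:
    """Task instance fleet mixing On-Demand and Spot capacity across several
    instance types. `weights` maps instance type -> capacity units it counts for."""
    if not weights:
        raise ValueError("need at least one instance type")
    if any(w <= 0 for w in weights.values()):
        raise ValueError("weighted capacities must be positive")
    return {
        "Name": "task-fleet",  # hypothetical name
        "InstanceFleetType": "TASK",
        "TargetOnDemandCapacity": on_demand_units,
        "TargetSpotCapacity": spot_units,
        "InstanceTypeConfigs": [
            {"InstanceType": itype, "WeightedCapacity": weight}
            for itype, weight in weights.items()
        ],
        "LaunchSpecifications": {
            "SpotSpecification": {
                # If Spot capacity can't be provisioned within the timeout,
                # fall back to On-Demand instead of failing.
                "TimeoutDurationMinutes": spot_timeout_minutes,
                "TimeoutAction": "SWITCH_TO_ON_DEMAND",
                # Prefer the Spot pools with the most spare capacity.
                "AllocationStrategy": "capacity-optimized",
            }
        },
    }
```

For example, `build_task_fleet({"m5.xlarge": 4, "r5.xlarge": 4}, on_demand_units=8, spot_units=24)` asks for a small On-Demand baseline with most capacity on Spot, letting EMR pick whichever listed instance types have Spot capacity available.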
4 Ways to Optimize Cost and Performance with Unravel
If you want to reduce costs in any area, whether it’s your monthly household expenses or your workloads on Amazon EMR, a good first step is knowing what you’re currently spending money on. From there, you can make the appropriate changes to cut costs. On EMR, Unravel helps with both stages. Kunal shares four main ways Unravel can help you optimize costs on EMR.
Automatic Cost Savings Prompts: Unravel breaks down costs by application and workload to show which ones can benefit from EMR’s cost optimization features. It continually looks for ways to improve performance and save money, and prompts you with them.
Right-Size Instance Recommendations: Unravel helps you determine the right number of instances for running your app or the entire workload on a specific cluster.
Tune Applications for Best Cost and Performance: Some cost savings are hidden in the application itself. For example, the code may simply be inefficient. Unravel helps you tune your applications and adjust configuration settings, which can drastically improve performance and reduce costs.
Data-Tiering Recommendations: If you want to take advantage of data tiering, how do you determine which tables are in use and which aren’t, so you can tier them appropriately? Unravel can make tiering recommendations for you.
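The session doesn’t describe how Unravel decides tiers, so the following is only a generic sketch of the underlying idea, not Unravel’s method: classify tables by how recently they were accessed. The table names, dates, and the 30- and 90-day cutoffs are hypothetical, and a real recommendation engine may use other signals.

```python
from datetime import date
from typing import Dict, Mapping


def recommend_tiers(last_access: Mapping[str, date], today: date,
                    hot_days: int = 30, warm_days: int = 90) -> Dict[str, str]:
    """Map each table to 'hot', 'warm', or 'cold' by days since last access."""
    if not 0 <= hot_days <= warm_days:
        raise ValueError("expected 0 <= hot_days <= warm_days")
    tiers: Dict[str, str] = {}
    for table, accessed in last_access.items():
        age_days = (today - accessed).days
        if age_days <= hot_days:
            tiers[table] = "hot"    # keep on fast storage
        elif age_days <= warm_days:
            tiers[table] = "warm"   # candidate for a cheaper tier
        else:
            tiers[table] = "cold"   # candidate for the cheapest storage tier
    return tiers
```

For instance, with `today=date(2021, 6, 30)`, a table last read on June 25 is hot, one last read on April 15 (76 days earlier) is warm, and one untouched since December 2020 is cold.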
Conclusion
Whether you run your Apache Spark, Hive, or Presto workloads on-premises or on AWS, Amazon EMR and Unravel can help you save money. If you’re interested in learning more about the benefits of EMR and want to see a demo of how Unravel can help you optimize your Amazon EMR cluster costs, be sure to watch the virtual session. To learn more about Unravel, you can sign up for a free trial or contact us.