In today's data-driven world, businesses rely heavily on effective Business Intelligence (BI) tools to make informed decisions. Amazon Web Services (AWS) offers two powerful services, Amazon Redshift Serverless and Amazon QuickSight, which provide scalable and efficient solutions for data warehousing and visualization. We have been using Quicksight and Redshift daily for a year now, and I’d like to share some of the great things, and not so great things, we have encountered.
Amazon Redshift Serverless
Pros:
1. Scalability: Amazon Redshift Serverless automatically scales compute and storage resources based on workload demands, ensuring optimal performance without the need for manual intervention. You don’t even have to put any indexes on your tables, or foreign keys. Redshift is great for analytics queries like sums, counts, averages, grouped by multiple dimensions. I will address some other issues in the Cons section below.
2. Cost-Efficiency: With serverless architecture, users only pay for the resources consumed, making it a cost-effective option for organizations with fluctuating workloads. There is a caveat about this in the “Cons” section below.
3. Simplified Management: Serverless eliminates the need for infrastructure management tasks such as provisioning, scaling, and maintenance, allowing users to focus on analytics and insights generation.
4. Integration with Other AWS Services: Redshift Serverless seamlessly integrates with other AWS services like Amazon S3, Glue, and Lambda, enabling smooth data ingestion, transformation, and analysis workflows.
5. High Availability: AWS manages the underlying infrastructure, ensuring high availability and fault tolerance, thereby minimizing downtime and ensuring business continuity.
Cons:
1. Cold Start Latency: Serverless architectures may experience cold start latency, where the system takes time to initialize resources, leading to slight delays in query execution for infrequently accessed data. If you have ETL processes that run in off hours, you will need to issue some basic queries to redshift to “wake it up” for your batch processing.
2. Limited Customization and old code base: Compared to traditional Redshift clusters, serverless options may offer limited customization options for hardware configurations and cluster settings. Redshift is also based on PosgreSQL V8.0. This is circa 2005!. While Amazon has done a lot of patching of this foundation over the years, there are still a lot of “gotchas” that you may run into when using Redshift. One of the issues we have run into is simply truncating tables. In some cases a simple truncate table command will run for hours and lock the entire database due to excessive RPU usage (Redshift Processing Units). If you have less than 1TB of data, a serverless RDS instance of a more modern PostgreSQL, MariaDB, or MySQL would probably be a better option for your project.
3. Complex query limitations: As mentioned above, Redshift is a juggernaut when running aggregate queries against massive amounts of data (even petabytes). However, if you introduce complex case queries, unions, substring logic, and other advanced SQL, the performance will not keep pace as Quicksight tries to query the data. We have solved this problem by using summary tables that are refreshed on a regular basis. We still keep our complex query in a view within Redshift, but we write the result set to a summary table at whatever interval is needed. This allows us sub-second response times for Quicksight. It’s not “real-time”, but it’s pretty close!
4. Cost Monitoring: While serverless can be cost-effective, organizations need to monitor usage closely to avoid unexpected spikes in costs, especially with fluctuating workloads. In the previous bullet point we mentioned excessive RPU’s. The table locking that takes place spikes RPU’s, and can end up costing you hundreds of dollars just because of a locked table and high RPU usage!
Amazon QuickSight
Pros:
1. Ease of Use: QuickSight offers a user-friendly interface with intuitive drag-and-drop functionality, making it accessible to users with varying levels of technical expertise.
2. Scalability: As a fully managed service, QuickSight scales seamlessly to accommodate growing data volumes and user bases, ensuring consistent performance.
3. Integration with AWS Ecosystem: QuickSight integrates seamlessly with various AWS data sources, including Redshift, S3, RDS, and Athena, simplifying data access and analysis.
4. Built-in ML Insights: QuickSight offers built-in machine learning capabilities for automated insights generation, anomaly detection, and forecasting, empowering users to derive actionable insights from their data.
5. Pay-per-Session Pricing: QuickSight's pay-per-session pricing model allows organizations to control costs effectively, as they only pay for active user sessions rather than per-user licenses.
Cons:
1. Limited Customization: While QuickSight offers a range of visualization options, users may encounter limitations in customizing visualizations or creating complex dashboard layouts.
2. Data Refresh Limitations: QuickSight imposes restrictions on data refresh frequencies, which may not meet the real-time analytics requirements of certain use cases. This limitation is limited to the use of Spice data sources in Quicksight. If you have live queries against your data warehouse, and properly modeled tables, this is a non-issue.
3. CI/CD: Integrating your Quicksight data sources and dashboards into an industry standard CI/CD process is way more complex than it should be. Committing your code to any code repository, and promoting code into and out of Quicksight is frustrating. You will have to become very familiar with the Quicksight CLI to retrieve the JSON formatted definitions of data sources and dashboards. In addition, altering these documents to move between environments takes some pretty serious technical chops and tools to automate these processes. We have used Apache hop http://hop.apache.org to automate some of this work, but it is not an easy or straightforward task.
Conclusion:
Both Amazon Redshift Serverless and Amazon QuickSight offer compelling solutions for BI and analytics needs within the AWS ecosystem. Redshift Serverless provides scalable, cost-efficient data warehousing, while QuickSight offers intuitive visualization and analytics capabilities. Ultimately, the choice between the two depends on factors such as workload characteristics, budget considerations, and specific use case requirements. By carefully evaluating the pros and cons outlined in this post, organizations can make informed decisions to leverage these AWS services effectively for their BI initiatives.
KPI Forge has years of experience with this technology stack, as well as all other cloud and on-prem solutions. Let’s talk about your analytic needs today!
Commentaires