Tuesday, January 2, 2024

AWS Redshift - For OLAP Data Warehousing

Introduction
The explanation is as follows. In short, it is more advanced than AWS Aurora.
Like Amazon Aurora, Amazon Redshift is used by large enterprises. However, Redshift is more complex, can handle more data, and is referred to as a data warehouse. This is because Redshift is built for OLAP (Online Analytical Processing).

Furthermore, Redshift can scale up to Petabytes of data and supports up to 60 user-defined databases per cluster. On the other hand, Aurora can only scale to terabytes and support up to 40. Besides this, the security and the maintenance of both the database services are pretty much the same.

A few use cases of Amazon Redshift are creating machine learning models for forecasting operations, optimizing your company's business intelligence, and increasing developer productivity.
Benefits
  1. Redshift possesses the highest scaling capabilities amongst the three options we've examined.
  2. Its performance is much faster and more durable.
  3. Amazon Redshift can also handle a more significant amount of data and analyze it within a shorter period.
Redshift vs PostgreSQL
The explanation is as follows. Under the hood, it uses PostgreSQL 8.
I mean, as much as I love AWS services, setting up Redshift as our data warehouse was a mistake and Postgres would have been a much better alternative.

Let’s be honest, unless you have massive amounts of data, more than hundreds of TBs of data, all these fancy data warehouses like Redshift just aren’t worth the cost. Redshift isn’t open source, so you can’t have a complete mini-data stack on your local computer for testing purposes. Plus, Redshift, being built on top of Postgres 8, sometimes lacks the cool features that the newer releases of Postgres have.

I know Postgres is a transactional database, but I think it’s a solid first approach for a data warehouse. If you’re dealing with tables with less than 50 million rows and under 10 terabytes of data (which is the case for most startups), Postgres might outperform Redshift. And the best part is, you can have it up and running on your local computer, making it incredibly convenient for quick iterations.

Example
The explanation is as follows.
Redshift supports some SQL functions and queries which would generally only be necessary with large data warehouse applications. For example, PERCENTILE_CONT computes a linear interpolation to return a percentile.
We do it like this:
SELECT
    TOP 10 salesid,
    sum(pricepaid),
    percentile_cont(0.6) WITHIN GROUP (
        ORDER BY 
            salesid
    ),
    median (salesid)
FROM
    sales
GROUP BY
    salesid,
    pricepaid;
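
To make the "linear interpolation" part of PERCENTILE_CONT more concrete, here is a minimal Python sketch of the idea. It is not Redshift's implementation; the function name percentile_cont and the sample values are only assumptions for illustration. It sorts the non-null values, finds the fractional row position for the requested percentile, and interpolates between the two neighboring rows.

```python
import math


def percentile_cont(values, fraction):
    """Illustrative continuous percentile (hypothetical helper, not a Redshift API).

    Sorts the non-null values and linearly interpolates between the two
    rows surrounding the requested fractional position.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")

    # Like SQL aggregates, ignore NULLs (None here).
    ordered = sorted(v for v in values if v is not None)
    if not ordered:
        return None

    # Zero-based fractional row position of the requested percentile.
    position = fraction * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)

    # The position falls exactly on a row: no interpolation is needed.
    if lower == upper:
        return float(ordered[lower])

    # Otherwise blend the two neighboring rows by their distance to the position.
    weight = position - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


# Made-up sample values, only for illustration.
print(percentile_cont([10, 20, 30, 40], 0.5))  # 25.0
```

With a fraction of 0.5 the result is the median, which is why the query above can show median (salesid) next to percentile_cont.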
