Yazılım Çorbası: Apache Hive

Saturday, September 3, 2016

Apache Hive

Introduction

Hive can work with HDFS or AWS S3. The explanation is as follows:

Apache Hive that works as a data warehouse system to query and analyze large datasets stored in the HDFS (Hadoop Distributed File System) or Amazon S3

Hive Uses Directories

Hive tracks a table's data at the directory level, not per file. The explanation is as follows:

Hive is a simple directory-based design where actual data files are getting stored at the folder/directory level in HDFS. ...Hive keeps track of data at the folder level not in actual data files.

Because of the directory-based model in Hive, listings are much slower, renames are not atomic, and results are eventually consistent. To work with data in a table, Hive needs to perform file list operations and this causes a performance bottleneck while executing SQL queries.
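To make the directory-based model concrete, here is a minimal sketch that imitates a table directory on the local filesystem. The paths, the table name and the file names are assumptions for illustration only; a real Hive table would live in HDFS or S3, and nothing here calls Hive itself.

```shell
#!/bin/sh
# Local stand-in for a warehouse directory (hypothetical layout, not real HDFS).
warehouse=$(mktemp -d)
table_dir="$warehouse/orderdata"
mkdir -p "$table_dir"

# Two data files written into the table directory (illustrative file names).
printf '1,2,3\n1,3,4\n' > "$table_dir/000000_0"
printf '2,3,4\n' > "$table_dir/000001_0"

# Reading the table starts with a file listing: whatever is in the directory is the table.
file_count=$(find "$table_dir" -type f | wc -l | tr -d ' ')
row_count=$(cat "$table_dir"/* | wc -l | tr -d ' ')
echo "files=$file_count rows=$row_count"

# Dropping another file into the directory changes the table contents,
# because there is no per-file bookkeeping to update.
printf '5,6,7\n' > "$table_dir/000002_0"
row_count_after=$(cat "$table_dir"/* | wc -l | tr -d ' ')
echo "rows after new file=$row_count_after"

# Clean up the temporary directory.
rm -rf "$warehouse"
```

The listing step is cheap on a local disk, but on HDFS or S3 with many files it is the part that the quote above describes as the bottleneck.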

Partitioning

The explanation is as follows:

Hive partitioning can be done by dividing a table into related groups based on the values of a particular column like date, city, country, etc. Partitioning reduces the query response time in Apache Hive as data is stored in horizontal slices. In Hive partitioning, partitions are explicit and appear as a column and must be given partition values.
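The following sketch shows roughly what partitioning looks like. The table name orders, its columns and the partition values are hypothetical. The HiveQL is only printed, not executed, and the directory layout is imitated on the local filesystem, assuming Hive's usual convention of one column=value subdirectory per partition.

```shell
#!/bin/sh
# HiveQL sketch for a table partitioned by country (hypothetical names); printed, not executed.
cat <<'EOF'
CREATE TABLE orders (id INT, amount DOUBLE)
PARTITIONED BY (country STRING);

SELECT id, amount FROM orders WHERE country = 'TR';
EOF

# Local stand-in for the table directory: each partition value gets its own
# subdirectory named column=value.
base_dir=$(mktemp -d)
table_dir="$base_dir/orders"
mkdir -p "$table_dir/country=TR" "$table_dir/country=DE"
printf '1,9.5\n2,3.0\n' > "$table_dir/country=TR/000000_0"
printf '3,7.25\n' > "$table_dir/country=DE/000000_0"

# A filter on the partition column only needs the matching subdirectory,
# so fewer files are listed and read.
pruned_rows=$(cat "$table_dir/country=TR"/* | wc -l | tr -d ' ')
all_rows=$(cat "$table_dir"/*/* | wc -l | tr -d ' ')
echo "rows read with country='TR': $pruned_rows of $all_rows"

# Clean up the temporary directory.
rm -rf "$base_dir"
```

Note that the partition column is not stored inside the data files in this sketch; its value comes from the directory name, which is why it must be supplied explicitly when writing.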

Hive Shell
While the shell is running, we see the > prompt character.

select
Example

We do it like this:

hive> Select * from OrderData;

Example

We do it like this:

hive (default)> select * from tbl;
OK
1   2   3
1   3   4
2   3   4
5   6   7
8   9   0
1   8   3
Time taken: 0.101 seconds, Fetched: 6 row(s)

Hive Command
-e option
Runs SQL. We do it like this:

1) hive -e "select ... from table where col1 between a and b"

Or, equivalently, since BETWEEN is inclusive on both ends:

2) hive -e "select ... from table where col1 >= a and col1 <= b"
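As a small sketch of how the -e option might be used from a script, the bounds can be kept in shell variables and interpolated into the query string. The table tbl and column col1 are placeholders, the bound values are made up, and RUN_HIVE is a hypothetical switch introduced here so the script only prints the command unless it is explicitly asked to run it.

```shell
#!/bin/sh
# Hypothetical bounds for the range filter.
lo=10
hi=20

# Double quotes let the shell expand $lo and $hi before Hive sees the query.
query="SELECT * FROM tbl WHERE col1 BETWEEN $lo AND $hi"

# RUN_HIVE is an assumption of this sketch, not a Hive setting.
if [ "${RUN_HIVE:-no}" = "yes" ] && command -v hive >/dev/null 2>&1; then
  hive -e "$query"
else
  echo "would run: hive -e \"$query\""
fi
```

Interpolating values this way is only reasonable for trusted numeric input; anything coming from users would need validation first.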
