Saturday, 3 September 2016

Apache Hive

Introduction
Hive can work with HDFS or AWS S3. The explanation is as follows:
Apache Hive that works as a data warehouse system to query and analyze large datasets stored in the HDFS (Hadoop Distributed File System) or Amazon S3
Hive Uses Directories
The explanation is as follows:
Hive is a simple directory-based design where actual data files are getting stored at the folder/directory level in HDFS. ...Hive keeps track of data at the folder level not in actual data files. 

Because of the directory-based model in Hive, listings are much slower, renames are not atomic, and results are eventually consistent. To work with data in a table, Hive needs to perform file list operations and this causes a performance bottleneck while executing SQL queries. 
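The folder-level model described above can be tried out without a cluster. The following is a minimal Python sketch, not Hive code. It assumes a hypothetical warehouse directory on the local file system, and the table name `orderdata` and the file names are made up for illustration. It shows that reading a "table" means listing every file in its directory, which is the step that becomes slow when there are many files.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def create_table_dir(warehouse: Path, table: str, files: Dict[str, str]) -> Path:
    """Create a directory for a table and write its data files (hypothetical layout)."""
    table_dir = warehouse / table
    table_dir.mkdir(parents=True, exist_ok=True)
    for name, content in files.items():
        (table_dir / name).write_text(content)
    return table_dir


def scan_table(table_dir: Path) -> List[List[str]]:
    """Read a table by listing its directory and parsing every file found there."""
    rows = []
    # The directory listing is the only source of truth about which files belong to the table.
    for data_file in sorted(p for p in table_dir.iterdir() if p.is_file()):
        for line in data_file.read_text().splitlines():
            rows.append(line.split(","))
    return rows


def demo_scan() -> List[List[str]]:
    """Build a two-file table in a temporary directory and scan it."""
    with tempfile.TemporaryDirectory() as tmp:
        table_dir = create_table_dir(
            Path(tmp),
            "orderdata",
            {"000000_0": "1,2,3\n1,3,4\n", "000001_0": "2,3,4\n"},
        )
        return scan_table(table_dir)


if __name__ == "__main__":
    print(demo_scan())
```

Note that nothing in the sketch records which files make up the table; the answer always comes from the listing, as in the quoted description.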
Partitioning
The explanation is as follows:
Hive partitioning can be done by dividing a table into related groups based on the values of a particular column like date, city, country, etc. Partitioning reduces the query response time in Apache Hive as data is stored in horizontal slices. In Hive partitioning, partitions are explicit and appear as a column and must be given partition values. 
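The partition idea can also be sketched in plain Python. This is only an illustration under assumptions: the table name `orders`, the partition column `country`, and the sample rows are hypothetical. It relies on the `column=value` subdirectory naming that Hive uses for partitions, and shows both that the partition value comes back as an extra column and that a filter on the partition column lets us skip the other directories entirely.

```python
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple


def write_partition(table_dir: Path, column: str, value: str, lines: List[str]) -> None:
    """Store rows under a column=value subdirectory of the table directory."""
    part_dir = table_dir / f"{column}={value}"
    part_dir.mkdir(parents=True, exist_ok=True)
    (part_dir / "000000_0").write_text("\n".join(lines) + "\n")


def read_partitioned(table_dir: Path, column: str, wanted: Optional[str] = None) -> List[List[str]]:
    """Return rows with the partition value appended as an extra column.

    When `wanted` is given, only the matching directory is read (partition pruning).
    """
    rows = []
    for part_dir in sorted(p for p in table_dir.iterdir() if p.is_dir()):
        key, _, value = part_dir.name.partition("=")
        if key != column or (wanted is not None and value != wanted):
            continue  # files in skipped directories are never opened
        for data_file in sorted(part_dir.iterdir()):
            for line in data_file.read_text().splitlines():
                rows.append(line.split(",") + [value])
    return rows


def demo_partitions() -> Tuple[List[List[str]], List[List[str]]]:
    """Write two partitions, then read the whole table and a single partition."""
    with tempfile.TemporaryDirectory() as tmp:
        table_dir = Path(tmp) / "orders"
        write_partition(table_dir, "country", "TR", ["1,book", "2,pen"])
        write_partition(table_dir, "country", "US", ["3,lamp"])
        all_rows = read_partitioned(table_dir, "country")
        tr_rows = read_partitioned(table_dir, "country", wanted="TR")
        return all_rows, tr_rows


if __name__ == "__main__":
    everything, only_tr = demo_partitions()
    print(everything)
    print(only_tr)
```

The data files themselves do not contain the country; it is recovered from the directory name, which is why the partition value must be supplied explicitly when data is written.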

Hive Shell
While the shell is running, we see the > prompt character.

select
Example
We do it like this:
hive> Select * from OrderData;
Example
We do it like this:
hive (default)> select * from tbl;
OK
1   2   3
1   3   4
2   3   4
5   6   7
8   9   0
1   8   3
Time taken: 0.101 seconds, Fetched: 6 row(s)
Hive Command
-e option
Runs a SQL statement passed on the command line. We do it like this:
1) hive -e "select ... from table where col1 between a and b"

or, equivalently,

2) hive -e "select ... from table where col1 >= a and col1 <= b"
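BETWEEN is inclusive on both ends, so the two range filters above select the same rows. Here is a small sketch that checks this, with one loud assumption: it uses Python's built-in SQLite as a stand-in for Hive so it can run anywhere. The table `tbl`, the column `col1`, and the sample values are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def compare_range_queries(values: List[int], low: int, high: int) -> Tuple[list, list]:
    """Run the BETWEEN form and the >= / <= form on the same data and return both results."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for Hive here
    conn.execute("CREATE TABLE tbl (col1 INTEGER)")
    conn.executemany("INSERT INTO tbl (col1) VALUES (?)", [(v,) for v in values])
    between = conn.execute(
        "SELECT col1 FROM tbl WHERE col1 BETWEEN ? AND ? ORDER BY col1",
        (low, high),
    ).fetchall()
    explicit = conn.execute(
        "SELECT col1 FROM tbl WHERE col1 >= ? AND col1 <= ? ORDER BY col1",
        (low, high),
    ).fetchall()
    conn.close()
    return between, explicit


if __name__ == "__main__":
    between_rows, explicit_rows = compare_range_queries([1, 2, 3, 5, 8], 2, 5)
    print(between_rows)
    print(explicit_rows)
```

Both boundary values (2 and 5 in this run) are included in each result.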
