Apache Drill Installation and Storage Plugin Configuration for HBase, Hive, cp, and dfs for Beginners


Introduction

Drill is a versatile query engine that can serve many purposes. The Apache Drill project defines it as follows: "Apache Drill is a low latency distributed query engine for large-scale datasets, including structured and semi-structured/nested data. Drill is designed to scale to several thousands of nodes and query petabytes of data at interactive speeds that BI/Analytics environments require."

Drill is also beneficial for short, interactive ad-hoc queries on large-scale data sets. Drill is capable of querying nested data in formats like JSON and Parquet and performing dynamic schema discovery. Drill does not require a centralized metadata repository in Hadoop.
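Because Drill discovers schemas at read time, a JSON file can be queried directly and nested fields can be reached with dot notation. As a minimal sketch, assuming the dfs plugin described later in this post is enabled, a query against a hypothetical people.json file could look like this:

-- No table definition or metastore entry is needed;
-- the file path and field names here are made-up examples.
SELECT t.name, t.address.city
FROM dfs.`/home/cloudera/Desktop/people.json` t
LIMIT 10;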

Overview

Drill provides an extensible architecture at all layers, including the storage plugin, query, query optimization/execution, and client API layers. You can customize any layer for the specific needs of an organization, or extend a layer to a broader array of use cases. Drill uses classpath scanning to find and load plugins, so additional storage plugins, functions, and operators can be added with minimal configuration.

The following image represents the communication between clients, applications, and Drillbits:

[Image: communication between clients, applications, and Drillbits]

Drill Installation

If you are using a VM (virtual machine), open a terminal and follow these steps:

  1. Download "apache-drill-1.4.0" from the Apache Drill site and extract it.
  2. Open a terminal.
  3. Change into the extracted Apache Drill folder with this command:
     [cloudera@quickstart Desktop]$ cd /folder name/apache-drill-1.4.0
  4. Change into the bin folder:
     [cloudera@quickstart apache-drill-1.4.0]$ cd bin
  5. Start the Drillbit:
     [cloudera@quickstart bin]$ ./drillbit.sh start

    After completing these steps, you can check Drill in your browser. The default port for the Drill web UI is 8047.

    Drill can fetch and display data from the cp, dfs, hive, mongo, and hbase storage plugins. We will look at each one in turn.

    Enabled Storage Plugins for Drill


    We can edit a plugin's configuration by clicking its Update button in the Storage tab of the web UI.
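    You can also verify which storage plugins are enabled from the query side. SHOW DATABASES is standard Drill SQL; each enabled plugin and workspace appears as a schema (the exact rows depend on your configuration):

SHOW DATABASES;
-- Typical output includes schemas such as cp.default, dfs.root, dfs.tmp,
-- hive.default, and hbase, depending on which plugins are enabled.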


    Configuration code for all plugins

    Configuration code for cp

    {
      "type": "file",
      "enabled": true,
      "connection": "classpath:///",
      "workspaces": null,
      "formats": {
        "csv": {
          "type": "text",
          "extensions": ["csv"],
          "delimiter": ","
        },
        "tsv": {
          "type": "text",
          "extensions": ["tsv"],
          "delimiter": "\t"
        },
        "json": {
          "type": "json"
        },
        "parquet": {
          "type": "parquet"
        },
        "avro": {
          "type": "avro"
        },
        "csvh": {
          "type": "text",
          "extensions": ["csvh"],
          "extractHeader": true,
          "delimiter": ","
        }
      }
    }

    Query Example

    [Image: a sample query against the cp plugin and its result in the Drill web UI]
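    The screenshot is not reproduced here, but Drill bundles a sample employee.json file on its classpath, so a minimal query against the cp plugin looks like this (the LIMIT clause just keeps the output short):

SELECT * FROM cp.`employee.json` LIMIT 5;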

    Configuration code for DFS

    {
      "type": "file",
      "enabled": true,
      "connection": "file:///",
      "workspaces": {
        "root": {
          "location": "/",
          "writable": false,
          "defaultInputFormat": null
        },
        "tmp": {
          "location": "/tmp",
          "writable": true,
          "defaultInputFormat": null
        }
      },
      "formats": {
        "psv": {
          "type": "text",
          "extensions": ["tbl"],
          "delimiter": "|"
        },
        "csv": {
          "type": "text",
          "extensions": ["csv"],
          "delimiter": ","
        },
        "tsv": {
          "type": "text",
          "extensions": ["tsv"],
          "delimiter": "\t"
        },
        "parquet": {
          "type": "parquet"
        },
        "json": {
          "type": "json"
        },
        "avro": {
          "type": "avro"
        },
        "sequencefile": {
          "type": "sequencefile",
          "extensions": ["seq"]
        },
        "csvh": {
          "type": "text",
          "extensions": ["csvh"],
          "extractHeader": true,
          "delimiter": ","
        }
      }
    }

    Query Example

SELECT * FROM dfs.`/home/cloudera/Desktop/employee.csv`

    Replace the path in backticks with the path to your own CSV file.
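    Since the tmp workspace in the dfs configuration above is writable, query results can also be written back through dfs with a CREATE TABLE AS (CTAS) statement. This is a sketch; the output table name is a made-up example:

-- Convert the CSV into Parquet inside the writable dfs.tmp workspace
CREATE TABLE dfs.tmp.employee_parquet AS
SELECT * FROM dfs.`/home/cloudera/Desktop/employee.csv`;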
    Configuration code for HIVE

    {
      "type": "hive",
      "enabled": true,
      "configProps": {
        "hive.metastore.uris": "thrift://localhost:9083",
        "javax.jdo.option.ConnectionURL": "jdbc:hive://localhost:10000/default",
        "hive.metastore.warehouse.dir": "/user/hive/warehouse",
        "fs.default.name": "hdfs://localhost:8020/",
        "hive.metastore.sasl.enabled": "false"
      }
    }

    Query Example

SELECT * FROM hive.employee
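    If the table lives in a particular Hive database, qualify it with the database name. USE and SHOW TABLES are standard Drill SQL; the default database and employee table here are assumptions for illustration:

-- Switch to the Hive default database, list its tables, then query one
USE hive.`default`;
SHOW TABLES;
SELECT * FROM hive.`default`.employee LIMIT 10;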

    Configuration code for HBASE

    {
      "type": "hbase",
      "config": {
        "hbase.zookeeper.quorum": "localhost",
        "hbase.zookeeper.property.clientPort": "2181"
      },
      "size.calculator.enabled": false,
      "enabled": true
    }

    Query Example

SELECT * FROM hbase.employee
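    HBase stores values as raw bytes, so results from the query above come back in binary form. Drill's CONVERT_FROM function decodes them; the account column family and its name qualifier below are hypothetical:

-- Decode the row key and one column from bytes to UTF-8 strings
SELECT CONVERT_FROM(row_key, 'UTF8') AS emp_id,
       CONVERT_FROM(t.account.`name`, 'UTF8') AS emp_name
FROM hbase.employee t;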

Ravi Prakash

Ravi is a Hadoop architect and PHP open source developer. He prepares and maintains applications using standard development tools and provides technical consultation and development expertise. You can contact him on LinkedIn.
