Big Data Storage Options

Keeping Your Data Safe, Secure and Accessible

Anssr Analytics can utilise a number of different storage solutions. The two main options are Amazon S3 (and compatible object stores) and HDFS, the distributed file system provided by the Apache Hadoop project.

    Amazon S3 (Simple Storage Service) is an object storage system capable of storing individual objects up to 5 terabytes in size.

    It’s durable, scalable, versatile and can be very cost-efficient. It also offers lifecycle policies that automatically migrate your older data to lower-cost storage classes, and versioning that lets you easily restore data which has accidentally been overwritten or deleted.
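
    As a minimal sketch of how these two features fit together, the snippet below uses the AWS boto3 SDK to switch on versioning and attach a lifecycle rule; the bucket name and the 90-day/365-day thresholds are hypothetical and should be tuned to your own retention needs.

        import boto3

        s3 = boto3.client("s3")
        BUCKET = "example-analytics-data"  # hypothetical bucket name

        # Versioning must be enabled for overwritten or deleted
        # objects to remain recoverable.
        s3.put_bucket_versioning(
            Bucket=BUCKET,
            VersioningConfiguration={"Status": "Enabled"},
        )

        # Lifecycle rule: move objects to Glacier after 90 days and
        # expire superseded (noncurrent) versions after a year.
        s3.put_bucket_lifecycle_configuration(
            Bucket=BUCKET,
            LifecycleConfiguration={
                "Rules": [
                    {
                        "ID": "archive-old-data",
                        "Status": "Enabled",
                        "Filter": {"Prefix": ""},  # apply to the whole bucket
                        "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                        "NoncurrentVersionExpiration": {"NoncurrentDays": 365},
                    }
                ]
            },
        )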

    If S3 is the big data storage option you choose, we can provide the technical support and expertise to ensure Druid and S3 work together as seamlessly as possible.
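
    In a typical deployment, Druid uses S3 as its "deep storage" for segments. As an illustrative sketch, the relevant common runtime properties look like this (the bucket and path are placeholders, and credentials would normally come from the environment or an IAM role):

        druid.extensions.loadList=["druid-s3-extensions"]
        druid.storage.type=s3
        druid.storage.bucket=example-druid-segments
        druid.storage.baseKey=druid/segments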

    HDFS is an open-source distributed file system, stewarded by the Apache Hadoop community, that’s specifically designed for storing and processing large data sets.

    Hadoop's services include data access, data governance, security and operations, and it can store, manage and analyse vast amounts of structured and unstructured data quickly, reliably and at extremely low cost.

    Hadoop’s many benefits include scalability and performance (data can be stored, processed and analysed at petabyte scale), resilience (if a node fails, processing is immediately re-directed to the remaining nodes in the cluster) and flexibility (data can be stored in any format).

    At Spicule, we operate Druid on Hadoop via the innovative Anssr platform we developed in conjunction with Canonical.

    For our purposes, Hadoop is especially valuable when interrogating huge amounts of data both in real time and historically.

    Because Druid supports both streaming and batch ingestion, and combines seamlessly with Hadoop’s distributed file system, the Druid platform powered by Hadoop takes the headache out of running interactive analytics at scale and delivers the lowest query latencies possible.
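
    As a companion sketch to the S3 example above, pointing Druid’s deep storage at HDFS is again only a few common runtime properties (the NameNode address and path below are placeholders):

        druid.extensions.loadList=["druid-hdfs-storage"]
        druid.storage.type=hdfs
        druid.storage.storageDirectory=hdfs://namenode.example.com:8020/druid/segments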

    Hadoop Storage: 6 Key Features

    • Cost Effectiveness

      Hadoop is open source and doesn’t require any expensive or specialist hardware to implement.

    • Larger Node Clusters

      A Hadoop cluster can scale to thousands of nodes, providing huge storage capacity and massive computing power.

    • Distributed Data

      Hadoop splits your data into blocks and distributes them across all the nodes within a cluster, replicating each block so that copies live on separate machines. If any node unexpectedly fails, your data won’t be lost and your analysis will continue uninterrupted; see the configuration sketch after this list.

    • Supercharge Your Workflows

      Deploying on Hadoop gives you scope to utilise the wider Hadoop ecosystem: transform your data before it gets ingested, or run further post-query analysis on your Druid results.

    • Supports Parallel Processing and Heterogeneous Clusters

      Parallel processing means data can be processed simultaneously across all nodes in the cluster, saving a great deal of time, while heterogeneous cluster support means each node can come from a different vendor and run a different type or version of operating system.

    • Scalability

      Like Druid, Hadoop can be scaled up or down depending upon your requirements. You won’t be paying for more processing power than you need.
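
    As promised under “Distributed Data” above, here is a minimal sketch of how block replication is configured in HDFS: a single property in hdfs-site.xml controls how many copies of each block the cluster keeps (three is the usual default; the value shown is illustrative).

        <!-- hdfs-site.xml -->
        <configuration>
          <property>
            <name>dfs.replication</name>
            <value>3</value>
            <!-- Every block is stored on three separate nodes, so the
                 loss of a single node never loses data. -->
          </property>
        </configuration>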

    Let's Start Talking

    Request a Callback