Our Top 10 Open Source Data Tools


At Spicule we truly believe in the power of open source software, and the advantages to businesses are huge. Open source helps keep costs to a minimum, and with the added security and flexibility it offers, the reasons we recommend it are plain to see.

Open Source Tools

Below is a list of our top 10 favourite open source data tools that we currently use. Get in touch: we would love to hear what you think, and whether any of your favourites have not made our list.

1. Saiku: Of course, at number 1 we are biased, because we write it! But there is good reason too: Netflix, Samsung and Amazon all use it because it's quick and effective for discovering data. Open source and commercially supported, Saiku is a go-to tool for interactive analysis.
2. Pentaho Data Integration: We find the Pentaho BI server cumbersome except in specific use cases, but its data integration server is top class. Extract, transform and load data with ease using PDI and its vast array of plugins.
3. MonetDB: Open source column-store databases are hard to come by, but MonetDB is the open source stalwart in this sector. If your performance is lagging because of the number of records in your database, “big data” probably isn’t the answer; a column store might be.
4. Elasticsearch: Search indexes are growing in use, fast. Elasticsearch and Solr are both dear to our hearts, but we could only include one on this list, so we went for Elasticsearch, mostly because of the Kibana UI you can place seamlessly on top of it.
5. OODT: Ever wanted to use some NASA technology? Now’s your chance. OODT is a data toolkit for archiving data. It has a file manager, a workflow engine and more.
6. PostgreSQL: For a more traditional database, PostgreSQL is our favourite. Full-featured and open source, PostgreSQL has been around so long that practically every tool supports it.
7. Hadoop: Hadoop here is a bit of a catch-all term. We love big data technology, and whether it is Hadoop itself or any of its big data offspring, the majority are open source and shepherded by the Apache Software Foundation, which builds ecosystems around them.
8. Spark: If you need to process big data at speed, Spark is your friend. Whether it runs against Hadoop or just processes flat files, Spark lets you gain insight into your unstructured data.
9. D3: D3 allows designers to create custom visualisations around data. Plot any data on a web page using the flexibility of a scriptable visualisation engine.
10. Kafka: Stream your data around your system. Kafka lets you process data in real time and pump it to an array of services without major upheaval to your infrastructure.
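To make the column-store point in the MonetDB entry a little more concrete, here is a minimal Python sketch. It is a toy illustration of the general idea, not how MonetDB works internally, and the three-column "sales" table is hypothetical. It counts how many stored values an aggregate over a single column has to read in a row-oriented layout versus a column-oriented one.

```python
# Toy comparison of row-oriented vs column-oriented storage.
# The "sales" table and its columns are hypothetical; real column stores
# such as MonetDB add compression, vectorised execution and much more.

# Row layout: each record's fields are stored together.
rows = [
    {"id": 1, "region": "north", "amount": 120.0},
    {"id": 2, "region": "south", "amount": 80.0},
    {"id": 3, "region": "north", "amount": 45.5},
]

# Column layout: each column is stored as its own list.
columns = {
    "id": [1, 2, 3],
    "region": ["north", "south", "north"],
    "amount": [120.0, 80.0, 45.5],
}


def total_amount_row_store(table):
    """Sum 'amount' from a row layout; every field of every row is read."""
    total, values_read = 0.0, 0
    for record in table:
        values_read += len(record)  # the whole record is fetched
        total += record["amount"]
    return total, values_read


def total_amount_column_store(table):
    """Sum 'amount' from a column layout; only that one column is read."""
    amounts = table["amount"]
    return sum(amounts), len(amounts)


print(total_amount_row_store(rows))        # (245.5, 9)
print(total_amount_column_store(columns))  # (245.5, 3)
```

With only three columns the saving here is threefold, but analytical tables are often much wider, so a query that touches one or two columns out of dozens reads proportionally less data from a columnar layout.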