Open Source Technologies

Overview

We use a number of key open source technologies to build our platform. These technologies are essential to our success and we are grateful to the open source community for their contributions.

Dashboard Technologies

Superset

Apache Superset is a modern, enterprise-ready business intelligence web application. It is a data exploration and visualization platform designed to be visual, intuitive, and interactive. Superset allows you to create and share dashboards and reports, and it supports a wide range of data sources.

Grafana

Grafana is an open source analytics and monitoring platform. It allows you to query, visualize, alert on, and understand your metrics no matter where they are stored. Grafana provides a powerful and flexible platform for creating dashboards and visualizing data.

Database Technologies

PostgreSQL

PostgreSQL is a powerful, open source object-relational database system. It is highly extensible and supports a wide range of data types and features. PostgreSQL is known for its reliability, robustness, and performance, and it is widely used in production environments.

MongoDB

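MongoDB is a popular document-oriented NoSQL database that stores records as flexible, JSON-like documents. Its Community Server is source-available under the Server Side Public License (SSPL), which is not an OSI-approved open source license. It is designed for high performance, scalability, and availability, and it is widely used for building modern applications. MongoDB is known for its flexibility, ease of use, and rich query language.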

Neo4j

Neo4j is an open source graph database. It is designed for storing and querying graph data, and it provides a powerful and flexible platform for building graph-based applications. Neo4j is widely used for social networks, recommendation engines, and network analysis.
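To make the recommendation-engine use case concrete, here is a minimal plain-Python sketch of a "friends of friends" query over an in-memory graph. It does not use Neo4j or its query language; the FOLLOWS data and the recommend function are hypothetical names chosen only for illustration, and a real graph database would run this kind of traversal for you.

```python
from collections import Counter

# Hypothetical social graph: each key follows the users in its set.
FOLLOWS = {
    "alice": {"bob", "carol"},
    "bob": {"carol", "dave"},
    "carol": {"dave", "erin"},
    "dave": set(),
    "erin": set(),
}


def recommend(user, graph):
    """Rank accounts two hops away that `user` does not already follow."""
    direct = graph.get(user, set())
    counts = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            if candidate != user and candidate not in direct:
                counts[candidate] += 1
    # Most connecting paths first, then alphabetical so the order is stable.
    return sorted(counts, key=lambda name: (-counts[name], name))


print(recommend("alice", FOLLOWS))  # ['dave', 'erin']
```

Here dave ranks first because two of the accounts alice follows also follow him, while erin is reachable through only one.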

Presto

Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes. It is designed for high performance and scalability, and it supports a wide range of data sources and formats. Presto is widely used in production environments for ad hoc analysis and reporting.

Databricks

Databricks is a commercial unified analytics platform that provides a collaborative environment for data science and machine learning. It is built on top of the open source Apache Spark engine and is used for processing and analyzing large datasets, including real-time data pipelines and machine learning applications. The Databricks Community Edition is hosted on Amazon Web Services, but you do not incur AWS costs when you use it.

Streaming Technologies

Apache Kafka

Apache Kafka is a distributed event streaming platform. It is designed for high throughput, fault tolerance, and scalability, and it is widely used for building real-time data pipelines and streaming applications. Kafka provides a powerful and flexible platform for processing and analyzing streaming data.
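The core idea behind an event streaming platform can be sketched as an append-only log that each group of consumers reads at its own position. The toy class below illustrates that idea only; it is not the Kafka client API, and TopicLog, append, and poll are hypothetical names. It also ignores partitioning, replication, and persistence.

```python
class TopicLog:
    """Toy single-partition log: records are appended and never modified."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # consumer group name -> next offset to read

    def append(self, value):
        """Store a record and return the offset it was written at."""
        self._records.append(value)
        return len(self._records) - 1

    def poll(self, group, max_records=10):
        """Return unread records for `group` and advance its offset."""
        start = self._offsets.get(group, 0)
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


log = TopicLog()
for event in ("signup", "login", "purchase"):
    log.append(event)

print(log.poll("billing", max_records=2))  # ['signup', 'login']
print(log.poll("billing"))                 # ['purchase']
print(log.poll("analytics"))               # ['signup', 'login', 'purchase']
```

Because reading does not remove records, the hypothetical "analytics" group still sees every event after "billing" has consumed them.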

AI Platform Technologies

PyTorch

PyTorch is an open source machine learning framework. It is designed for flexibility and ease of use, and it supports a wide range of deep learning models and algorithms. PyTorch is widely used for research and production applications, and it is known for its performance, scalability, and extensibility.

TensorFlow

TensorFlow is an open source machine learning platform. It is designed for flexibility and scalability, and it supports a wide range of machine learning models and algorithms. TensorFlow is widely used for research and production applications, and it is known for its performance, reliability, and ease of use.

H2O

H2O is an open source machine learning platform. It is designed for scalability and ease of use, and it supports a wide range of machine learning models and algorithms. H2O is widely used for research and production applications, and it is known for its performance, reliability, and extensibility.

Model Pipelines

MLflow

MLflow is an open source platform for managing the end-to-end machine learning lifecycle. It allows you to track experiments, package code, and deploy models in a variety of environments. MLflow provides a powerful and flexible platform for managing machine learning projects and workflows.

Kubeflow

Kubeflow is an open source machine learning platform for running machine learning workflows on Kubernetes, and it works with a wide range of machine learning frameworks. Kubeflow provides a powerful and flexible platform for building, deploying, and managing machine learning applications.

Model Repositories

Hugging Face

Hugging Face is a platform for sharing and deploying machine learning models, and it is best known for natural language processing. Its model hub hosts a wide range of pre-trained models, and its open source libraries, such as Transformers, provide tools for building and deploying machine learning applications. Hugging Face is widely used for research and production applications.

Model Zoo

Model Zoo is an open source platform for sharing and deploying machine learning models. It provides a wide range of pre-trained models and tools for building and deploying machine learning applications. Model Zoo is widely used for research and production applications, and it is known for its performance, reliability, and ease of use.

Development Tools

IntelliJ IDEA

IntelliJ IDEA is an integrated development environment for building Java, Kotlin, and Groovy applications. It provides a powerful and flexible platform for developing and debugging code, and it supports a wide range of development tools and frameworks. IntelliJ IDEA is widely used by developers for building and deploying applications.

Visual Studio Code

Visual Studio Code is a lightweight and powerful code editor. It provides a wide range of features for developing and debugging code, and it supports a wide range of programming languages and frameworks. Visual Studio Code is widely used by developers for building and deploying applications.

Git

Git is an open source distributed version control system. It provides a powerful and flexible platform for managing code and collaborating with other developers. Git is widely used by developers for tracking changes, resolving conflicts, and deploying code.

Docker

Docker is an open source platform for building, shipping, and running applications in containers. It provides a powerful and flexible platform for packaging code and dependencies, and deploying applications in a consistent and reliable manner. Docker is widely used by developers for building and deploying applications.

Podman

Podman is an open source container management tool. It provides a powerful and flexible platform for managing containers and images, and it supports a wide range of container runtimes and storage backends. Podman is widely used by developers for building and deploying containerized applications.

Jupyter Notebook

Jupyter Notebook is an open source web application for creating and sharing documents that contain live code, equations, visualizations, and narrative text. It provides a powerful and flexible platform for interactive computing and data analysis. Jupyter Notebook is widely used by data scientists and researchers for exploring data, building models, and sharing results.
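A notebook document is stored as JSON, with narrative text and code held in separate cells. The sketch below builds a minimal two-cell notebook using only the Python standard library. It assumes version 4.5 of the notebook format, and the cell ids and contents are made-up examples.

```python
import json

# Minimal notebook structure, assuming notebook format version 4.5.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",  # narrative text
            "id": "intro",
            "metadata": {},
            "source": ["# Sales summary\n", "Narrative text goes here."],
        },
        {
            "cell_type": "code",  # live code, not yet executed
            "id": "total",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["total = sum([120, 80, 45])\n", "total"],
        },
    ],
}

# Serialize to the text that would be saved in an .ipynb file.
notebook_text = json.dumps(notebook, indent=1)
print([cell["cell_type"] for cell in notebook["cells"]])  # ['markdown', 'code']
```

Saving notebook_text to a file with the .ipynb extension should produce a document that Jupyter can open, with the code cell ready to run.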