Manage your Data and do Feature Engineering

The Files and Feature Engineering sections of the Workbench are where you import, manage, and transform raw data into features that improve your machine learning models. Together they cover data preparation and preprocessing.

The data used to set up your recommender will likely consist of historical information: which offers clients have taken up, and/or the characteristics of each client at the point they were presented with offers.

🪶

Examples of different styles of recommender data sets can be found in the example projects.

Add Data

In the Data and Features section of the Workbench, you will find Manage Files. Here, you will be able to add, view, delete and download the data files available for you to build predictions with.

Manage files

Upload Data

To upload a file of your own, select + Upload File. A section will open below the files list where you can input the details of your upload. Files must be uploaded in either CSV or JSON format. Click Upload, then refresh the page; the file will appear in your files list.
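Because only CSV and JSON files are accepted, it can save a failed upload to check a file locally first. The following is a minimal sketch of such a check, not part of the Workbench itself: the function name detect_upload_format is hypothetical, and the CSV test is a simple heuristic (a header row plus at least one data row, all with the same number of columns).

```python
import csv
import io
import json


def detect_upload_format(text: str) -> str:
    """Return 'json' or 'csv' if the text parses as one of them, else raise ValueError."""
    stripped = text.strip()
    if not stripped:
        raise ValueError("file is empty")

    # JSON check: accept only objects or arrays, not bare numbers or strings.
    try:
        parsed = json.loads(stripped)
        if isinstance(parsed, (dict, list)):
            return "json"
    except json.JSONDecodeError:
        pass

    # CSV heuristic: header plus at least one row, consistent column counts.
    rows = list(csv.reader(io.StringIO(stripped)))
    if len(rows) >= 2 and all(len(row) == len(rows[0]) for row in rows):
        return "csv"

    raise ValueError("file does not look like CSV or JSON")
```

To check a file on disk, read it as text and pass the contents to the function; a ValueError means the file is unlikely to be accepted in its current form.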

Upload files

Download Data

To download a file, click on the file name. A section will open up where you can view the details of your download. Click Download and select your download location.

Download files

Delete Data

To delete a file, click Delete to the right of the file name. Deleting a file from here will remove the file from all projects whether active or inactive!

Delete files

Connect a Database

In the Data and Features section of the Workbench, you will find Feature Engineering. Here you can connect your own database using a connection string with the Presto Data Navigator. This database access option uses the Presto Worker in the platform.

Add a Connection path, similar to this example: local/master?user=admin. Then write a SQL statement to extract the data you want, similar to this example: select * from master.bank_customer limit 2. Then click Execute.
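To see what a limit query of this kind returns before running it against your own database, you can try the same statement on a small local table. The sketch below uses Python's built-in sqlite3 module as a stand-in; it does not connect to Presto. The table columns and rows are hypothetical, and the master. schema prefix is dropped because the stand-in database has no such schema.

```python
import sqlite3


def preview_bank_customers(limit: int = 2) -> list:
    """Run a 'select * ... limit N' preview against an in-memory stand-in table."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical columns and rows, for illustration only.
        conn.execute("CREATE TABLE bank_customer (customer_id INTEGER, segment TEXT)")
        conn.executemany(
            "INSERT INTO bank_customer VALUES (?, ?)",
            [(1, "retail"), (2, "business"), (3, "retail")],
        )
        # Same shape as the example statement: all columns, capped row count.
        cursor = conn.execute("SELECT * FROM bank_customer LIMIT ?", (limit,))
        return cursor.fetchall()
    finally:
        conn.close()


rows = preview_bank_customers(2)
print(len(rows))  # number of rows returned, capped by the limit
```

Keeping a small limit while you explore is a cheap way to confirm the connection and the column layout before extracting the full data set.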

Presto navigator

✏️
Using Presto in the Workbench

To add data using the Presto functionality, you must first have your Presto connection set up correctly.

Ingest Data

Ingest data to be used in your projects with the Ecosystem Data Navigator. Once data has been added to the Platform, it must be ingested into a specified database and collection.

Ingest Data

Add a Database

You can either select a database and ingest your file into it, or create a new database by selecting + Add Database.

Add database

Add a unique database name related to your project. Click the Database button to the left of the input field to create it.

Create database

Once your database has been created, refresh the database list and click into it.

View database

Ingest Collections

To ingest your file as a new collection inside your chosen database, select + Ingest Collection.

Ingest collection

Select your file from the file list. You will see the file name appear above the Ingest: input field. Either copy this name or choose a unique one related to your project, then click Ingest to the left of the input.

Select collection

⚠️
Check collection name before ingesting!

If the name of the collection you are ingesting already exists, the new data will be appended to the existing data. It will not replace the existing data.
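The append behaviour described in this warning can be illustrated with a small in-memory model. This is only a sketch of the semantics: the dictionary stands in for a database, and the ingest function and its return value are hypothetical, not a platform API.

```python
from typing import Dict, List

# A database is modelled as a mapping from collection name to its records.
Database = Dict[str, List[dict]]


def ingest(database: Database, collection_name: str, records: List[dict]) -> bool:
    """Append records to a collection, creating it if needed.

    Returns True if the collection already existed, which is the case
    the warning is about: the new data is added to the old data.
    """
    already_existed = collection_name in database
    database.setdefault(collection_name, []).extend(records)
    return already_existed


db: Database = {}
ingest(db, "offers", [{"offer_id": 1}])
appended = ingest(db, "offers", [{"offer_id": 2}])
print(appended, len(db["offers"]))  # the second ingest adds to the first
```

If you intend to replace the data rather than add to it, choose a new collection name for the ingest.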

Find & Export Collections

Once data has been ingested, it must be put into a format that machine learning algorithms can use. To do this, export your Collection to the ecosystem platform; you can then create a feature store from the exported data.

Find your Collection in the list, refreshing the page if it has not yet appeared. Then, using the Options dropdown to the right of your collection name, click Export.

Export collection

View and edit the details of your export. Most of the settings in this tab can remain at their defaults. If you are unsure how much data to export, leave the Number to Export as 0 to export all of it. Click Export.
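The meaning of 0 in the Number to Export field can be sketched as follows. The function select_for_export is hypothetical, and it assumes that a non-zero value exports the first N records; the guide only states that 0 exports everything.

```python
from typing import List


def select_for_export(records: List[dict], number_to_export: int = 0) -> List[dict]:
    """Return the records an export would include under the stated assumptions."""
    if number_to_export < 0:
        raise ValueError("number_to_export must be 0 or a positive integer")
    if number_to_export == 0:
        # 0 means no cap: export the whole collection.
        return list(records)
    # Assumption: a positive value caps the export at the first N records.
    return list(records[:number_to_export])


collection = [{"offer_id": i} for i in range(5)]
print(len(select_for_export(collection)))     # 5
print(len(select_for_export(collection, 2)))  # 2
```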

Export details

🪶

Now that you have consolidated your data, it’s time to put it to work!