Preparing your data

The core information in BigML resources is immutable to ensure traceability and reproducibility. However, some helpful properties, such as name, description, category and tags, can be modified. For sources and datasets, the data itself is immutable, but you may need to change some properties, such as parsing formats or field types, to ensure that the data is handled correctly.

Updating a source

Sources describe the structure inferred by BigML from the uploaded data. This structure includes field names and types, locale, missing values, etc. You can learn more about all the available source attributes at https://bigml.com/api/sources#sr_source_properties. Some of them can be updated to ensure that your data is correctly interpreted. For instance, you could upload a source where a column contains only integer values. BigML will consider this a numeric field. However, in your domain these integers could be codes associated with categories, in which case the field should be handled as a categorical field. You can change the type assigned to the field by calling:

using BigML;
using System;
using System.Threading.Tasks;
using Newtonsoft.Json.Linq;

namespace Demo
{
    /// <summary>
    /// Updates a source stored in BigML.
    ///
    /// See complete Sources documentation at
    /// https://bigml.com/api/sources
    /// </summary>
    class UpdateSource
    {
        // An async entry point must return Task (C# 7.1 or later);
        // "async void Main" is not accepted by the compiler.
        static async Task Main()
        {
            // New BigML client in production mode with username and API key
            Console.Write("user: ");
            var user = Console.ReadLine();
            Console.Write("key: ");
            var apiKey = Console.ReadLine();
            var client = new Client(user, apiKey);

            // Change the ID to that of your source
            string sourceId = "source/57d7240228eb3e69f3000XXX";

            // Build the update body: fields -> <field ID> -> optype
            dynamic data = new JObject();
            data["fields"] = new JObject();
            data["fields"]["000000"] = new JObject();
            data["fields"]["000000"].optype = OpType.Categorical.ToString().ToLower();

            // Apply changes and wait for the request to complete
            await client.Update<Source>(sourceId, data);
        }
    }
}

Here, sourceId is the variable that holds the ID of the source to be updated, and the type of the field whose ID is 000000 is changed to categorical. The field IDs can be found in the fields attribute of the source structure, which contains the properties of each field keyed by its ID.
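To make the shape of the update body easier to see, the same structure can be built and printed without calling the API. This is a minimal sketch that assumes only the Newtonsoft.Json package already used above; the helper name BuildOptypeUpdate is hypothetical and is not part of the BigML bindings.

```csharp
using System;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

namespace Demo
{
    public static class SourcePayloadSketch
    {
        // Hypothetical helper (not part of the BigML bindings): builds an
        // update body that sets the optype of a single field, keyed by its ID.
        public static JObject BuildOptypeUpdate(string fieldId, string optype)
        {
            return new JObject
            {
                ["fields"] = new JObject
                {
                    [fieldId] = new JObject { ["optype"] = optype }
                }
            };
        }

        public static void Main()
        {
            JObject data = BuildOptypeUpdate("000000", "categorical");

            // Print the compact JSON body:
            // {"fields":{"000000":{"optype":"categorical"}}}
            Console.WriteLine(data.ToString(Formatting.None));
        }
    }
}
```

The resulting JObject has the same nesting as the object assembled through dynamic in the example above: the fields attribute holds one entry per field ID, and each entry holds only the properties being changed.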

Updating a dataset

Datasets are the starting point for all models in BigML and contain a serialized version of your data where each field has been summarized and some basic statistics computed. According to its contents, each field gets a boolean attribute called preferred. The value is set to true when BigML considers that the field will be useful as a predictor when creating a model. Fields such as IDs, constants or fields with unique values are marked as preferred = false, as they are not usually useful when modeling. However, this attribute can be changed if you still want to include them as possible predictors for the model.

Another commonly updated attribute in a dataset is objective_field, that is, the field to be predicted in decision trees, ensembles or logistic regressions. By default, the last field in your dataset is used as the objective field. The following example shows how to set the objective field to the first field in your dataset, whose ID is 000000, and how to include or exclude several fields from further analysis by changing their preferred attribute.

using BigML;
using System;
using System.Threading.Tasks;
using Newtonsoft.Json.Linq;

namespace Demo
{
    /// <summary>
    /// Update some properties of a Dataset stored in BigML
    ///  * Change objective field
    ///  * Mark one field as preferred
    ///  * Exclude two fields
    ///
    /// See complete Dataset documentation at
    /// https://bigml.com/api/datasets
    /// </summary>
    class UpdateDataset
    {
        // An async entry point must return Task (C# 7.1 or later);
        // "async void Main" is not accepted by the compiler.
        static async Task Main()
        {
            // New BigML client in production mode with username and API key
            Console.Write("user: ");
            var user = Console.ReadLine();
            Console.Write("key: ");
            var apiKey = Console.ReadLine();
            var client = new Client(user, apiKey);

            // Update this string with your dataset ID
            string datasetId = "dataset/57d7283b28eb3e69f1000XXX";

            dynamic data = new JObject();
            data["fields"] = new JObject();
            // Mark one field as preferred
            data["fields"]["000000"] = new JObject();
            data["fields"]["000000"].preferred = true;
            // Exclude these two fields
            data["fields"]["00003f"] = new JObject();
            data["fields"]["00003f"].preferred = false;
            data["fields"]["000041"] = new JObject();
            data["fields"]["000041"].preferred = false;
            // Set the objective field to the first field (ID 000000)
            data["objective_field"] = new JObject();
            data["objective_field"]["id"] = "000000";

            // Apply changes and wait for the request to complete
            await client.Update<DataSet>(datasetId, data);
        }
    }
}
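
When several fields need their preferred flag changed at once, the update body can be assembled from a map of field IDs instead of being written out line by line. The following is a sketch under stated assumptions: the helper BuildDatasetUpdate is hypothetical, not part of the BigML bindings, and it only builds and prints the JSON body without sending it.

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

namespace Demo
{
    public static class DatasetPayloadSketch
    {
        // Hypothetical helper (not part of the BigML bindings): builds a
        // dataset update body that sets the objective field and the
        // preferred flag of each listed field.
        public static JObject BuildDatasetUpdate(
            string objectiveFieldId,
            IDictionary<string, bool> preferredByFieldId)
        {
            var fields = new JObject();
            foreach (var entry in preferredByFieldId)
            {
                // One entry per field ID, holding only the changed property
                fields[entry.Key] = new JObject { ["preferred"] = entry.Value };
            }

            return new JObject
            {
                ["fields"] = fields,
                ["objective_field"] = new JObject { ["id"] = objectiveFieldId }
            };
        }

        public static void Main()
        {
            // Field IDs taken from the example above
            var preferred = new Dictionary<string, bool>
            {
                ["000000"] = true,   // include as a possible predictor
                ["00003f"] = false,  // exclude
                ["000041"] = false   // exclude
            };

            JObject data = BuildDatasetUpdate("000000", preferred);
            Console.WriteLine(data.ToString(Formatting.Indented));
        }
    }
}
```

Keeping the field IDs in a dictionary makes it straightforward to review which fields are included or excluded before the update is applied.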