You are viewing the RapidMiner Studio documentation for version 10.0.

Equalize Time Stamps (Time Series)

Synopsis

This operator computes an equalized time series from an input time series with date time indices.

Description

The output time series has new, equidistant index values. The configuration of the new index values is defined by the parameter equalize method. Each method determines the number of examples, start date, stop date and step size of the new index values in a different way. For details, see the description of the parameter equalize method.

Note that the time domain (see parameter time domain) is an important distinction for equalizing time stamps. Calendar entries, for example, are not equidistant on a time duration scale (e.g. months have different lengths). Nevertheless, for many use cases (e.g. sales time series) it is important to have monthly 'equidistant' time stamps. In other use cases (e.g. sensor data) it is important to have equidistant time stamps on a microsecond scale.
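
The difference between the two time domains can be illustrated with a short Python sketch. It is not the operator's implementation: the helper add_months and the sample start date are hypothetical and only show why a fixed duration and a calendar period produce different index values.

```python
from datetime import datetime, timedelta
import calendar


def add_months(ts: datetime, months: int) -> datetime:
    """Hypothetical helper: shift ts by whole calendar months.

    The day is clamped to the length of the target month.
    """
    month_index = ts.month - 1 + months
    year = ts.year + month_index // 12
    month = month_index % 12 + 1
    day = min(ts.day, calendar.monthrange(year, month)[1])
    return ts.replace(year=year, month=month, day=day)


start = datetime(2023, 1, 1)

# 'time' domain: a fixed duration (here 30 days) between index values.
duration_index = [start + i * timedelta(days=30) for i in range(4)]

# 'calendar' domain: a period (here one month) between index values.
calendar_index = [add_months(start, i) for i in range(4)]

# The monthly index is 'equidistant' on the calendar, but the gaps
# measured as durations differ from month to month.
calendar_gaps = [
    (b - a).days for a, b in zip(calendar_index, calendar_index[1:])
]
```

With the fixed duration the index values drift away from the first day of the month, while the monthly period keeps them aligned at the cost of unequal durations between them.
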

The new values of the equalized time series attributes are computed with the same functionality as the Replace Missing Values (Series) operator (note that this functionality is configured to ensure finite values). The three parameters replace type numerical, replace type nominal and replace type date time define how the new values are computed.
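
As a rough illustration of how values at new index positions could be derived from the original observations, the following Python sketch applies linear interpolation using the old and new index values. The function name interpolate_at and the sample data are assumptions made for this example; the operator's internal implementation may differ.

```python
from datetime import datetime


def interpolate_at(old_index, old_values, new_index):
    """Linearly interpolate old_values (observed at old_index) at new_index.

    New index values outside the observed range take the nearest observed
    value, which keeps every result finite. old_index must be sorted.
    """
    result = []
    for t in new_index:
        if t <= old_index[0]:
            result.append(old_values[0])
            continue
        if t >= old_index[-1]:
            result.append(old_values[-1])
            continue
        # Find the first observation at or after t and its predecessor.
        for k in range(1, len(old_index)):
            if old_index[k] >= t:
                t0, t1 = old_index[k - 1], old_index[k]
                v0, v1 = old_values[k - 1], old_values[k]
                fraction = (t - t0) / (t1 - t0)  # timedelta / timedelta -> float
                result.append(v0 + fraction * (v1 - v0))
                break
    return result


# Irregular observations (hypothetical sample data).
old_index = [datetime(2023, 1, 1), datetime(2023, 1, 3), datetime(2023, 1, 7)]
old_values = [10.0, 20.0, 40.0]

# Equidistant new index with a step of three days.
new_index = [datetime(2023, 1, 1), datetime(2023, 1, 4), datetime(2023, 1, 7)]

equalized = interpolate_at(old_index, old_values, new_index)
```
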

This operator works on all time series (numerical, nominal, date-time) which have date time indices.

Input

  • example set (Data Table)

    The ExampleSet which contains the time series data as attributes.

Output

  • equalized example set (Data Table)

    The ExampleSet which contains the equalized time series.

  • original (Data Table)

    The ExampleSet that was given as input is passed through without changes.

Parameters

  • indices_attribute

    The attribute holding the index values of the time series. It has to be date-time. The attribute name can be selected from the drop down box of the parameter if the meta data is known.

    Range:
  • sort_time_series

    If this parameter is selected, the input time series is sorted according to the selected indices attribute before the time series operation is applied. If it is not selected and the input time series is not sorted, a corresponding User Error is thrown.

    Keep in mind that the index values still need to be unique. If the values are non-unique, a corresponding User Error is thrown.

    The data set provided at the original output port will be the sorted input time series.

    Range:
  • equalize_method

    This parameter defines which equalize method is used. The configuration also depends on the parameters time domain, round start and stop date and fit number of examples to range.

    • same range and number of examples as original data: The same range ('start date' and 'stop date') and the same 'number of examples' as in the original data are used. The calculation of the 'step size' depends on the parameter 'time domain'. It is either the exact duration (on a millisecond scale) between <start date> and <stop date> divided by (<number of examples> - 1) ('time domain' is 'time') or the period (on a number of days scale) divided by (<number of examples> - 1) ('time domain' is 'calendar'). For the latter, the number of examples can also be adapted to fit the range again (see parameter 'fit number of examples to range').
    • number of examples, start value and step size: The 'number of examples', the 'start date' and the 'step size' are provided. The 'number of examples' and the 'start date' can be retrieved from the original data or provided as custom values (see the parameters 'number of examples', 'custom number of examples', 'start date', 'custom start date'). The 'step size' has to be provided by the parameter 'step size (duration)' or 'step size (period)', depending on the parameter 'time domain'. The 'stop date' is calculated as <start date> + (<number of examples> - 1) x <step size>.
    • number of examples and range(start,stop): The 'number of examples', the 'start date' and the 'stop date' are provided. The 'number of examples', the 'start date' and the 'stop date' can be retrieved from the original data or provided as custom values (see the parameters 'number of examples', 'custom number of examples', 'start date', 'custom start date', 'stop date', 'custom stop date'). The calculation of the 'step size' depends on the parameter 'time domain'. It is either the exact duration (on a millisecond scale) between <start date> and <stop date> divided by (<number of examples> - 1) ('time domain' is 'time') or the period (on a number of days scale) divided by (<number of examples> - 1) ('time domain' is 'calendar'). For the latter, the number of examples can also be adapted to fit the range again (see parameter 'fit number of examples to range').
    • range(start,stop) and step size: The 'start date', the 'stop date' and the 'step size' are provided. The 'start date' and the 'stop date' can be retrieved from the original data or provided as custom values (see the parameters 'start date', 'custom start date', 'stop date', 'custom stop date'). The 'step size' has to be provided by the parameter 'step size (duration)' or 'step size (period)', depending on the parameter 'time domain'. The 'number of examples' is calculated such that <start date> + (<number of examples> - 1) x <step size> is after the <stop date> and <start date> + (<number of examples> - 2) x <step size> is before it (thus the last index value is the first index value that lies after the <stop date>).
    Range:
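
The arithmetic behind the equalize methods can be sketched in Python for the case that 'time domain' is 'time'. The function names below are hypothetical and chosen for this example only. One assumption is made explicit: a last index value that lands exactly on the stop date is treated as reaching it, since the description above does not state how that boundary case is handled.

```python
from datetime import datetime, timedelta
import math


def step_from_range(start, stop, number_of_examples):
    """Step size when the range and the number of examples are given."""
    return (stop - start) / (number_of_examples - 1)


def stop_from_step(start, number_of_examples, step):
    """Stop date when the start date, number of examples and step size are given."""
    return start + (number_of_examples - 1) * step


def count_from_range_and_step(start, stop, step):
    """Number of examples when the range and the step size are given.

    ASSUMPTION: an index value exactly equal to the stop date counts as
    the last index value.
    """
    return math.ceil((stop - start) / step) + 1


start = datetime(2023, 1, 1, 10, 0)
stop = datetime(2023, 1, 1, 12, 0)

step = step_from_range(start, stop, 5)                     # 30 minutes
computed_stop = stop_from_step(start, 5, step)             # 12:00
count = count_from_range_and_step(start, stop, timedelta(minutes=90))
last_index_value = stop_from_step(start, count, timedelta(minutes=90))
```

In the last case the index values are 10:00, 11:30 and 13:00, so the final index value is the first one after the stop date of 12:00.
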
  • number_of_examples

    Specify how the number of examples is retrieved.

    • same as original data: Same value as the original data.
    • custom: The value is specified by the parameter 'custom number of examples'.
    Range:
  • custom_number_of_examples

    New number of examples for the equalized time series.

    Range:
  • start_value

    Specify how the start date is retrieved.

    • same as original data: Same value as the original data.
    • custom: The value is specified by the parameter 'custom start date'.
    Range:
  • custom_start_date

    New start date of the index values for the equalized time series.

    Range:
  • stop_value

    Specify how the stop date is retrieved.

    • same as original data: Same value as the original data.
    • custom: The value is specified by the parameter 'custom stop date'.
    Range:
  • custom_stop_date

    New stop date of the index values for the equalized time series.

    Range:
  • date_format

    Date format used for custom start date and custom stop date parameters.

    Range:
  • time_domain

    Time domain for which the time series shall be equalized. Note that this is an important distinction for equalizing time stamps. Calendar entries, for example, are not equidistant on a time duration scale (e.g. months have different lengths). Nevertheless, for many use cases (e.g. sales time series) it is important to have monthly 'equidistant' time stamps. In other use cases (e.g. sensor data) it is important to have equidistant time stamps on a microsecond scale.

    • time: Time differences and step size are handled exactly, i.e. as durations with microsecond precision.
    • calendar: Time differences and step size are handled as periods in multiples of days, weeks, months and years.
    Range:
  • round_start_and_stop_date

    If selected, the start and stop date values (either retrieved from the original data or specified by the corresponding parameters) are rounded to the previous/next exact day, truncating the hour, minute and second parts of the time stamps. This is done before the non-provided configuration parameters of the equalize method are computed.

    Range:
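
A minimal Python sketch of such a rounding step is shown below. It assumes that the start date is rounded down to the previous midnight and the stop date is rounded up to the next midnight, which is one reading of "previous/next exact day"; the function names are hypothetical.

```python
from datetime import datetime, timedelta


def round_start(ts: datetime) -> datetime:
    """ASSUMPTION: the start date is truncated to the previous midnight."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def round_stop(ts: datetime) -> datetime:
    """ASSUMPTION: the stop date is moved to the next midnight.

    A stop date that already lies exactly on midnight is left unchanged.
    """
    floored = round_start(ts)
    return floored if floored == ts else floored + timedelta(days=1)


rounded_start = round_start(datetime(2023, 3, 5, 14, 27, 9))
rounded_stop = round_stop(datetime(2023, 3, 9, 8, 15, 0))
```
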
  • fit_number_of_examples_to_range

    This parameter is only enabled for time domain = calendar and for the equalize methods same range and number of examples as original data and number of examples and range(start,stop).

    If selected, the number of examples is fitted to the provided range after the range is determined. This is needed because the step size is a multiple of one day, and therefore the actual stop date can be well after the provided one.

    Range:
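
The effect of fitting can be illustrated with a small Python sketch. It rests on two assumptions that are not confirmed by this page: that the step size is rounded up to a whole number of days, and that fitting chooses the smallest number of examples whose last index value reaches the provided stop date. All function names are hypothetical.

```python
from datetime import date, timedelta
import math


def calendar_step_days(start, stop, number_of_examples):
    """ASSUMPTION: the step size is rounded up to a whole number of days."""
    return max(1, math.ceil((stop - start).days / (number_of_examples - 1)))


def fit_number_of_examples(start, stop, step_days):
    """ASSUMPTION: smallest count whose last index value reaches the stop date."""
    return math.ceil((stop - start).days / step_days) + 1


start, stop, number_of_examples = date(2023, 1, 1), date(2023, 1, 11), 8

step_days = calendar_step_days(start, stop, number_of_examples)

# Without fitting, the last index value overshoots the provided stop date.
unfitted_stop = start + timedelta(days=(number_of_examples - 1) * step_days)

# With fitting, the number of examples is reduced to match the range again.
fitted_count = fit_number_of_examples(start, stop, step_days)
fitted_stop = start + timedelta(days=(fitted_count - 1) * step_days)
```

In this example a range of 10 days with 8 examples gives a step of 2 days, so the unfitted series would end on January 15 instead of January 11; fitting reduces the series to 6 examples.
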
  • step_size_(time_duration)

    Step size (as a duration with microsecond precision) between the new index values of the equalized time series. Used if the parameter time domain is set to time.

    Range:
  • step_size_(time_period)

    Step size (as a period in multiples of days, weeks, months or years) between the new index values of the equalized time series. Used if the parameter time domain is set to calendar.

    Range:
  • replace_type_numerical

    The kind of replacement which is used to compute the new numerical values of the equalized time series.

    • previous value: The previous value in the series is used as a replacement. Neighboring missing values are all replaced by the first previous valid value. Missing values at the start of a series are replaced by the next valid value.
    • next value: The next value in the series is used as a replacement. Neighboring missing values are all replaced by the next valid value. Missing values at the end of a series are replaced by the first previous valid value.
    • average: The average of the neighboring values in the series is used as a replacement. Neighboring missing values are all replaced by the average of the neighboring valid values. Missing values at the start and end of a series are replaced by the next or previous valid value, respectively.
    • linear interpolation: A linear interpolation (using the old and new index values) between the two neighboring values in the series is used to calculate the replacement value. The nearest valid neighboring values are used to perform the linear interpolation, and all missing values are replaced by the values it yields. Missing values at the start and end of a series are replaced by the next or previous valid value, respectively.
    • value: All missing values are replaced by a constant value, specified by the replace value numerical parameter.
    Range:
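
The replacement types previous value, next value and average can be sketched in Python on a plain list in which None marks a missing value. This is an illustration of the rules described above, not the operator's code; the function names and the sample series are made up for the example.

```python
def fill_previous(values):
    """Replace each None by the previous valid value.

    Leading None entries take the next valid value instead.
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    first_valid = next((v for v in values if v is not None), None)
    return [first_valid if v is None else v for v in filled]


def fill_next(values):
    """Replace each None by the next valid value.

    Trailing None entries take the previous valid value instead.
    """
    return fill_previous(values[::-1])[::-1]


def fill_average(values):
    """Replace each None by the average of its two valid neighbors.

    At the start and end only one neighbor exists, so that value is used.
    """
    previous, following = fill_previous(values), fill_next(values)
    return [
        v if v is not None else (p + n) / 2
        for v, p, n in zip(values, previous, following)
    ]


series = [None, 2, None, None, 8, None]

by_previous = fill_previous(series)
by_next = fill_next(series)
by_average = fill_average(series)
```
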
  • replace_type_nominal

    The kind of replacement which is used to compute the new nominal values of the equalized time series.

    • previous value: The previous value in the series is used as a replacement. Neighboring missing values are all replaced by the first previous valid value. Missing values at the start of a series are replaced by the next valid value.
    • next value: The next value in the series is used as a replacement. Neighboring missing values are all replaced by the next valid value. Missing values at the end of a series are replaced by the first previous valid value.
    • value: All missing values are replaced by a constant value, specified by the replace value nominal parameter.
    Range:
  • replace_type_date_time

    The kind of replacement which is used to compute the new date time values of the equalized time series.

    • previous value: The previous value in the series is used as a replacement. Neighboring missing values are all replaced by the first previous valid value. Missing values at the start of a series are replaced by the next valid value.
    • next value: The next value in the series is used as a replacement. Neighboring missing values are all replaced by the next valid value. Missing values at the end of a series are replaced by the first previous valid value.
    • average: The average of the neighboring values in the series is used as a replacement. Neighboring missing values are all replaced by the average of the neighboring valid values. Missing values at the start and end of a series are replaced by the next or previous valid value, respectively.
    • linear interpolation: A linear interpolation (using the old and new index values) between the two neighboring values in the series is used to calculate the replacement value. The nearest valid neighboring values are used to perform the linear interpolation, and all missing values are replaced by the values it yields. Missing values at the start and end of a series are replaced by the next or previous valid value, respectively.
    • value: All missing values are replaced by a constant value, specified by the replace value date time parameter.
    Range:
  • replace_value_numerical

    If replace type numerical is set to value, this parameter specifies the replacement value for all missing values of numerical time series.

    Range:
  • replace_value_nominal

    If replace type nominal is set to value, this parameter specifies the replacement value for all missing values of nominal time series.

    Range:
  • replace_value_date_time

    If replace type date time is set to value, this parameter specifies the replacement value for all missing values of time series with date time values.

    Range:

Tutorial Processes

Equalize Artificial Demo Price Data

In this tutorial we demonstrate the usage of the Equalize Time Stamps operator by equalizing an artificial demo data set containing the 'Price' and the 'most sold color' of a fictitious product.

First a demo data set is created which simulates the price and the most sold color of a fictitious product. Both attributes are recorded at random times on different days. Then the Equalize Time Stamps operator is used to create weekly time stamps for this data set. The equalized values for the Price attribute are interpolated, while for the most sold color the previously known value is used.

Resample Gas Station Data Set to 90 minutes steps

In this tutorial we demonstrate the usage of the Equalize Time Stamps operator to resample the gas station data set to time stamps with a step size of 90 minutes.

The gas station data set from the Samples folder already has equidistant time stamps. By using the Equalize Time Stamps operator we can resample this data set to 90-minute time steps.

Fill Weekend Gaps in a Sales Data Set

In this tutorial we demonstrate the usage of the Equalize Time Stamps operator to fill the weekend gaps in a demo Sales Data Set which only contains weekdays.