How do you generate synthetic data, and why is it important? First, some context: the Awesense Energy Transition Platform allows energy companies to address hundreds of utility use cases with significantly less time and fewer resources spent developing each one. To let Awesense clients and partners experience working with the platform, Awesense maintains a dedicated sandbox data environment. The sandbox dataset consists of synthetic but realistic data that can be accessed via the APIs.
This synthetic dataset is important because actual utility data has privacy strings attached and cannot be shared with third parties outside of a contractual agreement. Getting a utility project rolling can take time, and this sandbox environment allows developers to hit the ground designing. Once the actual data becomes available, the use case can be truly perfected.
Building such a realistic-looking and logically consistent dataset is not a trivial matter! Let’s dive deeper into how to generate synthetic data.
An electrical distribution grid dataset has to start with the grid itself:
- the elements (e.g. meters, transformers, lines, switchable elements, poles, DERs, etc.) and their attributes, whether generic (e.g. IDs, circuits) or type-specific (e.g. transformer rated power, meter consumer type, line length, etc.),
- their location in the real world (geo-coordinates),
- the connectivity (e.g. which meter is connected to which transformer via which line, etc.).
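To make the three ingredients above concrete, here is a minimal sketch of how elements, locations and connectivity could be represented. The class names, field names, IDs and coordinates are all hypothetical and chosen for illustration; they are not the Awesense data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class GridElement:
    element_id: str
    element_type: str            # e.g. "meter", "transformer", "line"
    lat: float                   # geo-coordinates
    lon: float
    circuit: str                 # generic attribute: which feeder it belongs to
    attributes: dict = field(default_factory=dict)  # type-specific attributes


@dataclass
class Connection:
    from_id: str                 # e.g. a meter
    to_id: str                   # e.g. the transformer it hangs off
    via_line_id: str             # the line joining the two


# Made-up example values, roughly in the Vancouver area.
elements = {
    "M1": GridElement("M1", "meter", 49.2827, -123.1207, "F1",
                      {"consumer_type": "residential"}),
    "T1": GridElement("T1", "transformer", 49.2830, -123.1210, "F1",
                      {"rated_power_kva": 50}),
    "L1": GridElement("L1", "line", 49.2828, -123.1208, "F1",
                      {"length_m": 40}),
}
connections = [Connection("M1", "T1", "L1")]


def transformer_for(meter_id, connections, elements):
    """Return the ID of the transformer a meter is connected to, or None."""
    for c in connections:
        if c.from_id == meter_id and elements[c.to_id].element_type == "transformer":
            return c.to_id
    return None
```

With this toy data, `transformer_for("M1", connections, elements)` walks the connectivity list and finds the transformer serving meter `M1`.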
Awesense has generated synthetic data to emulate two circuits (feeders): one modelled on North American standards and one on European standards. We’ve chosen to locate them in Vancouver, where the Awesense HQ is. We’ve populated most element attributes with realistic and diverse values to represent what one might encounter in a larger grid. In the coming months, Awesense will work on populating the remaining attributes and expanding the geospatial grid to a larger area with hundreds of feeders and thousands of consumers.
After the GIS data is created, the time series of electrical measurements are generated and attached to the corresponding elements. This can be done to various degrees of realism and ensuing complexity, and in this blog post, we describe the basics rather than the advanced methods.
Let’s talk about generating synthetic data for meters first. Meters measure the net electricity flow between a consumer and the grid. For some consumers, as the name suggests, that flow is only energy the consumer has consumed (aka consumption). For others, the net flow may incorporate energy generated by some DERs on site (such as PV or batteries or EV Chargers with V2G capabilities). The first thing to tackle is consumption in the absence of DERs, which in grids of the past was the same as the net for all consumers in the grid. We have used a public dataset of such actual pre-DER consumption data to infer so-called “standard profiles” on a per-consumer-type basis, that is, distributions of values for each hour of the day for each day of a year for residential, commercial and industrial consumer types. We then sample the distribution for each meter of a given type to generate synthetic hourly consumption time series over multiple years.
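The sampling idea can be sketched in a few lines of Python. This is a simplified illustration rather than our production code: it assumes a normal distribution per hour of the day only (ignoring the day-of-year dimension), and the function names and the toy history are made up for the example.

```python
import random
from statistics import mean, stdev


def build_profile(history):
    """Turn (hour_of_day, kwh) observations into {hour: (mean, stdev)}."""
    by_hour = {}
    for hour, kwh in history:
        by_hour.setdefault(hour, []).append(kwh)
    return {
        h: (mean(v), stdev(v) if len(v) > 1 else 0.0)
        for h, v in by_hour.items()
    }


def sample_consumption(profile, n_days, rng):
    """Draw one synthetic hourly consumption series for a single meter."""
    series = []
    for _ in range(n_days):
        for hour in range(24):
            mu, sigma = profile[hour]
            # Consumption cannot be negative, so clip the draw at zero.
            series.append(max(0.0, rng.gauss(mu, sigma)))
    return series


# Toy "historical" data standing in for a real public dataset.
rng = random.Random(42)
toy_history = [(h, 0.5 + 0.1 * h + 0.01 * d) for d in range(30) for h in range(24)]
profile = build_profile(toy_history)
series = sample_consumption(profile, n_days=7, rng=rng)
```

Calling `sample_consumption` once per meter, with the profile for that meter's consumer type, yields a different but statistically similar series for each meter.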
Let’s now talk about DERs.
The most common type of both utility-owned and consumer-owned generation is solar photovoltaics (PV). PV generation is comparatively simple to model because it’s directly correlated with the size of the installation and the amount of sun at the given location, which follows known patterns over the days of the year and the hours of the day, so we can once again sample from distributions representing these patterns. Awesense analyzed gigabytes of public PV time series data to validate those distributions and produce realistic PV time series. We think the effort paid off, and users of the Awesense sandbox can now develop PV-focused use cases with realistic results.
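As a rough illustration of "installation size times amount of sun", the sketch below combines a seasonal day-length curve, a daily bell shape and a random cloud factor. All constants (day length between 8 and 16 hours, the seasonal scaling, the cloud range) are illustrative assumptions, not the validated distributions mentioned above.

```python
import math
import random


def pv_output_kw(capacity_kw, day_of_year, hour, rng):
    """Hypothetical hourly PV output for an installation of a given size."""
    # Seasonal term peaks around day 172 (late June) in the northern hemisphere.
    season = math.cos(2 * math.pi * (day_of_year - 172) / 365)
    day_length = 12.0 + 4.0 * season          # assumed 8 h (winter) to 16 h (summer)
    sunrise = 12.0 - day_length / 2
    sunset = 12.0 + day_length / 2
    if not (sunrise < hour < sunset):
        return 0.0                            # no generation at night
    clear_sky = math.sin(math.pi * (hour - sunrise) / day_length)  # daily bell shape
    seasonal_strength = 0.6 + 0.4 * season    # weaker sun in winter
    cloud_factor = rng.uniform(0.3, 1.0)      # random attenuation by clouds
    return capacity_kw * clear_sky * seasonal_strength * cloud_factor


rng = random.Random(7)
summer_day = [pv_output_kw(5.0, 172, h, rng) for h in range(24)]
winter_day = [pv_output_kw(5.0, 355, h, rng) for h in range(24)]
```

The output never exceeds the installed capacity, is zero outside daylight hours, and is noticeably lower in winter than in summer.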
EV Chargers are becoming increasingly prevalent, and various entities can own them. Some belong to individual consumers who own an EV, others may be in commercial parking lots belonging to various businesses (retail, restaurants, etc.). Yet others belong to companies that own large fleets of vehicles of various types (e.g. buses, trucks, vans, etc.). Each has different usage patterns in the real world, and there’s little publicly available data to mimic.
Instead, we build time series by simulating various behavioural patterns associated with each location. For example, a commuter’s charger may draw energy in the evenings or at night, whereas the charger of someone working from home may draw energy during the day; the mall EV Chargers may have time limits for charging session length; and so on.
Additionally, the simulated time series must respect the charger’s power draw limits, the limits of the one or more EVs that might be connected to it, and any ramp-up/ramp-down in charging that may occur at the beginning/end of sessions.
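Here is a minimal sketch of such a behavioural simulation for a single commuter's home charger. The plug-in window (17:00 to 20:00), the energy needed per session, and the crude half-power ramp-up are assumptions made for the example, and the function name is hypothetical. It also assumes each session finishes before the next day's plug-in, which holds for typical home charger ratings.

```python
import random


def simulate_commuter_charger(n_days, charger_kw, ev_max_kw, rng):
    """Return an hourly power series (kW) for one evening session per day."""
    hours = n_days * 24
    power = [0.0] * hours
    # A session can never draw more than the charger or the vehicle allows.
    limit = min(charger_kw, ev_max_kw)
    for day in range(n_days):
        t = day * 24 + rng.randint(17, 20)   # plug in after the commute home
        remaining_kwh = rng.uniform(5.0, 30.0)  # energy to replenish
        first_hour = True
        while remaining_kwh > 0 and t < hours:
            # Crude ramp-up: half power during the first hour of the session.
            p = limit * 0.5 if first_hour else limit
            # Hourly steps, so kW over one hour equals kWh. The final hour
            # delivers only what is left, a simple stand-in for ramp-down.
            p = min(p, remaining_kwh)
            power[t] = p
            remaining_kwh -= p
            first_hour = False
            t += 1
    return power


rng = random.Random(1)
ev_series = simulate_commuter_charger(n_days=7, charger_kw=7.2, ev_max_kw=11.0, rng=rng)
```

Other behaviours (a work-from-home owner, a mall charger with session time limits, a fleet depot) would follow the same pattern with different plug-in windows and constraints.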
Batteries are the most complicated, as they depend on actual consumption, generation, and any smart algorithms running on the battery trying to optimize when to charge and when to discharge. Perhaps we’ll write a separate blog post with details just for them.
Also, in the future, some EV Chargers may even be able to push energy to the grid from the batteries in the cars (V2G), so synthetic time series for EV Chargers will need to be made similarly sophisticated to emulate that.
After generating the DER time series, we are ready to create time series for the meters’ net flow. For those that do not have DERs behind them, this is the same as consumption. For those with DERs, the net needs to be computed by combining the regular consumption time series with the time series of PV, EV Chargers and Batteries by adding or subtracting as appropriate.
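The "adding or subtracting as appropriate" step can be sketched as below. The sign convention is an assumption for this example: positive values mean energy flowing from the grid to the consumer, PV generation is subtracted, EV charging is added, and the battery series is positive when charging and negative when discharging.

```python
def net_flow(consumption, pv=None, ev_charger=None, battery=None):
    """Combine hourly series into the meter's net flow (positive = import)."""
    n = len(consumption)
    zeros = [0.0] * n
    pv = pv if pv is not None else zeros
    ev_charger = ev_charger if ev_charger is not None else zeros
    battery = battery if battery is not None else zeros
    if not (len(pv) == len(ev_charger) == len(battery) == n):
        raise ValueError("all time series must cover the same hours")
    # Loads add to the net flow; on-site generation subtracts from it.
    return [c + e + b - g for c, e, b, g in zip(consumption, ev_charger, battery, pv)]


# A meter without DERs: net flow equals consumption.
plain = net_flow([1.0, 2.0])
# A meter with PV: the second hour exports to the grid (negative net flow).
with_pv = net_flow([1.0, 2.0], pv=[0.5, 3.0])
```

In the second example the net flow goes negative in the hour where PV generation exceeds consumption, which is what a net meter would record as export.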
We plan to mimic both situations where there is visibility into the energy flow of the DERs behind the meter (in which case the DER time series can be accessed separately from the meter’s net flow) and situations where that’s not the case (e.g. a consumer charging their EV via a regular plug). This will allow various use cases to be built and tested against the sandbox. The work for incorporating the EV charger and battery time series into the sandbox is in progress.
To make the dataset richer, Awesense went one step further and produced meter time series for voltage levels and energy. Currently, the voltage time series is generated with a simplified formula in which voltage is a function of the net flow. We fully understand that this is not a realistic representation, and we plan to make the voltage at the consumption point more realistic.
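To show what "voltage as a function of the net flow" can look like in its simplest form, here is a hypothetical linear model: voltage sags when the consumer imports and rises when they export. The nominal voltage, the slope and the clamping band are made-up illustrative values, and this is not necessarily the formula used in the sandbox.

```python
def voltage_from_net_flow(net_kw, nominal_v=240.0, volts_per_kw=0.5, tolerance=0.05):
    """Hypothetical linear voltage model driven only by the meter's net flow."""
    v = nominal_v - volts_per_kw * net_kw     # import lowers, export raises voltage
    low = nominal_v * (1 - tolerance)
    high = nominal_v * (1 + tolerance)
    return max(low, min(high, v))             # keep within an assumed +/-5% band


voltages = [voltage_from_net_flow(kw) for kw in [0.0, 4.0, -4.0]]
```

A model like this ignores everything upstream of the meter (other loads on the same transformer, line impedance, tap settings), which is exactly why it is only a starting point.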
The result of the above efforts is realistic meter time series with hourly granularity for the entire sandbox distribution grid area.
For real-world grids, measurements are also typically taken at the top of feeders by SCADA devices. So we generate synthetic time series for these locations as well. Since such devices should, in principle, reflect the total consumption in the feeder, we create their time series by adding the previously-generated time series from all meters and adding some noise to reflect potential losses, energy theft or other inaccuracies.
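A minimal sketch of this aggregation step follows. The loss fraction and the noise level are illustrative assumptions, as is the choice of Gaussian multiplicative noise; the function name is hypothetical.

```python
import random


def feeder_series(meter_series, rng, loss_fraction=0.03, noise_sd=0.01):
    """Sum all meter series hour by hour, then add losses and random noise."""
    totals = [sum(values) for values in zip(*meter_series)]
    # The feeder head sees slightly more than the sum of the meters,
    # reflecting technical losses, theft or measurement inaccuracies.
    return [
        total * (1.0 + loss_fraction + rng.gauss(0.0, noise_sd))
        for total in totals
    ]


rng = random.Random(3)
meters = [[1.0, 2.0, 1.5], [3.0, 4.0, 2.5]]
scada = feeder_series(meters, rng)
# With losses and noise switched off, the result is the plain sum.
exact = feeder_series(meters, rng, loss_fraction=0.0, noise_sd=0.0)
```

Keeping the loss and noise terms as parameters makes it easy to create feeders with unusually high discrepancies, which is useful for testing loss-detection use cases.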
Let us know if you would like more detailed information about how we generated any of the mentioned data sets or, more importantly, if you would like to access the Awesense sandbox environment and try exploring the data or develop your own use case.
Please contact our technical sales team for a demonstration!