

The Synthetic Data Vault Project was first created at MIT's Data to AI Lab in 2016. Visit the SDGym Documentation to learn more!

## Benchmark synthesizers

Let's benchmark synthetic data generation for single tables. Let's choose a few synthesizers from the SDV library and a few others to use as baselines.

```python
# these synthesizers come from the SDV library
# each one uses different modeling techniques
sdv_synthesizers = [...]

# these basic synthesizers are available in SDGym
# as baselines
baseline_synthesizers = [...]
```

Now, we can benchmark the different techniques:

```python
import sdgym

sdgym.benchmark_single_table(
    synthesizers=(sdv_synthesizers + baseline_synthesizers)
)
```

The result is a detailed performance, memory and quality evaluation across the synthesizers on a variety of publicly available datasets.

## Supplying a custom synthesizer

Benchmark your own synthetic data generation techniques by specifying the training logic (using machine learning) and the sampling logic.

```python
def my_training_logic(data, metadata):
    # create an object to represent your synthesizer
    # train it using the data
    return synthesizer

def my_sampling_logic(trained_synthesizer, num_rows):
    # use the trained synthesizer to create
    # num_rows of synthetic data
    return synthetic_data
```

Learn more in the Custom Synthesizers Guide.

## Customizing the datasets

The SDGym library includes many publicly available datasets that you can include right away. List these using the get_available_datasets feature:

```python
sdgym.get_available_datasets()
```

This returns a summary of the datasets, including the dataset_name, size_MB and num_tables for each one.

You can also include any custom, private datasets that are stored on your computer or in an Amazon S3 bucket:

```python
my_datasets_folder = 's3://my-datasets-bucket'
```

For more information, see the docs for Customized Datasets.
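To make the custom-synthesizer skeleton above concrete, here is a toy, self-contained sketch of the two-function contract. The function names mirror the skeleton in this README; the strategy of sampling each column independently from its observed values is purely illustrative and is not how SDGym or SDV synthesizers actually model data.

```python
import random

def my_training_logic(data, metadata):
    # "train" by remembering the observed values of each column;
    # data is assumed to be a list of dicts (one dict per row)
    return {col: [row[col] for row in data] for col in data[0]}

def my_sampling_logic(trained_synthesizer, num_rows):
    # sample each column independently from its observed values
    return [
        {col: random.choice(values)
         for col, values in trained_synthesizer.items()}
        for _ in range(num_rows)
    ]

real_data = [
    {'age': 34, 'city': 'Boston'},
    {'age': 51, 'city': 'Chicago'},
    {'age': 28, 'city': 'Boston'},
]
model = my_training_logic(real_data, metadata=None)
synthetic = my_sampling_logic(model, num_rows=5)
print(len(synthetic))  # prints 5
```

Any pair of functions following this shape (train on data, then sample a requested number of rows) can be plugged into the benchmark as a custom synthesizer.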
## Install

Install SDGym using pip:

```shell
pip install sdgym
```

We recommend using a virtual environment to avoid conflicts with other software on your device.

For more information about using SDGym, including how it evaluates quality and privacy through a variety of metrics, visit the SDGym Documentation.
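The virtual-environment recommendation above can be followed with Python's built-in venv module; a minimal sketch (the environment name sdgym-env is arbitrary):

```shell
# create an isolated environment for SDGym
python -m venv sdgym-env

# activate it (on Windows use: sdgym-env\Scripts\activate)
. sdgym-env/bin/activate

# install SDGym into the environment
pip install sdgym
```

Deactivate the environment with `deactivate` when you are done; packages installed inside it do not affect the rest of your system.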
