Source code for t1modeler.com dataset preparation
The scripts in this repository facilitate the following tasks:
- download the raw data file from its source web page
- convert the data into a pandas DataFrame and binarize the target variable
- save the DataFrame as a CSV file, which can then be compressed and uploaded to t1modeler.com for modeling
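
The three steps above can be sketched roughly as follows. This is a minimal illustration, not code taken from any script in this repository: the helper names `binarize_target` and `prepare_dataset`, the column names, the output file name, and the inline sample rows are all assumptions made for the example. In a real script, `source` would be the URL or path of the raw file for that dataset.

```python
import io
import os
import tempfile

import pandas as pd


def binarize_target(df, target_col, positive_values):
    """Return a copy of df whose target column is 1 for positive_values, else 0."""
    out = df.copy()
    out[target_col] = out[target_col].isin(positive_values).astype(int)
    return out


def prepare_dataset(source, column_names, target_col, positive_values, out_path):
    """Read a header-less CSV, binarize the target, and save the result as CSV.

    `source` may be a URL, a local path, or a file-like object;
    pandas.read_csv accepts all three.
    """
    df = pd.read_csv(source, header=None, names=column_names)
    df = binarize_target(df, target_col, positive_values)
    df.to_csv(out_path, index=False)
    return df


# Inline stand-in for a downloaded raw file (made-up rows, not real data).
raw = io.StringIO("5.1,3.5,setosa\n6.2,2.9,versicolor\n5.9,3.0,virginica\n")

# Hypothetical output location, chosen only for this example.
out_path = os.path.join(tempfile.gettempdir(), "example_prepared.csv")

result = prepare_dataset(
    raw,
    column_names=["sepal_length", "sepal_width", "species"],
    target_col="species",
    positive_values=["setosa"],
    out_path=out_path,
)
print(result)
```

Which class counts as positive is a per-dataset decision, so the sketch takes it as the `positive_values` argument instead of hard-coding it. Each script in the table may handle this differently.
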
The table below maps each script to the source page of its dataset.
| # | File Name | Source Page |
|---|---|---|
| 1 | keel_001_kdd_cup_1999.py | Link |
| 2 | keel_002_sonar_mines_vs_rocks.py | Link |
| 3 | keel_003_molecular_biology.py | Link |
| 4 | keel_004_connect_4.py | Link |
| 5 | uci_001_adult_data_set.py | Link |
| 6 | uci_002_bank_marketing.py | Link |
| 7 | uci_003_human_activity_recognition.py | Link |
| 8 | uci_004_credit_approval.py | Link |
| 9 | uci_005_cylinder_bands.py | Link |
| 10 | uci_006_internet_advertisements.py | Link |
| 11 | uci_007_ionosphere.py | Link |
| 12 | uci_008_letter_recognition.py | Link |
| 13 | uci_009_multiple_features.py | Link |
| 14 | uci_010_mushroom.py | Link |
| 15 | uci_011_spambase.py | Link |
| 16 | uci_012_insurance_company_benchmark.py | Link |
| 17 | uci_013_german_credit_data.py | Link |
| 18 | uci_014_secom.py | Link |
| 19 | uci_015_qsar_biodegradation.py | Link |
| 20 | uci_016_seismic_bumps.py | Link |
| 21 | uci_017_thoracic_surgery_data.py | Link |
| 22 | uci_018_phishing_websites.py | Link |
| 23 | uci_019_default_of_credit_card_clients.py | Link |
| 24 | uci_020_sports_articles_objectivity.py | Link |
| 25 | uci_021_heart_disease.py | Link |
| 26 | uci_022_dermatology.py | Link |
| 27 | uci_023_madelon.py | Link |
| 28 | uci_024_ozone_level_detection.py | Link |
| 29 | uci_025_parkinsons.py | Link |
| 30 | uci_026_cardiotocography.py | Link |
| 31 | uci_027_miniboone_particle_identification.py | Link |
| 32 | uci_028_gas_sensor_array_drift.py | Link |
| 33 | uci_029_cnae_9.py | Link |
| 34 | uci_030_climate_model_simulation_crashes.py | Link |
| 35 | uci_031_eeg_eye_state.py | Link |
| 36 | uci_032_lsvt_voice_rehabilitation.py | Link |
| 37 | uci_033_urban_land_cover.py | Link |
| 38 | uci_034_diabetes_130_us_hospitals.py | Link |
| 39 | uci_035_gesture_phase_segmentation.py | Link |
| 40 | uci_036_student_performance.py | Link |
| 41 | uci_037_sensorless_drive_diagnosis.py | Link |
| 42 | uci_038_tv_news_channel_commercial_detection.py | Link |
| 43 | uci_039_diabetic_retinopathy_debrecen.py | Link |
| 44 | uci_040_online_news_popularity.py | Link |
| 45 | uci_041_mice_protein_expression.py | Link |
| 46 | uci_042_occupancy_detection.py | Link |
| 47 | uci_043_gas_sensors_for_home_activity.py | Link |
| 48 | uci_044_polish_companies_bankruptcy.py | Link |
| 49 | uci_045_htru2.py | Link |
| 50 | uci_046_cervical_cancer.py | Link |
| 51 | uci_047_epileptic_seizure_recognition.py | Link |
| 52 | uci_048_burst_header_packet.py | Link |
| 53 | uci_049_extention_of_z_alizadeh_sani.py | Link |
| 54 | uci_050_ida2016challenge.py | Link |
| 55 | uci_051_hcc_survival.py | Link |
| 56 | uci_052_online_shoppers_purchasing_intention.py | Link |
| 57 | uci_053_electrical_grid_stability.py | Link |
| 58 | uci_054_caesarian_section_classification.py | Link |
| 59 | uci_055_audit_data.py | Link |
| 60 | uci_056_hepatitis_c_virus.py | Link |
| 61 | uci_057_glass_identification.py | Link |
| 62 | uci_058_iris.py | Link |
| 63 | uci_059_optical_recognition_of_handwritten_digits.py | Link |
| 64 | vanderbilt_001_titanic.py | Link |
| 65 | vanderbilt_002_acute_bacterial_meningitis.py | Link |
| 66 | vanderbilt_003_ari_dataset.py | Link |
| 67 | vanderbilt_004_duchenne_muscular_dystrophy.py | Link |
| 68 | vanderbilt_005_right_heart_catheterization.py | Link |
| 69 | vanderbilt_006_ucla_stress_echocardiography.py | Link |
| 70 | vanderbilt_007_support_study.py | Link |
| 71 | vanderbilt_008_very_low_birth_weight_infants.py | Link |