Learning Database and Basic SQL-Part 2
Yesterday, I learned about how to make my own database and tables. It started when I wanted to practice analyzing data in Excel with the dataset that I have got from Kaggle. But, the second I open the file, I realized that it’s easier for me to analyze this dataset with SQL. So, I decided to learn SQL again, after days or weeks I didn’t.
As a start, I uninstalled the SQL editor I had before, which was Beekeeper Studio and began using Visual Studio Code instead. After completing the installation process, it’s time for me to connect VSCode with MariaDB, the SQL server. Following that, I created the database for the dataset I own using this simple syntax:
create database spotify23;
This dataset contains songs that were most streamed in 2023. Then, I used that spotify23 database to make some tables, one for the songs and the other one for the charts. And the syntax was:
use spotify23;
create table track_song(
id int primary key,
track_name varchar(40),
artist_names varchar(50),
released_year varchar(4),
released_month varchar(2),
released_day varchar(2)
)
create table track_charts(
id int primary key,
in_spotify_playlists char(10),
in_spotify_charts char(10),
streams char(10),
in_apple_playlists char(10),
in_apple_charts char(10),
in_deezer_playlists char(10),
in_deezer_charts char(10),
in_shazam_charts char(10)
)
After the tables have been created, it’s time for me to fill them with data. But, I need to modify the data first. Since the format is .CSV, of course the data looks like this.
I tried to import this dataset into SQL server with LOAD DATA statement but it didn’t work for some reason. Fortunately, I found another way to fill the tables with this data by using this INSERT INTO statement:
INSERT INTO track_song (id, track_name, artist_names, artist_count, released_year, released_month, released_day) VALUES
('31', 'Rush', 'Troye Sivan', '1', '2023', '7', '13'),
('32', 'TULUM', 'Peso Pluma, Grupo Frontera', '2', '2023', '6', '28'),
('33', 'Creepin', 'The Weeknd, 21 Savage, Metro Boomin', '3', '2022', '12', '2');
It’s actually way longer than that and I wrote some mistakes in the process that caused the syntax error. With a very long code, I couldn’t detect the mistakes even though the VSCode already indicated me the error locations. It was a bit frustrating, since I had no one to ask to help me, especially someone who understands SQL. However, I managed to resolve it by asking ChatGPT to detect the syntax errors.
It turned out the error was in the string “Creepin” that contains single quote and it caused the error. From that, I knew that I had to erase all of words that contain single quote to prevent the same syntax error. So I did it, manually. Here’s the final result of track_song table.
After the track_song table was filled with the data, I continued to fill the track_charts table with the syntax:
insert into track_charts (id, in_spotify_playlists, in_spotify_charts, streams, in_apple_playlists, in_apple_charts, in_deezer_playlists, in_deezer_charts, in_shazam_charts) values
('1','553','147','141381703','43','263','45','10','826'),
('2','1474','48','133716286','48','126','58','14','382'),
('3','1397','113','140003974','94','207','91','14','949');
It was way more easier than the track_song table because it only contains numbers. And here’s the final result of track_charts table:
Besides these two tables, I actually still have to make some tables for other categories before I start to analyze, but I will continue it on the next sessions.
See you!