I have read some CSV-vs-database debates, and in many cases people recommended the database solution over CSV. However, none of those discussions matched my setup.
So here's the setup:
- Approximately 50 CSV files arrive per hour, representing approximately 50 hosts
- Each file has from 20 to 100 counters per display group
- I have several predefined reports (e.g., a daily report for some counters and hosts) for which I need to extract data
- The reports are relatively static, based on a variable date range, host, and counter
- I need to append new data as it arrives; in total (across the 50 files) it is about 100 MB per day
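For concreteness, here is a hypothetical layout of one hourly file; the column names and the wide one-column-per-counter shape are assumptions for illustration, not the actual format:

```
ts,host,counter_1,counter_2,...,counter_100
2024-01-01T12:00,hostA,42.0,17,...,0.93
2024-01-01T12:00,hostB,38.5,12,...,0.88
```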
Possible solutions:
1) Keep it in CSV - create a master CSV file for each display group and simply append the latest CSV file every hour; generate the reports with shell commands (grep, sed, cut, awk).
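A minimal sketch of the append-and-query approach. The file names (`hourly_*.csv`, `master_group1.csv`) and the narrow `ts,host,counter,value` layout are assumptions for illustration; adjust to the real columns:

```shell
#!/bin/sh
# Simulate two hourly files for demonstration (hypothetical layout)
printf 'ts,host,counter,value\n2024-01-01T12:00,hostA,cpu,42\n' > hourly_2024010112.csv
printf 'ts,host,counter,value\n2024-01-01T13:00,hostA,cpu,57\n' > hourly_2024010113.csv

# Hourly append: keep one header line, then skip headers when adding rows
head -n 1 hourly_2024010112.csv > master_group1.csv
for f in hourly_*.csv; do
    tail -n +2 "$f" >> master_group1.csv
done

# Report: average of one counter for one host over a date range
awk -F, '$2=="hostA" && $3=="cpu" && $1>="2024-01-01" && $1<="2024-01-02" \
    { sum += $4; n++ } END { if (n) printf "avg=%.1f\n", sum/n }' master_group1.csv
```

Note that the date filter works with plain string comparison because ISO-8601 timestamps sort lexicographically.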
2) Load it into a database (like MySQL) - create tables mirroring the display groups, load the CSV files into those tables, and generate the reports with SQL queries.
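A rough sketch of the MySQL side. The table name, column names, and file path are hypothetical; it also assumes a narrow one-row-per-sample layout rather than 100+ columns, which sidesteps the wide-table concern and keeps the schema static even when counters vary per display group:

```sql
-- Hypothetical narrow schema: one row per sample instead of 100+ columns
CREATE TABLE counters_group1 (
    ts      DATETIME     NOT NULL,
    host    VARCHAR(64)  NOT NULL,
    counter VARCHAR(64)  NOT NULL,
    value   DOUBLE       NOT NULL,
    PRIMARY KEY (ts, host, counter),
    KEY idx_host_counter (host, counter, ts)
);

-- Hourly load (CSV with a header row, comma-separated)
LOAD DATA INFILE '/data/hourly_2024010112.csv'
INTO TABLE counters_group1
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
IGNORE 1 LINES
(ts, host, counter, value);

-- Daily report: average per counter for one host over a date range
SELECT counter, AVG(value) AS avg_value
FROM counters_group1
WHERE host = 'hostA'
  AND ts BETWEEN '2024-01-01' AND '2024-01-02'
GROUP BY counter;
```

With the `(host, counter, ts)` index, the report queries stay index-driven reads, which is the access pattern a small VM handles well.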
When I simulated the CSV approach using only shell commands, it was very fast, and I worry that database queries would be slower on that amount of data. I also know that databases don't like overly wide tables, and in my scenario some tables would need 100+ columns. The data will be read-only most of the time (only new files get appended), and I would like to keep a year of data, which is about 36 GB. Would the database solution still perform well on the expected hardware (a 1-2 core VM with 2-4 GB of memory)? I haven't simulated the database solution, so I'd like to ask whether you have any views on, or experience with, a similar scenario.
Thanks