java - Inserting an 8-million-row file into a MySQL DB
I have a file containing a search engine query log with the following columns: id, timestamp, session, user, document, query, activity.
A query can appear several times within the file, so I created two tables in a MySQL DB:
query:
+------------------+--------------+------+-----+---------+----------------+
| field            | type         | null | key | default | extra          |
+------------------+--------------+------+-----+---------+----------------+
| id               | int(11)      | no   | pri | null    | auto_increment |
| query            | varchar(256) | yes  |     | null    |                |
| interaction_freq | int(11)      | yes  |     | null    |                |
+------------------+--------------+------+-----+---------+----------------+
interaction:
+----------------+--------------+------+-----+---------+----------------+
| field          | type         | null | key | default | extra          |
+----------------+--------------+------+-----+---------+----------------+
| id             | int(11)      | no   | pri | null    | auto_increment |
| interaction_id | int(11)      | yes  |     | null    |                |
| info           | timestamp    | yes  |     | null    |                |
| session        | varchar(256) | yes  |     | null    |                |
| user           | int(11)      | yes  |     | null    |                |
| document       | int(11)      | yes  |     | null    |                |
| query_id       | int(11)      | no   | mul | null    |                |
| activity       | int(11)      | yes  |     | null    |                |
+----------------+--------------+------+-----+---------+----------------+
The first table stores each distinct query once; in the second I store the information of every log row containing that particular query, referenced via query_id. The first table also stores interaction_freq, the number of rows in the interaction table that have that query.
My file contains about 8 million rows with roughly 1.5 million unique queries, so I expect that, at the end, the first table will have 1.5 million rows and the second one 8 million.
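To make the flow concrete, here is a simplified sketch of how one log row maps onto the two tables with JDBC (the connection URL, credentials and example values are placeholders, not my real setup; the table and column names are the ones shown above):

    import java.sql.*;

    public class LogLoader {
        // Placeholder JDBC URL and credentials.
        static final String URL = "jdbc:mysql://localhost:3306/querylog";

        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(URL, "user", "pass")) {
                PreparedStatement insQuery = conn.prepareStatement(
                    "INSERT INTO query (query, interaction_freq) VALUES (?, 0)",
                    Statement.RETURN_GENERATED_KEYS);
                PreparedStatement insInter = conn.prepareStatement(
                    "INSERT INTO interaction (info, session, user, document, query_id, activity)"
                    + " VALUES (?, ?, ?, ?, ?, ?)");
                PreparedStatement bumpFreq = conn.prepareStatement(
                    "UPDATE query SET interaction_freq = interaction_freq + 1 WHERE id = ?");

                // One parsed log line (hypothetical values).
                String queryText = "example search";
                int queryId = insertQueryRow(insQuery, queryText);   // id of the row in `query`

                insInter.setTimestamp(1, new Timestamp(System.currentTimeMillis()));
                insInter.setString(2, "session-42");
                insInter.setInt(3, 7);        // user
                insInter.setInt(4, 123);      // document
                insInter.setInt(5, queryId);  // reference into `query`
                insInter.setInt(6, 1);        // activity
                insInter.executeUpdate();

                bumpFreq.setInt(1, queryId);
                bumpFreq.executeUpdate();
            }
        }

        // Insert a query string and return its auto_increment id.
        static int insertQueryRow(PreparedStatement insQuery, String text) throws SQLException {
            insQuery.setString(1, text);
            insQuery.executeUpdate();
            try (ResultSet keys = insQuery.getGeneratedKeys()) {
                keys.next();
                return keys.getInt(1);
            }
        }
    }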
The problem is that the insertion phase is very slow. The process inserts the first 150,000 unique queries quickly but then struggles to process the rest. I'm running it on a cluster consisting of 8 8-core Intel Xeon nodes (32 GB RAM) and around 18 Avante quad-core Xeon 2.4/2.66 GHz nodes (8 GB RAM).
Originally the first table had the "query" field marked UNIQUE, and I thought that was the problem. I removed the unique constraint and instead check uniqueness within the Java program that inserts the rows, but that didn't solve the problem. I kept the process running for about 48 hours and it still didn't reach 200,000 rows in the first table.
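As a rough sketch of what I mean by checking uniqueness in the Java program instead of with the UNIQUE constraint (the class and map below are simplified placeholders, not my exact code), I keep the query-text-to-id mapping in memory, since 1.5 million entries should fit comfortably in RAM on these nodes:

    import java.sql.*;
    import java.util.HashMap;
    import java.util.Map;

    public class UniqueQueryCache {
        private final Map<String, Integer> queryIds = new HashMap<>();  // query text -> query.id
        private final PreparedStatement insQuery;

        UniqueQueryCache(Connection conn) throws SQLException {
            insQuery = conn.prepareStatement(
                "INSERT INTO query (query, interaction_freq) VALUES (?, 0)",
                Statement.RETURN_GENERATED_KEYS);
        }

        // Return the id for this query text, inserting a new row only the first time it is seen.
        int idFor(String text) throws SQLException {
            Integer id = queryIds.get(text);
            if (id != null) {
                return id;                   // already seen: no database round trip
            }
            insQuery.setString(1, text);
            insQuery.executeUpdate();
            try (ResultSet keys = insQuery.getGeneratedKeys()) {
                keys.next();
                id = keys.getInt(1);
            }
            queryIds.put(text, id);
            return id;
        }
    }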
I guess there is a problem somewhere that I can't figure out...
I'm planning to use XML files as a possible workaround, but having the data in a MySQL DB would come in handy...
java mysql database