In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2024-11-12 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >
Share
Shulou(Shulou.com)06/03 Report--
It is often said that the IO performance of the database is poor, but then again, there is no perceptual knowledge. Let's actually test the JDBC read performance of the commonly used Oracle and MySQL.
JDBC is tested because most applications are written by JAVA, so you can only use JDBC to access data. Here, only the test uses JDBC to read out the data and produces a record object into Java (after all, it can only be used in the application at this stage) without any calculation.
1. Data source
Use the data generated by TPCH, select the customer table to do the test, the data record is 30 million rows, 8 fields. The original text file it generates is called customer.tbl with a file size of 4.9G. Use the data import tool provided by the database to import the file data into the data tables of Oracle and MySQL.
two。 Test environment
Complete the test on an Intel server, 2 Intel2670 CPU, main frequency 2.6 G, a total of 16 cores, 64 GB of memory. Database table data and text files are stored on the same SSD hard disk.
All the tests are done on the local server and do not consume network transmission time.
3. Database reading test
Through the JDBC interface provided by Oracle, data reading is performed with SQL statements.
Java is troublesome to write. Run the test with a SPL script:
A
1=now () / recording time 2=connect ("oracle") / connection data 3=A2.cursor ("select * from customer") / generated fetch cursor 4for A3Magic 10000 / cycle fetch, 10000 5=A2.close () each time / close connection 6=interval@s (A1PowerNow ()) / Compute duration
The test code for MySQL is similar and will not be dwelt on.
Test results (time unit: seconds)
First and second rows per second Oracle293281106KMySQL51838179K
The second time may be because the operating system has a hard disk cache, so it is faster. Since our main purpose is to test the read time of JDBC, the second time prevails to reduce the impact of the database itself from the hard disk reading. The number of rows read per second is also calculated according to the second time, that is, Oracle can read more than 100000 rows per second, and the MySQL is about 80, 000 rows. Of course, this value is related to the number and type of fields in the table (the customer table has eight fields), and is just a reference.
4. Text file comparison
Just from the amount of data above, there is not much perceptual knowledge, let's read the text file again to compare. The method is the same: read the data from the file and parse the records without making any calculations.
Write the following SPL script to execute the test:
A
1=now () / record time 2=file ("/ home/sjr/tbl/customer.tbl") / generate file object 3=A2.cursor (;, "|") / generate fetch cursors, delimited by | 4for A3Power10000 / cycle fetch, 10000 5=interval@s (A1 home/sjr/tbl/customer.tbl ()) / computing duration
The test result is 42 seconds!
This means that reading text is 281 Oracle 42 times faster than MySQL 6.69 times faster than reading text, and 381 picks 9.07 times faster than MySQL!
We know that text parsing is a very troublesome thing, but even so, reading data from a text file is much faster than reading from a database. The IO of Oracle and MySQL is so slow!
5. Binary mode
Let's take a closer look at the read performance of the binary storage format and compare it with the text.
For the sake of comparison, this time switch to a larger table, using the orders table in TPCH, with 300 million rows of data and nine fields.
The code for text reading is similar to the above, with a read time test of 438 seconds.
Then, we convert this text file into a SPL group table, and then write code to test:
A
1=now () / record time 2=file ("/ home/sjr/ctx/orders.ctx"). Create () / generate group table object 3=A2.cursor () / generate number cursor 4for A3LI 10000 / cycle fetch, 10000 5=interval@s each time (A1 minute now ()) / length of computation
The test result is 164s, which is only about 1/3 of the text read.
This makes sense, because binary data no longer needs to be parsed, objects can be generated directly, and the amount of computation is much less, so it is faster.
It should be noted that although the group table file uses the column storage format, it reads all the columns here, takes nothing less than the text, and does not take advantage of the column storage. In fact, you will suffer a bit from using column storage because you read all the columns, and it will be faster if you use SPL set files (a row storage format).
6. Parallel speed increase
Fetching numbers from files is also easy to achieve parallel, text and group tables are easy to write parallel programs. Still use the above orders table as an example to test, using 4 threads to fetch the number.
Text fetch code:
The number of parallelism = now () 2=file ("/ home/sjr/tpch_2_17_0/tbls/orders.tbl") is the number of parallelism of ABC1 > nym4portan.
3fork to (n) = A2.cursor (; A3 n, "|") Multithreading generates cursors, and each cursor takes only 4 out of 4 segments.
For B3, 10000
5=interval@s (C1 focus now ())
Group table fetch code:
The number of parallelism = now () 2=file ("/ home/sjr/ctx/orders.ctx") .create ()
3fork to (n) = A2.cursor (;; A3purn) Multithreading generates cursors, each of which takes only 4 out of 4 segments
For B3, 10000
5=interval@s (C1 focus now ())
It is easy to implement data segmentation and parallel computing with SPL.
The test results are as follows:
Text 119 seconds
Group table for 43 seconds
Compared with serial, it is close to linear promotion and makes full use of the multi-core of CPU.
The data in the database is not easy to implement segmented parallelism and needs to be spelled with WHERE conditions. It is difficult to tell whether the parallelism is weak or the loss of WHERE execution is too much, so the reference significance of the test results is reduced, so we will no longer do it here.
7. Conclusion
The JDBC performance of the database (Oracle and MySQL) is very poor! It's five times worse than a text file. When binary data is used, the reading performance is 3 times higher than that of text. In other words, binaries in a reasonable format will have more than 15 times the advantage over the database. Taking into account the parallelism factor, it is also possible to be dozens or hundreds of times faster than the database.
When you are concerned about performance and a large amount of data, do not read the data out of the database for calculation!
If you really need to read out and then calculate (sometimes it is difficult for SQL to write complex process calculations), do not use the database to store (big data is all history, basically no longer change, can be read in advance), text is better than the database, binary is certainly better (it is recommended to use SPL group tables, ). Don't waste your time on non-computational tasks such as reading.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.