CLN: ASV io benchmarks #18906

Merged
merged 4 commits into from
Dec 23, 2017


mroeschke
Member

xref

Consolidated the benchmarks from io_sql.py, hdfstore_bench.py, and packers.py into their own files in the io folder. The benchmarks are largely unchanged; they were only cleaned up and simplified where possible.
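For readers unfamiliar with the layout, a module in the io folder is a set of plain classes: `setup`/`teardown` prepare and clean up fixtures, and ASV times each `time_*` method. The following is a hypothetical sketch in the spirit of the `io.pickle.Pickle` benchmarks listed in the run below; the frame size, column names, and temporary-file handling are assumptions, not the code from this PR.

```python
# Hypothetical sketch of a consolidated module such as benchmarks/io/pickle.py.
import os
import tempfile

import numpy as np
import pandas as pd


class Pickle(object):
    # ASV uses goal_time (seconds) when deciding how often to repeat a timing.
    goal_time = 0.2

    def setup(self):
        # Not timed: build a small mixed-dtype frame and write it once
        # so that the read benchmark has a file to load.
        n = 10000
        self.df = pd.DataFrame({
            'float': np.random.randn(n),
            'int': np.arange(n),
            'str': ['s{}'.format(i) for i in range(n)],
        })
        fd, self.fname = tempfile.mkstemp(suffix='.pkl')
        os.close(fd)
        self.df.to_pickle(self.fname)

    def teardown(self):
        # Remove the temporary file so repeated runs do not leave files behind.
        if os.path.exists(self.fname):
            os.remove(self.fname)

    def time_read_pickle(self):
        pd.read_pickle(self.fname)

    def time_write_pickle(self):
        self.df.to_pickle(self.fname)
```

Keeping fixture creation in `setup` means only the read or write call itself is measured.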

asv dev -b ^io
· Discovering benchmarks
· Running 67 total benchmarks (1 commits * 1 environments * 67 benchmarks)
[  0.00%] ·· Building for existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[  0.00%] ·· Benchmarking existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[  1.49%] ··· Running io.csv.ReadCSVCategorical.time_convert_direct                                                                   98.5ms
[  2.99%] ··· Running io.csv.ReadCSVCategorical.time_convert_post                                                                      142ms
[  4.48%] ··· Running io.csv.ReadCSVComment.time_comment                                                                              53.6ms
[  5.97%] ··· Running io.csv.ReadCSVDInferDatetimeFormat.time_read_csv                                                                    ok
[  5.97%] ···· 
               ======================= ======== ========= ========
               --                                 format          
               ----------------------- ---------------------------
                infer_datetime_format   custom   iso8601    ymd   
               ======================= ======== ========= ========
                         True           24.4ms    4.73ms   4.99ms 
                        False           662ms     3.50ms   3.23ms 
               ======================= ======== ========= ========

[  7.46%] ··· Running io.csv.ReadCSVFloatPrecision.time_read_csv                                                                          ok
[  7.46%] ···· 
               ===== ========== ========== ================ ========== ========== ================
               --                              decimal / float_precision                          
               ----- -----------------------------------------------------------------------------
                sep   . / None   . / high   . / round_trip   _ / None   _ / high   _ / round_trip 
               ===== ========== ========== ================ ========== ========== ================
                 ,     3.91ms     3.77ms        5.52ms        4.25ms     4.19ms        4.20ms     
                 ;     3.86ms     3.73ms        5.56ms        4.16ms     4.23ms        4.13ms     
               ===== ========== ========== ================ ========== ========== ================

[  8.96%] ··· Running io.csv.ReadCSVFloatPrecision.time_read_csv_python_engine                                                            ok
[  8.96%] ···· 
               ===== ========== ========== ================ ========== ========== ================
               --                              decimal / float_precision                          
               ----- -----------------------------------------------------------------------------
                sep   . / None   . / high   . / round_trip   _ / None   _ / high   _ / round_trip 
               ===== ========== ========== ================ ========== ========== ================
                 ,     8.54ms     8.44ms        8.50ms        7.03ms     6.99ms        6.99ms     
                 ;     8.46ms     8.53ms        8.42ms        7.00ms     7.09ms        7.00ms     
               ===== ========== ========== ================ ========== ========== ================

[ 10.45%] ··· Running io.csv.ReadCSVParseDates.time_baseline                                                                          2.86ms
[ 11.94%] ··· Running io.csv.ReadCSVParseDates.time_multiple_date                                                                     2.85ms
[ 13.43%] ··· Running io.csv.ReadCSVSkipRows.time_skipprows                                                                               ok
[ 13.43%] ···· 
               ========== ========
                skiprows          
               ---------- --------
                  None     46.1ms 
                 10000     31.4ms 
               ========== ========

[ 14.93%] ··· Running io.csv.ReadCSVThousands.time_thousands                                                                              ok
[ 14.93%] ···· 
               ===== ======== ========
               --        thousands    
               ----- -----------------
                sep    None      ,    
               ===== ======== ========
                 ,    38.6ms   35.6ms 
                 |    38.9ms   38.6ms 
               ===== ======== ========

[ 16.42%] ··· Running io.csv.ReadUint64Integers.time_read_uint64                                                                      8.95ms
[ 17.91%] ··· Running io.csv.ReadUint64Integers.time_read_uint64_na_values                                                            13.0ms
[ 19.40%] ··· Running io.csv.ReadUint64Integers.time_read_uint64_neg_values                                                           13.0ms
[ 20.90%] ··· Running io.csv.S3.time_read_csv_10_rows                                                                                     ok
[ 20.90%] ···· 
               ============= ======== =======
               --                 engine     
               ------------- ----------------
                compression   python     c   
               ============= ======== =======
                    None      8.43s    6.27s 
                    gzip      6.80s    7.74s 
                    bz2       36.6s     n/a  
               ============= ======== =======

[ 22.39%] ··· Running io.csv.ToCSV.time_frame                                                                                             ok
[ 22.39%] ···· 
               ======= ========
                 kind          
               ------- --------
                 wide   89.1ms 
                 long   172ms  
                mixed   38.6ms 
               ======= ========

[ 23.88%] ··· Running io.csv.ToCSVDatetime.time_frame_date_formatting                                                                 23.1ms
[ 25.37%] ··· Running io.excel.Excel.time_read_excel                                                                                      ok
[ 25.37%] ···· 
               ============ =======
                  engine           
               ------------ -------
                 openpyxl    565ms 
                xlsxwriter   569ms 
                   xlwt      158ms 
               ============ =======

[ 26.87%] ··· Running io.excel.Excel.time_write_excel                                                                                     ok
[ 26.87%] ···· 
               ============ =======
                  engine           
               ------------ -------
                 openpyxl    1.24s 
                xlsxwriter   970ms 
                   xlwt      704ms 
               ============ =======

[ 28.36%] ··· Running io.hdf.HDF.time_read_hdf                                                                                            ok
[ 28.36%] ···· 
               ======== ========
                format          
               -------- --------
                table    63.0ms 
                fixed    79.8ms 
               ======== ========

[ 29.85%] ··· Running io.hdf.HDF.time_write_hdf                                                                                           ok
[ 29.85%] ···· 
               ======== =======
                format         
               -------- -------
                table    121ms 
                fixed    149ms 
               ======== =======

[ 31.34%] ··· Running io.hdf.HDFStoreDataFrame.time_query_store_table                                                                 26.4ms
[ 32.84%] ··· Running io.hdf.HDFStoreDataFrame.time_query_store_table_wide                                                            33.2ms
[ 34.33%] ··· Running io.hdf.HDFStoreDataFrame.time_read_store                                                                        12.1ms
[ 35.82%] ··· Running io.hdf.HDFStoreDataFrame.time_read_store_mixed                                                                  25.4ms
[ 37.31%] ··· Running io.hdf.HDFStoreDataFrame.time_read_store_table                                                                  18.0ms
[ 38.81%] ··· Running io.hdf.HDFStoreDataFrame.time_read_store_table_mixed                                                            36.6ms
[ 40.30%] ··· Running io.hdf.HDFStoreDataFrame.time_read_store_table_wide                                                             30.1ms
[ 41.79%] ··· Running io.hdf.HDFStoreDataFrame.time_store_info                                                                        53.2ms
[ 43.28%] ··· Running io.hdf.HDFStoreDataFrame.time_store_repr                                                                         112μs
[ 44.78%] ··· Running io.hdf.HDFStoreDataFrame.time_store_str                                                                          109μs
[ 46.27%] ··· Running io.hdf.HDFStoreDataFrame.time_write_store                                                                       13.6ms
[ 47.76%] ··· Running io.hdf.HDFStoreDataFrame.time_write_store_mixed                                                                 31.6ms
[ 49.25%] ··· Running io.hdf.HDFStoreDataFrame.time_write_store_table                                                                 45.1ms
[ 50.75%] ··· Running io.hdf.HDFStoreDataFrame.time_write_store_table_dc                                                               357ms
[ 52.24%] ··· Running io.hdf.HDFStoreDataFrame.time_write_store_table_mixed                                                           56.7ms
[ 53.73%] ··· Running io.hdf.HDFStoreDataFrame.time_write_store_table_wide                                                             160ms
[ 55.22%] ··· Running io.hdf.HDFStorePanel.time_read_store_table_panel                                                                55.2ms
[ 56.72%] ··· Running io.hdf.HDFStorePanel.time_write_store_table_panel                                                               90.3ms
[ 58.21%] ··· Running io.json.ReadJSON.time_read_json                                                                                     ok
[ 58.21%] ···· 
               ========= ======= ==========
               --              index       
               --------- ------------------
                 orient    int    datetime 
               ========= ======= ==========
                 split    253ms    270ms   
                 index    7.80s    7.97s   
                records   619ms    627ms   
               ========= ======= ==========

[ 59.70%] ··· Running io.json.ReadJSONLines.peakmem_read_json_lines                                                                       ok
[ 59.70%] ···· 
               ========== ======
                 index          
               ---------- ------
                  int      192M 
                datetime   192M 
               ========== ======

[ 61.19%] ··· Running io.json.ReadJSONLines.peakmem_read_json_lines_concat                                                                ok
[ 61.19%] ···· 
               ========== ======
                 index          
               ---------- ------
                  int      164M 
                datetime   164M 
               ========== ======

[ 62.69%] ··· Running io.json.ReadJSONLines.time_read_json_lines                                                                          ok
[ 62.69%] ···· 
               ========== =======
                 index           
               ---------- -------
                  int      734ms 
                datetime   740ms 
               ========== =======

[ 64.18%] ··· Running io.json.ReadJSONLines.time_read_json_lines_concat                                                                   ok
[ 64.18%] ···· 
               ========== =======
                 index           
               ---------- -------
                  int      767ms 
                datetime   763ms 
               ========== =======

[ 65.67%] ··· Running io.json.ToJSON.time_delta_int_tstamp                                                                                ok
[ 65.67%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    347ms 
                columns   353ms 
                 index    400ms 
               ========= =======

[ 67.16%] ··· Running io.json.ToJSON.time_delta_int_tstamp_lines                                                                          ok
[ 67.16%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    634ms 
                columns   643ms 
                 index    632ms 
               ========= =======

[ 68.66%] ··· Running io.json.ToJSON.time_float_int                                                                                       ok
[ 68.66%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    232ms 
                columns   233ms 
                 index    389ms 
               ========= =======

[ 70.15%] ··· Running io.json.ToJSON.time_float_int_lines                                                                                 ok
[ 70.15%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    684ms 
                columns   686ms 
                 index    685ms 
               ========= =======

[ 71.64%] ··· Running io.json.ToJSON.time_float_int_str                                                                                   ok
[ 71.64%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    354ms 
                columns   231ms 
                 index    411ms 
               ========= =======

[ 73.13%] ··· Running io.json.ToJSON.time_float_int_str_lines                                                                             ok
[ 73.13%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    713ms 
                columns   713ms 
                 index    714ms 
               ========= =======

[ 74.63%] ··· Running io.json.ToJSON.time_floats_with_dt_index                                                                            ok
[ 74.63%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    191ms 
                columns   222ms 
                 index    220ms 
               ========= =======

[ 76.12%] ··· Running io.json.ToJSON.time_floats_with_dt_index_lines                                                                      ok
[ 76.12%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    531ms 
                columns   527ms 
                 index    533ms 
               ========= =======

[ 77.61%] ··· Running io.json.ToJSON.time_floats_with_int_idex_lines                                                                      ok
[ 77.61%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    525ms 
                columns   525ms 
                 index    524ms 
               ========= =======

[ 79.10%] ··· Running io.json.ToJSON.time_floats_with_int_index                                                                           ok
[ 79.10%] ···· 
               ========= =======
                 orient         
               --------- -------
                 split    167ms 
                columns   183ms 
                 index    194ms 
               ========= =======

[ 80.60%] ··· Running io.msgpack.MSGPack.time_read_msgpack                                                                            48.2ms
[ 82.09%] ··· Running io.msgpack.MSGPack.time_write_msgpack                                                                           73.5ms
[ 83.58%] ··· Running io.pickle.Pickle.time_read_pickle                                                                                110ms
[ 85.07%] ··· Running io.pickle.Pickle.time_write_pickle                                                                               157ms
[ 86.57%] ··· Running io.sas.SAS.time_read_msgpack                                                                                        ok
[ 86.57%] ···· 
               ========== ========
                 format           
               ---------- --------
                sas7bdat   608ms  
                 xport     9.92ms 
               ========== ========

[ 88.06%] ··· Running io.sql.ReadSQLTable.time_read_sql_table_all                                                                         ok
[ 88.06%] ···· 
               ================ =======
                    dtype              
               ---------------- -------
                    float        101ms 
                float_with_nan   101ms 
                    string       101ms 
                     bool        102ms 
                     int         101ms 
                   datetime      101ms 
               ================ =======

[ 89.55%] ··· Running io.sql.ReadSQLTable.time_read_sql_table_column                                                                      ok
[ 89.55%] ···· 
               ================ ========
                    dtype               
               ---------------- --------
                    float        24.1ms 
                float_with_nan   23.0ms 
                    string       25.1ms 
                     bool        23.8ms 
                     int         25.6ms 
                   datetime      46.6ms 
               ================ ========

[ 91.04%] ··· Running io.sql.ReadSQLTable.time_read_sql_table_parse_dates                                                                 ok
[ 91.04%] ···· 
               ================ ========
                    dtype               
               ---------------- --------
                    float        31.5ms 
                float_with_nan   31.9ms 
                    string       31.0ms 
                     bool        30.9ms 
                     int         31.2ms 
                   datetime      31.5ms 
               ================ ========

[ 92.54%] ··· Running io.sql.SQL.time_read_sql_query_select_all                                                                           ok
[ 92.54%] ···· 
               ============ ======== ================ ======== ======== ======== ==========
               --                                        dtype                             
               ------------ ---------------------------------------------------------------
                connection   float    float_with_nan   string    bool     int     datetime 
               ============ ======== ================ ======== ======== ======== ==========
                sqlalchemy   70.6ms       71.0ms       70.8ms   70.4ms   70.6ms    71.5ms  
                  sqlite     58.7ms       58.1ms       58.0ms   57.7ms   58.5ms    58.2ms  
               ============ ======== ================ ======== ======== ======== ==========

[ 94.03%] ··· Running io.sql.SQL.time_read_sql_query_select_column                                                                        ok
[ 94.03%] ···· 
               ============ ======== ================ ======== ======== ======== ==========
               --                                        dtype                             
               ------------ ---------------------------------------------------------------
                connection   float    float_with_nan   string    bool     int     datetime 
               ============ ======== ================ ======== ======== ======== ==========
                sqlalchemy   71.6ms       70.6ms       71.5ms   70.8ms   70.8ms    70.4ms  
                  sqlite     58.4ms       58.8ms       58.4ms   57.8ms   57.7ms    58.1ms  
               ============ ======== ================ ======== ======== ======== ==========

[ 95.52%] ··· Running io.sql.SQL.time_to_sql_dataframe_colums                                                                             ok
[ 95.52%] ···· 
               ============ ======== ================ ======== ======== ======== ==========
               --                                        dtype                             
               ------------ ---------------------------------------------------------------
                connection   float    float_with_nan   string    bool     int     datetime 
               ============ ======== ================ ======== ======== ======== ==========
                sqlalchemy   140ms        149ms        136ms    148ms    135ms     257ms   
                  sqlite     54.4ms       63.2ms       55.5ms   92.4ms   52.7ms    118ms   
               ============ ======== ================ ======== ======== ======== ==========

[ 97.01%] ··· Running io.sql.SQL.time_to_sql_dataframe_full                                                                               ok
[ 97.01%] ···· 
               ============ ======= ================ ======== ======= ======= ==========
               --                                      dtype                            
               ------------ ------------------------------------------------------------
                connection   float   float_with_nan   string    bool    int    datetime 
               ============ ======= ================ ======== ======= ======= ==========
                sqlalchemy   435ms       440ms        433ms    436ms   436ms    435ms   
                  sqlite     185ms       186ms        185ms    186ms   185ms    186ms   
               ============ ======= ================ ======== ======= ======= ==========

[ 98.51%] ··· Running io.stata.Stata.time_read_stata                                                                                      ok
[ 98.51%] ···· 
               =============== =======
                convert_dates         
               --------------- -------
                      tc        514ms 
                      td        510ms 
                      tm        1.35s 
                      tw        1.27s 
                      th        1.36s 
                      tq        1.35s 
                      ty        1.76s 
               =============== =======

[100.00%] ··· Running io.stata.Stata.time_write_stata                                                                                     ok
[100.00%] ···· 
               =============== =======
                convert_dates         
               --------------- -------
                      tc        593ms 
                      td        592ms 
                      tm        603ms 
                      tw        1.34s 
                      th        624ms 
                      tq        616ms 
                      ty        592ms 
               =============== =======
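The `-b ^io` argument is a regular expression that selects benchmarks by name, which is why only `io.*` benchmarks appear in the run. A minimal sketch of that filtering, assuming `re.search` semantics; the first three names come from the log, the last two are hypothetical:

```python
import re

names = [
    'io.csv.ReadCSVComment.time_comment',
    'io.hdf.HDF.time_read_hdf',
    'io.sql.SQL.time_read_sql_query_select_all',
    'groupby.GroupByMethods.time_method',  # hypothetical non-io benchmark
    'ratio.Operations.time_division',      # hypothetical; contains "io" mid-name
]


def select(pattern, names):
    # Keep every name in which the regex matches somewhere.
    return [n for n in names if re.search(pattern, n)]


print(select('^io', names))  # '^' anchors the match to the start of the name
print(select('io', names))   # unanchored: also picks up 'ratio.Operations...'
```

Without the `^` anchor, the pattern would also select any benchmark whose name merely contains the letters "io".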

@codecov

codecov bot commented Dec 22, 2017

Codecov Report

Merging #18906 into master will decrease coverage by 0.05%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master   #18906      +/-   ##
==========================================
- Coverage   91.65%   91.59%   -0.06%     
==========================================
  Files         154      150       -4     
  Lines       51368    48939    -2429     
==========================================
- Hits        47080    44826    -2254     
+ Misses       4288     4113     -175
Flag        Coverage Δ
#multiple   89.95% <ø> (+0.43%) ⬆️
#single     41.14% <ø> (+0.29%) ⬆️
Impacted Files Coverage Δ
pandas/io/excel.py
pandas/io/stata.py
pandas/io/pickle.py
pandas/io/sql.py

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 316acbf...d24bc6b.
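The headline "decrease coverage by 0.05%" and the diff table's "-0.06%" describe the same change. Assuming coverage is hits / lines (here hits + misses equals lines, so there are no partially covered lines), the raw drop is about 0.057 percentage points; the headline figure looks truncated to two decimals while the table's figure looks rounded. A quick check using only the counts from the report:

```python
import math

# Counts copied from the Codecov diff table above.
before_hits, before_misses, before_lines = 47080, 4288, 51368
after_hits, after_misses, after_lines = 44826, 4113, 48939

# Assumption: coverage = hits / lines.
before_pct = 100.0 * before_hits / before_lines  # ~91.652
after_pct = 100.0 * after_hits / after_lines     # ~91.596
delta = before_pct - after_pct                   # ~0.057 percentage points


def truncate2(x):
    """Cut x to two decimal places without rounding."""
    return math.floor(x * 100) / 100


print(truncate2(before_pct), truncate2(after_pct))  # 91.65 91.59
print(truncate2(delta), round(delta, 2))            # 0.05 0.06
```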

        read_sql_query(self.query_all, self.con)

    def time_read_sql_query_select_column(self, connection, dtype):
        read_sql_query(self.query_all, self.con)
Member

query_col ?
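The review's point is that the column benchmark reads `self.query_all`, so both query benchmarks time the same `SELECT *`; this is consistent with `time_read_sql_query_select_all` and `time_read_sql_query_select_column` reporting nearly identical timings in the run above. A minimal sketch of the intended difference, using only an in-memory `sqlite3` connection; the class name `ReadSQLQuery`, the table `test_type`, and the column names are hypothetical:

```python
import sqlite3

import numpy as np
import pandas as pd


class ReadSQLQuery(object):
    # Hypothetical ASV-style benchmark; the sqlalchemy connection is omitted.
    goal_time = 0.2

    def setup(self):
        n = 1000
        self.con = sqlite3.connect(':memory:')
        df = pd.DataFrame({'float_col': np.random.randn(n),
                           'int_col': np.arange(n),
                           'str_col': ['s{}'.format(i) for i in range(n)]})
        df.to_sql('test_type', self.con, index=False)
        self.query_all = 'SELECT * FROM test_type'
        self.query_col = 'SELECT float_col FROM test_type'

    def teardown(self):
        self.con.close()

    def time_read_sql_query_select_all(self):
        pd.read_sql_query(self.query_all, self.con)

    def time_read_sql_query_select_column(self):
        # Reads query_col, so this times a single-column query.
        pd.read_sql_query(self.query_col, self.con)
```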


    goal_time = 0.2
    params = (['sqlalchemy', 'sqlite'],
              ['float', 'float_with_nan', 'string', 'bool', 'int', 'datetime'])
Member

I would personally keep this benchmark split into two classes: one with these params (for testing types) and one without. As it stands, this creates a lot of extra benchmarks that are not needed.
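ASV runs each `time_*` method once per combination in the Cartesian product of `params`, so 2 connections times 6 dtypes turns every method of the class into 12 benchmarks. A rough count, assuming the four methods reported under `io.sql.SQL` above and one hypothetical split in which the whole-frame benchmarks vary only the connection:

```python
import itertools

connections = ['sqlalchemy', 'sqlite']
dtypes = ['float', 'float_with_nan', 'string', 'bool', 'int', 'datetime']


def n_runs(params, n_methods):
    # Each time_* method runs once per element of the product of the params.
    return len(list(itertools.product(*params))) * n_methods


# One class: all four methods parametrized over (connection, dtype).
combined = n_runs([connections, dtypes], 4)

# Hypothetical split: two whole-frame methods vary only the connection,
# two per-column methods keep both parameters.
split = n_runs([connections], 2) + n_runs([connections, dtypes], 2)

print(combined, split)  # 48 28
```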

@jorisvandenbossche jorisvandenbossche added the Benchmark Performance (ASV) benchmarks label Dec 22, 2017
@jorisvandenbossche jorisvandenbossche added this to the 0.23.0 milestone Dec 22, 2017
@mroeschke
Member Author

Fixed the incorrect benchmark and split up the SQL benchmark to avoid redundant tests.

@jreback jreback merged commit b9cc821 into pandas-dev:master Dec 23, 2017
@jreback
Contributor

jreback commented Dec 23, 2017

thanks!

@mroeschke mroeschke deleted the asv_clean_io branch December 24, 2017 05:55
Labels
Benchmark Performance (ASV) benchmarks
3 participants