How to import web server logs with MSSQL

  • Once you have imported the web logs into an SQL Server database, you can then analyse them, archive them...
  • Microsoft has created this great gem: logparser. See: Microsoft LogParser
  • LogParser reads, writes and converts not only from but also to all kind of logs.
  • I use LogParser interactively see: Microsoft LogParser and to create CVS files that can easily be imported in SQL Server.

Applies to:

  • Microsoft SQL Server 2005
  • Microsoft SQL Server 2008

SQL Server logs data structure

  1. CREATE TABLE logs
  2. (
  3.     logfilename VARCHAR(255) null,
  4.     logrow INT null,
  5.     remotehostname VARCHAR(255) null,
  6.     remotelogname VARCHAR(255) null,
  7.     username VARCHAR(255) null,
  8.     DATETIME DATETIME null,
  9.     request VARCHAR(1024) null,
  10.     statuscode INT null,
  11.     bytessent INT null,
  12.     referer VARCHAR(1024) null,
  13.     useragent VARCHAR(1024) null,
  14.     cookie VARCHAR(1024) null
  15.     )  ON [PRIMARY]
  16. go
  • The remotehostnamem, remotelogname and username are defined as varchar(255) because some people will stuff string to try to crash your web server.
  • The request, referer, useragent and cookie are defined as varchar(1024) because some people will stuff string to try to crash your web server.
  • Please remember that the data import will abort with error if the data imported is larger wider that the table definition.

LogParser exporting data to CSV format

  1. LogParser.exe -i:NCSA  "select logfilename, logrow, remotehostname, remotelogname, username, datetime, request, statuscode, bytessent, referer, user-agent, cookie from \sql\access_log.20070908011201" -o:csv >  \SQL\output01.csv
LogFilename,LogRow,RemoteHostName,RemoteLogName,UserName,DateTime,Request,StatusCode,BytesSent,Referer,User-Agent,Cookie
C:\sql\access_log.20070908011201,1,911networks.com,64.124.85.77,- -,2007-09-02 11:07:21,GET /index.php/Cisco/RedistDistLists HTTP/1.1,200,10717,,Mozilla/5.0 (compatible; BecomeBot/3.0; +http://www.become.com/site_owners.html),
C:\sql\access_log.20070908011201,2,sqlhacks.com,142.166.3.123,- -,2007-09-02 11:10:48,GET /index.php/Retrieve/String-Operations HTTP/1.1,200,15157,,Jakarta Commons-HttpClient/3.1-beta1,
C:\sql\access_log.20070908011201,3,sqlhacks.com,142.166.3.123,- -,2007-09-02 11:10:48,GET /index.php/Optimize/Truncate-Delete HTTP/1.1,200,10248,,Jakarta Commons-HttpClient/3.1-beta1,

...

C:\sql\access_log.20070908011201,17526,911networks.com,209.85.238.3,- -,2007-09-08 07:54:31,GET /taxonomy/term/2/all/feed HTTP/1.1,404,222,,Feedfetcher-Google; (+http://www.google.com/feedfetcher.html),
C:\sql\access_log.20070908011201,17527,sqlhacks.com,65.55.208.21,- -,2007-09-08 07:56:50,GET /index.php/FAQs/FAQs HTTP/1.0,200,10393,,msnbot/1.0 (+http://search.msn.com/msnbot.htm),
C:\sql\access_log.20070908011201,17528,911networks.com,74.6.19.103,- -,2007-09-08 08:00:54,GET /index.php/Cisco/ISDNNotes HTTP/1.0,200,10841,,Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp),
C:\sql\access_log.20070908011201,17529,sqlhacks.com,142.166.3.123,- -,2007-09-08 08:04:09,GET /index.php/Site/AllRecentChanges?action=rss HTTP/1.1,200,4108,,radianrss-1.0,
  • I have truncated the CSV file, it has almost 17,500 rows.

Import CSV format into log table

  1. BULK INSERT logs
  2. FROM 'c:\sql\output01.csv'
  3. WITH
  4. (
  5.     firstrow = 2,
  6.     fieldterminator = ',',
  7.     batchsize = 100,
  8.     rowterminator = '\n'
  9. );
  10. go
  • Line 5: firstrow = 2 means to ignore line 1. Line one has the column names.
  • Line 7: batchsize = 100 means that SQL Server will issue a commit every 100 rows. If you do not do that, SQL Server will do the whole file import as a single transaction. So if there is a problem with any row imported, it will roll back the whole import.

Data

  1. SELECT CAST(RemoteHostName AS VARCHAR(32)) AS 'Destination',
  2.        CAST(RemoteLogName AS VARCHAR(32)) AS 'Source',
  3.        CAST(UserName  AS VARCHAR(32)) AS 'Logged as',
  4.        DATETIME AS 'Accessed',
  5.        CAST(Request AS VARCHAR(96)) AS 'Http request',
  6.        StatusCode,
  7.        BytesSent,
  8.        CAST(Referer AS VARCHAR(64)) AS 'Came from',
  9.        CAST(UserAgent AS VARCHAR(64)) AS 'Agent',
  10.        CAST(Cookie AS VARCHAR(48)) AS 'Cookie'
  11. FROM logs;
  12. go
Destination                      Source                           Logged as                        Accessed                Http request                                                                                     StatusCode  BytesSent   Came from                                                        Agent                                                            Cookie
-------------------------------- -------------------------------- -------------------------------- ----------------------- ------------------------------------------------------------------------------------------------ ----------- ----------- ---------------------------------------------------------------- ---------------------------------------------------------------- ------------------------------------------------
911networks.com                  64.124.85.77                     - -                              2007-09-02 11:07:21.000 GET /index.php/Cisco/RedistDistLists HTTP/1.1                                                    200         10717       NULL                                                             Mozilla/5.0 (compatible; BecomeBot/3.0; +http://www.become.com/s NULL
sqlhacks.com                     142.166.3.123                    - -                              2007-09-02 11:10:48.000 GET /index.php/Retrieve/String-Operations HTTP/1.1                                               200         15157       NULL                                                             Jakarta Commons-HttpClient/3.1-beta1                             NULL
sqlhacks.com                     142.166.3.123                    - -                              2007-09-02 11:10:48.000 GET /index.php/Optimize/Truncate-Delete HTTP/1.1                                                 200         10248       NULL                                                             Jakarta Commons-HttpClient/3.1-beta1                             NULL
911networks.com                  213.140.22.73                    - -                              2007-09-02 11:17:05.000 GET /taxonomy/term/2+3+4+5+6+7+8+9+10+11+15+16/0/feed HTTP/1.1                                   404         246         NULL                                                             Feedreader 3.10 (Powered by Newsbrain)                           NULL
sqlhacks.com                     59.93.7.244                      - -                              2007-09-02 11:21:43.000 GET /pub/skins/fixflow/css/pm-core.css HTTP/1.1                                                  200         2112        http://www.sqlhacks.com/index.php/Summary/Multiple_Group_By      Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Geck NULL
sqlhacks.com                     59.93.7.244                      - -                              2007-09-02 11:21:41.000 GET /index.php/Summary/Multiple_Group_By HTTP/1.1                                                200         36068       http://www.google.com/search?q=column+is+invalid+in+the+select+l Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Geck NULL
sqlhacks.com                     59.93.7.244                      - -                              2007-09-02 11:21:44.000 GET /pub/wsplus/wsplus.css HTTP/1.1                                                              200         3393        http://www.sqlhacks.com/index.php/Summary/Multiple_Group_By      Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Geck NULL
sqlhacks.com                     59.93.7.244                      - -                              2007-09-02 11:21:45.000 GET /pub/skins/fixflow/fontsizer.js HTTP/1.1                                                     200         4317        http://www.sqlhacks.com/index.php/Summary/Multiple_Group_By      Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Geck NULL
sqlhacks.com                     59.93.7.244                      - -                              2007-09-02 11:21:45.000 GET /pub/skins/fixflow/css/layout-fixflow.css HTTP/1.1                                           200         2001        http://www.sqlhacks.com/index.php/Summary/Multiple_Group_By      Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Geck NULL

...

sqlhacks.com                     217.146.125.43                   - -                              2007-09-08 07:27:38.000 GET /pub/wsplus/tip.gif HTTP/1.0                                                                 200         1276        http://www.sqlhacks.com/pub/wsplus/wsplus.css                    Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.8.1.6) Gecko/2007 NULL
sqlhacks.com                     217.146.125.43                   - -                              2007-09-08 07:27:44.000 GET /index.php/Summary/Summary HTTP/1.0                                                          200         11256       http://www.sqlhacks.com/index.php/FAQs/Single_User_vs_Restricted Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.8.1.6) Gecko/2007 NULL
sqlhacks.com                     65.55.208.23                     - -                              2007-09-08 07:29:23.000 GET /index.php/About/Credits?action=browse HTTP/1.0                                              200         8872        NULL                                                             msnbot/1.0 (+http://search.msn.com/msnbot.htm)                   NULL
sqlhacks.com                     65.55.208.23                     - -                              2007-09-08 07:29:37.000 GET /index.php/About/Guaranties?action=browse HTTP/1.0                                           200         11500       NULL                                                             msnbot/1.0 (+http://search.msn.com/msnbot.htm)                   NULL
sqlhacks.com                     65.55.165.16                     - -                              2007-09-08 07:30:28.000 GET /index.php/About/Credits?action=browse HTTP/1.0                                              200         8872        http://search.live.com/results.aspx?q=sqlhacks&mrt=en-us&FORM=LI Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1. NULL
sqlhacks.com                     65.55.165.16                     - -                              2007-09-08 07:30:29.000 GET /pub/skins/fixflow/css/pm-core.css HTTP/1.0                                                  200         2112        http://www.sqlhacks.com/index.php/About/Credits?action=browse    Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1. NULL
sqlhacks.com                     65.55.165.16                     - -                              2007-09-08 07:30:29.000 GET /pub/wsplus/wsplus.css HTTP/1.0                                                              200         3393        http://www.sqlhacks.com/index.php/About/Credits?action=browse    Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1. NULL
sqlhacks.com                     65.55.165.16                     - -                              2007-09-08 07:30:30.000 GET /pub/skins/fixflow/fontsizer.js HTTP/1.0                                                     200         4317        http://www.sqlhacks.com/index.php/About/Credits?action=browse    Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1. NULL
sqlhacks.com                     65.55.165.16                     - -                              2007-09-08 07:30:30.000 GET /pub/skins/fixflow/css/layout-fixflow.css HTTP/1.0                                           200         2001        http://www.sqlhacks.com/index.php/About/Credits?action=browse    Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1. NULL
sqlhacks.com                     65.55.165.16                     - -                              2007-09-08 07:30:30.000 GET /pub/skins/fixflow/css/layout-main.css HTTP/1.0                                              200         6192        http://www.sqlhacks.com/index.php/About/Credits?action=browse    Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1. NULL
sqlhacks.com                     65.55.165.16                     - -                              2007-09-08 07:30:30.000 GET /pub/skins/fixflow/css/layout-ie.css HTTP/1.0                                                200         1052        http://www.sqlhacks.com/index.php/About/Credits?action=browse    Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1. NULL
sqlhacks.com                     65.55.165.16                     - -                              2007-09-08 07:30:30.000 GET /pub/skins/fixflow/css/layout-print.css HTTP/1.0                                             200         981         http://www.sqlhacks.com/index.php/About/Credits?action=browse    Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1. NULL
sqlhacks.com                     65.55.165.16                     - -                              2007-09-08 07:30:30.000 GET /pub/skins/fixflow/css/font-verdana.css HTTP/1.0                                             200         2752        http://www.sqlhacks.com/index.php/About/Credits?action=browse    Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1. NULL

(17499 row(s) affected)
  • I have truncated the output, it has almost 17,500 rows.