How to import web server logs with MSSQL
- Once you have imported the web logs into an SQL Server database, you can then analyse them, archive them...
- Microsoft has created this great gem: logparser. See: Microsoft LogParser
- LogParser reads, writes and converts not only from but also to all kind of logs.
- I use LogParser interactively see: Microsoft LogParser and to create CVS files that can easily be imported in SQL Server.
Applies to:
- Microsoft SQL Server 2005
- Microsoft SQL Server 2008
SQL Server logs data structure
- CREATE TABLE logs
- (
- logfilename VARCHAR(255) null,
- logrow INT null,
- remotehostname VARCHAR(255) null,
- remotelogname VARCHAR(255) null,
- username VARCHAR(255) null,
- DATETIME DATETIME null,
- request VARCHAR(1024) null,
- statuscode INT null,
- bytessent INT null,
- referer VARCHAR(1024) null,
- useragent VARCHAR(1024) null,
- cookie VARCHAR(1024) null
- ) ON [PRIMARY]
- go
- The remotehostnamem, remotelogname and username are defined as varchar(255) because some people will stuff string to try to crash your web server.
- The request, referer, useragent and cookie are defined as varchar(1024) because some people will stuff string to try to crash your web server.
- Please remember that the data import will abort with error if the data imported is larger wider that the table definition.
LogParser exporting data to CSV format
- LogParser.exe -i:NCSA "select logfilename, logrow, remotehostname, remotelogname, username, datetime, request, statuscode, bytessent, referer, user-agent, cookie from \sql\access_log.20070908011201" -o:csv > \SQL\output01.csv
LogFilename,LogRow,RemoteHostName,RemoteLogName,UserName,DateTime,Request,StatusCode,BytesSent,Referer,User-Agent,Cookie C:\sql\access_log.20070908011201,1,911networks.com,64.124.85.77,- -,2007-09-02 11:07:21,GET /index.php/Cisco/RedistDistLists HTTP/1.1,200,10717,,Mozilla/5.0 (compatible; BecomeBot/3.0; +http://www.become.com/site_owners.html), C:\sql\access_log.20070908011201,2,sqlhacks.com,142.166.3.123,- -,2007-09-02 11:10:48,GET /index.php/Retrieve/String-Operations HTTP/1.1,200,15157,,Jakarta Commons-HttpClient/3.1-beta1, C:\sql\access_log.20070908011201,3,sqlhacks.com,142.166.3.123,- -,2007-09-02 11:10:48,GET /index.php/Optimize/Truncate-Delete HTTP/1.1,200,10248,,Jakarta Commons-HttpClient/3.1-beta1, ... C:\sql\access_log.20070908011201,17526,911networks.com,209.85.238.3,- -,2007-09-08 07:54:31,GET /taxonomy/term/2/all/feed HTTP/1.1,404,222,,Feedfetcher-Google; (+http://www.google.com/feedfetcher.html), C:\sql\access_log.20070908011201,17527,sqlhacks.com,65.55.208.21,- -,2007-09-08 07:56:50,GET /index.php/FAQs/FAQs HTTP/1.0,200,10393,,msnbot/1.0 (+http://search.msn.com/msnbot.htm), C:\sql\access_log.20070908011201,17528,911networks.com,74.6.19.103,- -,2007-09-08 08:00:54,GET /index.php/Cisco/ISDNNotes HTTP/1.0,200,10841,,Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp), C:\sql\access_log.20070908011201,17529,sqlhacks.com,142.166.3.123,- -,2007-09-08 08:04:09,GET /index.php/Site/AllRecentChanges?action=rss HTTP/1.1,200,4108,,radianrss-1.0,
- I have truncated the CSV file, it has almost 17,500 rows.
Import CSV format into log table
- BULK INSERT logs
- FROM 'c:\sql\output01.csv'
- WITH
- (
- firstrow = 2,
- fieldterminator = ',',
- batchsize = 100,
- rowterminator = '\n'
- );
- go
- Line 5: firstrow = 2 means to ignore line 1. Line one has the column names.
- Line 7: batchsize = 100 means that SQL Server will issue a commit every 100 rows. If you do not do that, SQL Server will do the whole file import as a single transaction. So if there is a problem with any row imported, it will roll back the whole import.
Data
- SELECT CAST(RemoteHostName AS VARCHAR(32)) AS 'Destination',
- CAST(RemoteLogName AS VARCHAR(32)) AS 'Source',
- CAST(UserName AS VARCHAR(32)) AS 'Logged as',
- DATETIME AS 'Accessed',
- CAST(Request AS VARCHAR(96)) AS 'Http request',
- StatusCode,
- BytesSent,
- CAST(Referer AS VARCHAR(64)) AS 'Came from',
- CAST(UserAgent AS VARCHAR(64)) AS 'Agent',
- CAST(Cookie AS VARCHAR(48)) AS 'Cookie'
- FROM logs;
- go
Destination Source Logged as Accessed Http request StatusCode BytesSent Came from Agent Cookie -------------------------------- -------------------------------- -------------------------------- ----------------------- ------------------------------------------------------------------------------------------------ ----------- ----------- ---------------------------------------------------------------- ---------------------------------------------------------------- ------------------------------------------------ 911networks.com 64.124.85.77 - - 2007-09-02 11:07:21.000 GET /index.php/Cisco/RedistDistLists HTTP/1.1 200 10717 NULL Mozilla/5.0 (compatible; BecomeBot/3.0; +http://www.become.com/s NULL sqlhacks.com 142.166.3.123 - - 2007-09-02 11:10:48.000 GET /index.php/Retrieve/String-Operations HTTP/1.1 200 15157 NULL Jakarta Commons-HttpClient/3.1-beta1 NULL sqlhacks.com 142.166.3.123 - - 2007-09-02 11:10:48.000 GET /index.php/Optimize/Truncate-Delete HTTP/1.1 200 10248 NULL Jakarta Commons-HttpClient/3.1-beta1 NULL 911networks.com 213.140.22.73 - - 2007-09-02 11:17:05.000 GET /taxonomy/term/2+3+4+5+6+7+8+9+10+11+15+16/0/feed HTTP/1.1 404 246 NULL Feedreader 3.10 (Powered by Newsbrain) NULL sqlhacks.com 59.93.7.244 - - 2007-09-02 11:21:43.000 GET /pub/skins/fixflow/css/pm-core.css HTTP/1.1 200 2112 http://www.sqlhacks.com/index.php/Summary/Multiple_Group_By Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Geck NULL sqlhacks.com 59.93.7.244 - - 2007-09-02 11:21:41.000 GET /index.php/Summary/Multiple_Group_By HTTP/1.1 200 36068 http://www.google.com/search?q=column+is+invalid+in+the+select+l Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Geck NULL sqlhacks.com 59.93.7.244 - - 2007-09-02 11:21:44.000 GET /pub/wsplus/wsplus.css HTTP/1.1 200 3393 http://www.sqlhacks.com/index.php/Summary/Multiple_Group_By Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Geck NULL sqlhacks.com 59.93.7.244 - - 2007-09-02 11:21:45.000 GET /pub/skins/fixflow/fontsizer.js HTTP/1.1 200 4317 http://www.sqlhacks.com/index.php/Summary/Multiple_Group_By Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Geck NULL sqlhacks.com 59.93.7.244 - - 2007-09-02 11:21:45.000 GET /pub/skins/fixflow/css/layout-fixflow.css HTTP/1.1 200 2001 http://www.sqlhacks.com/index.php/Summary/Multiple_Group_By Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Geck NULL ... sqlhacks.com 217.146.125.43 - - 2007-09-08 07:27:38.000 GET /pub/wsplus/tip.gif HTTP/1.0 200 1276 http://www.sqlhacks.com/pub/wsplus/wsplus.css Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.8.1.6) Gecko/2007 NULL sqlhacks.com 217.146.125.43 - - 2007-09-08 07:27:44.000 GET /index.php/Summary/Summary HTTP/1.0 200 11256 http://www.sqlhacks.com/index.php/FAQs/Single_User_vs_Restricted Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.8.1.6) Gecko/2007 NULL sqlhacks.com 65.55.208.23 - - 2007-09-08 07:29:23.000 GET /index.php/About/Credits?action=browse HTTP/1.0 200 8872 NULL msnbot/1.0 (+http://search.msn.com/msnbot.htm) NULL sqlhacks.com 65.55.208.23 - - 2007-09-08 07:29:37.000 GET /index.php/About/Guaranties?action=browse HTTP/1.0 200 11500 NULL msnbot/1.0 (+http://search.msn.com/msnbot.htm) NULL sqlhacks.com 65.55.165.16 - - 2007-09-08 07:30:28.000 GET /index.php/About/Credits?action=browse HTTP/1.0 200 8872 http://search.live.com/results.aspx?q=sqlhacks&mrt=en-us&FORM=LI Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1. NULL sqlhacks.com 65.55.165.16 - - 2007-09-08 07:30:29.000 GET /pub/skins/fixflow/css/pm-core.css HTTP/1.0 200 2112 http://www.sqlhacks.com/index.php/About/Credits?action=browse Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1. NULL sqlhacks.com 65.55.165.16 - - 2007-09-08 07:30:29.000 GET /pub/wsplus/wsplus.css HTTP/1.0 200 3393 http://www.sqlhacks.com/index.php/About/Credits?action=browse Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1. NULL sqlhacks.com 65.55.165.16 - - 2007-09-08 07:30:30.000 GET /pub/skins/fixflow/fontsizer.js HTTP/1.0 200 4317 http://www.sqlhacks.com/index.php/About/Credits?action=browse Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1. NULL sqlhacks.com 65.55.165.16 - - 2007-09-08 07:30:30.000 GET /pub/skins/fixflow/css/layout-fixflow.css HTTP/1.0 200 2001 http://www.sqlhacks.com/index.php/About/Credits?action=browse Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1. NULL sqlhacks.com 65.55.165.16 - - 2007-09-08 07:30:30.000 GET /pub/skins/fixflow/css/layout-main.css HTTP/1.0 200 6192 http://www.sqlhacks.com/index.php/About/Credits?action=browse Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1. NULL sqlhacks.com 65.55.165.16 - - 2007-09-08 07:30:30.000 GET /pub/skins/fixflow/css/layout-ie.css HTTP/1.0 200 1052 http://www.sqlhacks.com/index.php/About/Credits?action=browse Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1. NULL sqlhacks.com 65.55.165.16 - - 2007-09-08 07:30:30.000 GET /pub/skins/fixflow/css/layout-print.css HTTP/1.0 200 981 http://www.sqlhacks.com/index.php/About/Credits?action=browse Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1. NULL sqlhacks.com 65.55.165.16 - - 2007-09-08 07:30:30.000 GET /pub/skins/fixflow/css/font-verdana.css HTTP/1.0 200 2752 http://www.sqlhacks.com/index.php/About/Credits?action=browse Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1. NULL (17499 row(s) affected)
- I have truncated the output, it has almost 17,500 rows.

