One of our customers recently implemented ELK for log analysis, and they wanted to pull data from an activity table called
t_activity on all production databases. You might ask what application activity is doing inside a relational database :) We agree, and they are working on moving it out. These tables are huge and grow at a frenetic pace, so we were looking for a way to spool the data out.
The JDBC input plugin lets Logstash connect to relational databases and execute SQL queries at regular intervals to pull data, but I didn't like the idea of a log analysis system talking to production databases, even with a
NOLOCK hint. Besides, cron jobs have to run on the Logstash server to poll the table, and if we decide to disable polling for a particular server, the configuration change requires a Logstash restart.
We finally decided to run a SQL Agent job on the production servers to do a differential load based on activity id at regular intervals. We needed to create tab-delimited output files from these huge tables without locking them or causing performance issues.
bcp is the perfect tool for this job. It can create tab-delimited files (or files with custom delimiters) and it loads data pretty fast. Below are the steps for configuring this job on an imaginary database/table.
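To get a feel for the output format first, here is a hedged one-off bcp invocation (server, database, and output path are made up for illustration); with -c, bcp defaults to tab as the field terminator and newline as the row terminator:

```
bcp "SELECT TOP 10 * FROM Aurora.dbo.t_activity" queryout C:\temp\sample.log -c -T -S localhost
```

The -T flag uses a trusted (Windows) connection; use -t to override the field delimiter if tabs don't suit you.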
Running a full table load each time is not a good idea, so we needed a state table to track progress for differential loads.
-- Create a table for storing LoadStatus
CREATE TABLE [dbo].[dba_LoadStatus](
    [id] [int] NOT NULL
) ON [PRIMARY]
GO
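The state table also needs a starting point. A minimal way to seed it (an assumption on my part, not in the original setup): use 0 to export the whole table on the first run, or the current MAX(id) to ship only rows added from here on.

```
-- Seed the state table so the first differential load has a baseline.
-- Swap ISNULL(MAX(id), 0) for a literal 0 if you want the full backlog exported.
INSERT INTO [dbo].[dba_LoadStatus] (id)
SELECT ISNULL(MAX(id), 0) FROM t_activity
GO
```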
Create a stored procedure that looks up the load status during each job run and performs a differential load. At the end of the load, it inserts
MAX(id) from the activity table into the state table for the next run.
USE [Aurora]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[dba_activityload_elk]
AS
BEGIN
    -- Keep "(n rows affected)" messages out of the bcp output file
    SET NOCOUNT ON

    DECLARE @newloadid INT
    DECLARE @oldloadid INT

    SELECT @newloadid = MAX(id) FROM t_Activity
    -- On the very first run the state table may be empty, so default to 0
    SELECT @oldloadid = ISNULL(MAX(id), 0) FROM dba_LoadStatus

    SELECT dbo.f_UtcToPacificDate(timestamp) AS TimeStamp,
           isdeleted,
           id,
           type,
           sessionid,
           '[' + username + ']' AS username,
           '[' + campaignname + ']' AS CampaignName,
           '[' + message + ']' AS Message,
           '[' + datum + ']' AS Datum
    FROM t_activity
    WHERE id > @oldloadid

    INSERT INTO dba_LoadStatus VALUES (@newloadid)
END
GO
The next step is a bit tricky and some of you might not like it. Enable
xp_cmdshell - well, I know it's evil and a cute kitten dies each time it runs, but it gets the job done here.
EXEC sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO
EXEC sp_configure 'xp_cmdshell', 1;
GO
RECONFIGURE;
GO
Now we need a job that runs every 5 minutes, spools data only if new rows were added since the last run, and creates a new tab-delimited file on each run. Here we go:
DECLARE @OutputFile NVARCHAR(100),
        @FilePath   NVARCHAR(100),
        @bcpCommand NVARCHAR(1000),
        @newloadid  INT,
        @oldloadid  INT

SELECT @newloadid = MAX(id) FROM [Aurora].[dbo].[t_activity]
SELECT @oldloadid = MAX(id) FROM [Aurora].[dbo].[dba_LoadStatus]

IF @newloadid = @oldloadid
BEGIN
    RAISERROR('No new activity records were added since the last run. Skipping this load.', 1, 1)
    GOTO EndMe
END

SET @bcpCommand = 'bcp "exec [Aurora].dbo.dba_activityload_elk" queryout '
SET @FilePath = 'D:\Logs\'
SET @OutputFile = 'dev-engineactivity' + '-'
    + CONVERT(VARCHAR(500), GETDATE(), 112) + '-'
    + REPLACE(CONVERT(VARCHAR(8), GETDATE(), 108), ':', '') + '.log'
SET @bcpCommand = @bcpCommand + @FilePath + @OutputFile + ' -c -T -S' + @@SERVERNAME

EXEC master..xp_cmdshell @bcpCommand

EndMe:
This script can be scheduled as a job.
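For reference, here is a sketch of how a 5-minute SQL Agent job could be created with the msdb job procedures. The job, step, and schedule names are made up, and the step command is a placeholder for the spool script above; adjust to your environment.

```
USE msdb
GO
-- Sketch: create the job, one T-SQL step, and a 5-minute schedule.
EXEC dbo.sp_add_job @job_name = N'ELK activity export'
EXEC dbo.sp_add_jobstep
     @job_name      = N'ELK activity export',
     @step_name     = N'bcp differential load',
     @subsystem     = N'TSQL',
     @database_name = N'Aurora',
     @command       = N'/* paste the spool script from above here */'
EXEC dbo.sp_add_jobschedule
     @job_name = N'ELK activity export',
     @name     = N'Every 5 minutes',
     @freq_type            = 4,   -- daily
     @freq_interval        = 1,
     @freq_subday_type     = 4,   -- minutes
     @freq_subday_interval = 5
EXEC dbo.sp_add_jobserver @job_name = N'ELK activity export'
GO
```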
Now that we have the files, how do we hook them up with ELK? You can use Filebeat as a shipper for this log data.
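A minimal Filebeat configuration for shipping these files might look like the following. The paths and output host are assumptions, and the exact keys depend on your Filebeat version (this uses the `filebeat.inputs` style from 6.x and later):

```
# filebeat.yml - minimal sketch; paths and host are assumptions
filebeat.inputs:
  - type: log
    paths:
      - D:\Logs\dev-engineactivity-*.log

output.logstash:
  hosts: ["logstash.example.com:5044"]
```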
Hope this helps someone!