Cleaning up legacy data in SQL Server

 

If you want to import old data into a new database in order to clean it up, you can use stored procedures and cursors. For example, you might need to do this if you've got a database with phone numbers entered in a variety of formats and want to convert them all to a single, consistent format. To illustrate how to transfer and clean up data, we're going to use the pubs sample database (included with SQL Server). As an example, we're going to show you how you could strip out the dashes in the au_id column of the authors table. As part of this process, we'll need to copy the au_id, au_lname, and au_fname columns to another table; in our example, we used a table name of authorIDs. Here are the steps:

 

1. Create the authorIDs table:

CREATE TABLE dbo.authorIDs (

au_id id NOT NULL ,

au_lname varchar (40) NOT NULL ,

au_fname varchar (20) NOT NULL ,

)

GO

2. Next, copy the data from the authors table to authorIDs:

insert authorids(au_id, au_fname, au_lname)

select au_id, au_fname, au_lname from authors

 

3. Create the following stored procedure to strip out the

dashes in the au_id column (but don't run it yet):

CREATE PROCEDURE scrub_au_ID

@au_id char(11),

@au_id_new varchar(11) output

AS

set nocount on

declare @counter int

select @au_id_new=""

select @counter =1

while @counter <11

begin

if ascii(substring(@au_id, @counter,1)) >47 and ascii(substring(@au_id,

@counter,1))<58

select @au_id_new = @au_id_new + substring(@au_id,@counter,1)

select @counter = @counter +1

end

GO

 

4. Now create the following stored procedure (again, don't run

it yet). This procedure creates a cursor of the author IDs,

calls the scrub procedure, and then updates the authorIDs table

based on the cursor position:

CREATE PROCEDURE auth_ids

AS

DECLARE @au_id varchar(11)

DECLARE @au_id_new varchar(11)

DECLARE authids_cursor CURSOR FOR SELECT au_id FROM authorids

OPEN authids_cursor

FETCH NEXT FROM authids_cursor INTO @au_id

WHILE (@@FETCH_STATUS <> -1)

BEGIN

IF (@@FETCH_STATUS <> -2)

BEGIN

execute scrub_au_id @au_id, @au_id_new output

/*update authorids set au_id=@au_id_new where au_id=@au_id*/

update authorids set au_id=@au_id_new WHERE CURRENT OF

authids_cursor

END

FETCH NEXT FROM authids_cursor INTO @au_id

END

DEALLOCATE authids_cursor

GO

 

5. You're now ready to clean up your data. Run the following commands:

execute auth_ids

select * from authorids

  

SQL Index

Main Index

Search RD Techbase