Friday 19 September 2014

Duplicate records

Create a table EmpDtl1 with some duplicate rows as shown below to understand different methods of delete duplicate rows.
    create table EmpDup(empid int,name varchar(20))
    
    insert into EmpDup values(1,'Andy')
    insert into EmpDup values (1,'Andy')
    insert into EmpDup values(2,'Bill')
    insert into EmpDup values(2,'Bill')
    insert into EmpDup values (2,'Bill')
    insert into EmpDup values (3,'Chris')
    
User Name "Andy" repeated 2 times and User Name "Bill" repeated 3 times.
Following are the different methods for deleting duplicate rows.
Method 1:
Insert the distinct rows from the duplicate rows table to new temporary table. Delete data from table which has duplicate rows then insert the distinct rows from the temporary table as shown below.
    select distinct * into #tmp From EmpDup
    delete from EmpDup
    insert into EmpDup                
    select * from #tmp drop table #tmp
    
Method 2:
If you want to consider only few columns in a table for duplication criteria to delete rows then Method 1 will not work(in our example, if EMDup table has more than 2 columns and delete rows if empid and name repeats more than one time).
In this case, Add an identity column as a serial number that acts as a row unique identifier(auto incremental ascending order).Then get the Rank against each empid,name. If Rank is greater than 1 means it is a duplicate row and delete the same. After deleting the duplicated rows, remove the identity column which is used for rank. See the below example.
    alter table EmpDup add  sno int identity(1,1)

    delete E
    from  EmpDup E
    inner join
    (select *,
    RANK() OVER ( PARTITION BY empid,name ORDER BY sno DESC )rank
    From EmpDup )T on E.sno=t.sno
    where T.Rank>1

    alter table EmpDup 
    drop  column sno
    
If you feel this query is little difficult to understand then use following same query but in different way!
    alter table EmpDup add  sno int identity(1,1)
    delete from EmpDup where sno in
    (
    select sno from (
    select *,
    RANK() OVER ( PARTITION BY empid,name ORDER BY sno DESC )rank
    From EmpDup
    )T
    where rank>1
    )

    alter table EmpDup 
    drop  column sno
    
If your sql server version is below SQL2005 then above queries (above 2 queries in Method2) won't work since Rank() command is not available in SQL 2000 and below versions.
If your sql server version is below SQL2005 then use below query. Same logic to get the rank but without using the Rank() command!
    alter table EmpDup add  sno int identity(1,1)
    delete from EmpDup where sno in
    (
    select sno  from EmpDup D where 
    1<(select count(*) from EmpDup A where A.empid=D.empid and A.name=D.name and D.sno>=A.sno)
    )

    alter table EmpDup
    drop  column sno
    
Method 3:
Using "Delete Top( )" clause:
If you want to delete duplicate rows for a particular empid then use "Top()" command in delete query as shown below.
    delete top(2) From EmpDup where empid=2
OR
    delete top(select count(*)-1 From EmpDup x where x.empid=2) From EmpDup where empid=2
    
Method 4:
If you want to delete all the rows if the selected columns repeated more than 1 time then use below query.
Query to delete 3 duplicated rows (in our example table) or repeated more than 1 time.
    delete from EmpDup where EmpID in(select EmpID from EmpDup group by EmpId having
    count(*) >1)

Thursday 11 September 2014

Some Queries About CTE

For selecting dept wise total number of employees

SELECT DEPTNO,COUNT(*) as no_of_employees FROM EMP GROUP BY DEPTNO

For selecting deptname and total number of employees in a dept

;WITH CTE AS (SELECT DEPTNO,COUNT(1)EMPCOUNT FROM EMP GROUP BY DEPTNO)
SELECT D.DNAME,D.DEPTNO,C.EMPCOUNT FROM DEPT D INNER JOIN CTE C ON C.DEPTNO=D.DEPTNO

Selecting hierarchy

;WITH A AS (SELECT * FROM EMP WHERE mgr is null
UNION ALL
SELECT E.* FROM emp e INNER JOIN A ON E.mgr=A.empno
)select * from A

For selecting 2 highest salary

SELECT TOP 1 SAL FROM (SELECT TOP 2 sal from emp ORDER BY SAL asc ) E ORDER BY SAL desc

Selecting top2 salaries

select top 2 sal from emp order by sal asc

Selecting top 1 sal from each dept

;WITH D AS(SELECT *,ROW_NUMBER() OVER (PARTITION BY DEPTNO ORDER BY SAL DESC ) RNO FROM EMP)
SELECT * FROM D WHERE RNO=1

Selecting 5,6 highest sal 

;with x as (select rank()over (order by sal desc) salary ,* from emp)
select * from x where salary in (5,6)