reduce databse size: Reg: Duplicates Removing Problem

I have Data like below

TableA

Col1 Col2 Col3

1 'Test One' 4/7/2007

1 'Test One-1' 4/7/2007

2 'Test' 4/7/2007

3 'Test Three' 4/7/2007

3 'Test Three-1' 4/7/2007

3 'Test Three-2' 4/7/2007

I need to delete the duplicate records[Latest record has to be remaining like that] from above table. i need the out put like below

Col1 Col2 Col3

1 'Test One-1' 4/7/2007

2 'Test' 4/7/2007

3 'Test Three-2' 4/7/2007

If any body know the solution please help me.

Regards

Hanu

You'll want something along the lines of:

SELECT Col1, MAX(Col2), MAX(Col3)

FROM Table

GROUP BY Col1

However, you will probably come into problems with the Col2 values as on character MAX finds the highest value in the collating sequence which may or may not be accurate.

To do latest record, you'd be better off using something like a datetime field (with time) and then something like this.

SELECT t.Col1, t.Col2, t.Col3

FROM Table t

INNER JOIN (SELECT Col1, MAX(Col3) AS Col3

FROM Table
GROUP BY Col1) AS Mx

ON t.Col1 = Mx.Col1 AND t.Col3 = Mx.Col3

Hope this helps!

|||

Unless you expand Col3 to a full datetime datatype column, including the time, there is no way, as you have presented the data, to determine which row of data is the 'Latest record'.

Or you could add an IDENTITY column to the table. Or some what to determine sequence of rows.

Currently, you have none.

|||I think it's worth mentioning that the results of the query richbrownesq suggested are not guaranteed to be in the original table, as MAX(Col2) and MAX(Col3) could come from different rows of the table.

If you have a column, say dateInserted, from which you can determine which row is "latest," you can do this in SQL Server 2000 [not tested in a repro]

SQL Server 2000

delete from t
where exists (
select * from t as t_copy
where t_copy.Col1 = t.Col2
and t_copy.dateInserted > t.dateInserted
)

In other words, delete all rows that are superseded by a later row for the same Col1 value.

In SQL Server 2005, it may be more efficient to do this:

SQL Server 2005

with t_ranked as (

select
*,
rank() over (
partition by Col1
order by dateInserted desc
) as rk
from T
)
delete from T_ranked
where rk > 1

Both solutions will leave multiple rows for a single Col1 if they are tied for latest dateInserted value.

Steve Kass
Drew University
http://www.stevekass.com
|||

Arnie Rowland wrote:

Unless you expand Col3 to a full datetime datatype column, including the time, there is no way, as you have presented the data, to determine which row of data is the 'Latest record'.

Or you could add an IDENTITY column to the table. Or some what to determine sequence of rows.

Currently, you have none.

On the contrary Column2 shows the chronology.

I'll post the query later. I'm in the middle of something right now.

Adamus

|||

Technically there are not duplicates with the exception of col1 + col3. Col2 makes the record unique

To get the desired resultset, grab the right(max) from Col2 (if one exists) and group Col1

Adamus

|||

While that 'may' be true, when handling the data, whether using MAX() or [ ORDER BY ], there will be significant ambiguity when the count gets up to 'Test Three-10'.

How would you 'easily' determine the sequence of the following (without having to parse)?

'Test Three-1'

'Test Three-10'

'Test Three-11'

'Test Three-2'

'Test Three-3'

'Test Three-4'

'Test Three-5'

'Test Three-6'

'Test Three-7'

'Test Three-8'

'Test Three-9'

|||

Arnie Rowland wrote:

While that 'may' be true, when handling the data, whether using MAX() or [ ORDER BY ], there will be significant ambiguity when the count gets up to 'Test Three-10'.

How would you 'easily' determine the sequence of the following (without having to parse)?

'Test Three-1'

'Test Three-10'

'Test Three-11'

'Test Three-2'

'Test Three-3'

'Test Three-4'

'Test Three-5'

'Test Three-6'

'Test Three-7'

'Test Three-8'

'Test Three-9'

You cannot avoid parsing when dealing with strings nor should you resort to the 'easy' and quick.

Adamus

|||

Arnie Rowland wrote:
...there will be significant ambiguity...

As an FYI there is no ambiguity in uniqueness regardless of datatype. Just because a string scares you, doesn't mean you should coin it ambiguous.

Adamus

|||

Adam

It would better demonstate a desire to helpfully contribute to the forums were you to offer constructive suggestions to solve the OP's problem.

As I indicated earlier, Col2 provides no way to determine the chronology of data added to the database. The nature of the data may lead one to 'assume' a sequence, but, as indicated earlier, as the data is presented, it is not possible to determine if 'Test Three-2' is entered BEFORE or AFTER 'Test Three-2' -therefore impossible to determine which is the 'latest record'.

You are making assumptions on the OP's data that are not justified.

|||

Arnie Rowland wrote:

Adam

It would better demonstate a desire to helpfully contribute to the forums were you to offer constructive suggestions to solve the OP's problem.

As I indicated earlier, Col2 provides no way to determine the chronology of data added to the database. The nature of the data may lead one to 'assume' a sequence, but, as indicated earlier, as the data is presented, it is not possible to determine if 'Test Three-2' is entered BEFORE or AFTER 'Test Three-2' -therefore impossible to determine which is the 'latest record'.

You are making assumptions on the OP's data that are not justified.

Arnie,

While I continue to feel that you resorting the path of least resistance, there still remains a logical progression and association with the OP's table and desired resultset. I would think it is insulting not helpful to the OP to suggest his question has no solution without modifying the table.

Yes, adding an identity field is the obvious choice, but again this might not be an option.

Working with the data "provided," there is a solution. Where is the assumption?

Adamus

|||

Arnie Rowland wrote:

It would better demonstate a desire to helpfully contribute to the forums were you to offer constructive suggestions to solve the OP's problem.

Arnie said = "No Solution"

Adamus said = "Here's the solution"

Please revisit the thread to determine who is being constructive and who is being argumentative.

Thanks,

Adamus

|||

Arnie Rowland wrote:

You are making assumptions on the OP's data that are not justified.

Arnie, you are not making assumptions when working with the data provided.

You, on the other hand, are assuming that the table can be altered.

With the provided information, if you do not see a pattern, I'm not sure how you can help.

Adamus

Wednesday, March 28, 2012

Reg: Duplicates Removing Problem

No comments:

Post a Comment

reduce databse size

Blog Archive

About Me