Removing duplicates from a query

message from Support on 10 May 2004
I have a query that looks at linked tables and outputs customers who have
bought certain brands. The query works but I am getting duplicate results
where the customer has bought more than one product of the same brand.
What would be the best method of removing the duplicates from the result so
that it just shows one instance of the customer? For this query, I am not
interested in the actual product bought, just the customer ID. I have a
field withn the reults table showing the customer_id.
I've run the simple find duplicates query where it is searching for a
duplicate customer_id but this isn't really what I want as it shows which
ones are dupes. What I'd like is for the output to show me my original
query showing just one instance of the customer_id. If further info is
needed, please feel free to ask!
thanks
 
=?Utf-8?B?Q29ycmluYQ==?= replied to Support on 10 May 2004
1. In the Query Properties, for Unique Values, select Yes.
2. Click the Totals button to show the Total row. The Total row will say "Group By" for each field of the query.

If you want the customer to show up once and only once, even if they've bought more than one of the brands you are selecting for, click the Totals button to show the Total row. In the field for the brand (and any fields referring to the table with the brand) in the Total row, select "First".
 
Support replied to =?Utf-8?B?Q29ycmluYQ==?= on 11 May 2004
"If you want the customer to show up once and only once, even if they've
bought more than one of the brands you are selecting for, click the Totals
button to show the Total row. In the field for the brand (and any fields
referring to the table with the brand) in the Total row, select "First"."
However I ran into some problems. I ran the query with this in and
retrieved 105 results. I then ran the same query without the totals in and
retrieved 215 results.
I decided to check this and using the duplicates wizard, I counted the
number of duplicates which only came to 38 so somewhere it is not seeing 60
odd records. Do you know why this might be? Here are the 2 queries:

With totals on (ATB):

SELECT tblFinal.c_Title, tblFinal.c_FirstName, tblFinal.c_LastName,
tblFinal.c_Email, First(tblAllorderlines.l_OrderLineProductDescription) AS
FirstOfl_OrderLineProductDescription, tblFinal.c_SourceCustomerId
FROM (tblFinal INNER JOIN tblOrders ON tblFinal.c_SourceCustomerId =
tblOrders.o_SourceCustomerID) INNER JOIN tblAllorderlines ON
tblOrders.o_OrderNumber = tblAllorderlines.l_OrderNumber
GROUP BY tblFinal.c_Title, tblFinal.c_FirstName, tblFinal.c_LastName,
tblFinal.c_Email, tblFinal.c_SourceCustomerId
HAVING (((First(tblAllorderlines.l_OrderLineProductDescription)) Like
"*scrub*" Or (First(tblAllorderlines.l_OrderLineProductDescription)) Like
"*grit*"));

Without totals(ATB No totals):

SELECT tblFinal.c_Title, tblFinal.c_FirstName, tblFinal.c_LastName,
tblFinal.c_Email, tblAllorderlines.l_OrderLineProductDescription,
tblFinal.c_SourceCustomerId
FROM (tblFinal INNER JOIN tblOrders ON
tblFinal.c_SourceCustomerId=tblOrders.o_SourceCustomerID) INNER JOIN
tblAllorderlines ON tblOrders.o_OrderNumber=tblAllorderlines.l_OrderNumber
WHERE (((tblAllorderlines.l_OrderLineProductDescription) Like "*scrub*" Or
(tblAllorderlines.l_OrderLineProductDescription) Like "*grit*"));

And the duplicates query (which is ran on the no totals query directly
above):

SELECT [ATB No totals].c_SourceCustomerId, [ATB No totals].c_FirstName, [ATB
No totals].c_LastName
FROM [ATB No totals]
WHERE ((([ATB No totals].c_SourceCustomerId) In (SELECT [c_SourceCustomerId]
FROM [ATB No totals] As Tmp GROUP BY [c_SourceCustomerId] HAVING
Count(*)>1 )))
ORDER BY [ATB No totals].c_SourceCustomerId;

thanks
 
Support replied to Support on 17 May 2004
Thanks again Corrina,
that's really helped me understand what is actually going on within the
queries.
 
=?Utf-8?B?Q29ycmluYQ==?= replied to Support on 11 May 2004
If I understand you correctly, your results are actually pretty logical. The query tool generates an SQL statement (the actual query language) from what it sees in the query tool. When you take out the Total row, you eliminate the "Group By" clause and the "First" aggregate function on the l_OrderLineProductDescription field from the resulting SQL statement. (FYI - in SQL, you aren't allowed to have an aggregate function like "First" without the "Group By" clause on all non-aggregated fields.)

This means two things are actually happening, only one of which your duplicates query is going to show. Removing the "First" function allows a customer that bought both "*scrub*" and "*grit*" to show up twice, once for each brand. If you remove the "First" function in the Totals row (just change it to "Group By") and run the duplicates query, you shouldn't get any duplicates. Removing the "Group By" clause (by removing/hiding the Totals row) allows a customer that bought a brand multiple times to show up multiple times in the query, which the duplicates query will show.

So you're 60-odd missing records are likely customers that bought both "*scrub*" and "*grit*".

Corrina Burnley
Burnley Data Solutions
www.burnleydatasolutions.com
 

Archived message: Removing duplicates from a query (Microsoft Access)