Thursday, March 22, 2012

300 choose 10 - doing math with SQL Server, need some help...

Hey gang,
I have a question that may be more Math related than SQL Server related.
But maybe it's a simple thing & I'm missing it 'cause I'm trying too hard...
Here's the scenario: I have 300 products in a table. I also have last
year's sales figures, and this year's to-date sales figures. I want to
figure out which group of 10 products have the highest year-to-date sales AN
D
last year's sales combined to less than 10,000.
I have written a program that basically uses Where statements to iterate
through the 300 items, 10 times, calculating the best sales for this year
where last years sales are less than 10,000.
The problem is that there are 300 * 299 * 298 * 297 * 296 * 295 * 294 * 293
* 292 * 291 different combinations of products. That's a big freakin' numbe
r!
I've put in some minor checks to help speed things up slightly - things like
breaking where last years total is more than 10,000, etc. But she's still
gonna take about 42,000 years to get my result.
So what am I missing? What's the best way to accomplish what I'm trying to
do?
Any help would be appreciated...
Thanks,
GregGreg,
Could you post some minimal DDL for us to see? It could be a simple solution
...

> I've put in some minor checks to help speed things up slightly - things li
ke
> breaking where last years total is more than 10,000, etc. But she's still
> gonna take about 42,000 years to get my result.
As long as your billing for the processing time, I'm not sure I see the
problem with that ;-).
-- Alex|||OK - here's what I've got... I've taken out half the code to reduce the pos
t
- this assumes I'm only looking for the best THREE products. In fact, I'm
looking for ten...
I have a three user-defined functions that do basically the same thing -
look up a value in a table & return it.
CREATE FUNCTION LookupLastYear(@.ProdID int)
RETURNS money
AS
BEGIN
DECLARE @.out money
SET @.out = (SELECT LastYear FROM tblCurrentStats WHERE Prod=@.ProdID)
RETURN @.out
END
The code as it stands: (& if I screwed up with the flow by removing a huge
chunk of code, forgive me...) (As an aside, I'm sure this could have been
done using recursion, but I'm not sure how... Any tips would be
appreciated...)
Thanks in advance,
Greg
The Code:
CREATE PROCEDURE USP_Figure_Out_Best_Team AS
DECLARE @.Prod1 int
DECLARE @.Prod2 int
DECLARE @.Prod3 int
DECLARE @.LastYear1 money
DECLARE @.LastYear2 money
DECLARE @.LastYear3 money
DECLARE @.ThisYear1 money
DECLARE @.ThisYear2 money
DECLARE @.ThisYear3 money
DECLARE @.BestProd1 int
DECLARE @.BestProd2 int
DECLARE @.BestProd3 int
DECLARE @.BestTotal money
DECLARE @.Best2004 money
DECLARE @.TempTotal money
DECLARE @.TempLastYear money
SET @.BestTotal = 0
SET @.Best2004 = 0
SET @.TempTotal = 0
SET @.TempLastYear = 0
DECLARE @.Counter1 int
DECLARE @.Counter2 int
DECLARE @.Counter3 int
set @.Counter1 = 1
while @.Counter1 <=297
begin
SET @.ThisYear1 = dbo.lookupthisyear(@.counter1)
SET @.LastYear1 = dbo.lookuplastyear(@.counter1)
SET @.TempTotal = @.ThisYear1
SET @.TempLastYear = @.LastYear1
PRINT '1 ' + CONVERT(varchar, @.Counter1)
IF @.TempLastYear <=10000
BEGIN
SET @.Counter2 = @.Counter1 + 1
WHILE @.Counter2 <=297
BEGIN
SET @.ThisYear2 = dbo.lookupthisyear(@.counter2)
SET @.LastYear2 = dbo.lookuplastyear(@.counter2)
SET @.TempTotal = @.TempTotal + @.ThisYear2
SET @.TempLastYear = @.TempLastYear + @.LastYear2
PRINT '2: ' + CONVERT(varchar, @.Counter2)
IF @.TempLastYear <=10000
BEGIN
SET @.Counter3 = @.Counter2 + 1
WHILE @.Counter3 <=297
BEGIN
SET @.ThisYear3 = dbo.lookupthisyear(@.counter3)
SET @.LastYear3 = dbo.lookuplastyear(@.counter3)
SET @.TempTotal = @.TempTotal + @.ThisYear3
SET @.TempLastYear = @.TempLastYear + @.LastYear3
PRINT '3: ' + CONVERT(varchar, @.Counter3)
IF @.TempLastYear <=10000
BEGIN
if @.temptotal > @.BestTotal AND @.TempLastYear <=10000
BEGIN
PRINT ''
PRINT 'New Best'
PRINT dbo.lookupProd(@.Counter1)
PRINT dbo.lookupProd(@.Counter2)
PRINT dbo.lookupProd(@.Counter3)
PRINT '2004 Total:' + CONVERT(varchar,@.TempLastYear)
PRINT '2005 Total:' + CONVERT(varchar,@.TempTotal)
SET @.BestProd1 = @.Counter1
SET @.BestProd2 = @.Counter2
SET @.BestProd3 = @.Counter3
SET @.BestTotal = @.TempTotal
SET @.Best2004 = @.TempLastYear
INSERT INTO tblOptimal
VALUES
(GETDATE(),
dbo.lookupProd(@.Counter1),
dbo.lookupProd(@.Counter2),
dbo.lookupProd(@.Counter3),
@.Best2004,
@.BestTotal)
END
SET @.Counter3 = @.Counter3 + 1
SET @.TempTotal = @.TempTotal - @.ThisYear3
SET @.TempLastYear = @.TempLastYear - @.LastYear3
END
END
SET @.Counter2 = @.Counter2 + 1
SET @.TempTotal = @.TempTotal - @.ThisYear2
SET @.TempLastYear = @.TempLastYear - @.LastYear2
end
END
SET @.Counter1 = @.Counter1 + 1
end
PRINT 'Optimal:'
PRINT dbo.lookupProd(@.BestProd1)
PRINT dbo.lookupProd(@.BestProd2)
PRINT dbo.lookupProd(@.BestProd3)
PRINT '2004: ' + CONVERT(varchar, @.Best2004)
PRINT '2005: ' + CONVERT(varchar,@.BestTotal)
GO
"Alex Papadimoulis" wrote:

> Greg,
> Could you post some minimal DDL for us to see? It could be a simple soluti
on
> ...
>
> As long as your billing for the processing time, I'm not sure I see the
> problem with that ;-).
> -- Alex|||I'm still having a hard time understanding your schema ... post DDL for the
tables.
BUT, from what I can tell so far, it looks selecting a single value at a
time, comparing, and moving to the next row -- three levels deep.
Have you tried doing a JOIN? It would seem like a join would solve this and
let the DB optimize ...
-- Alex
"Alex Papadimoulis" wrote:

> Greg,
> Could you post some minimal DDL for us to see? It could be a simple soluti
on
> ...
>
> As long as your billing for the processing time, I'm not sure I see the
> problem with that ;-).
> -- Alex|||If I understand you correctly, why not just
Select Top 10 ProductID
From (Select ProductID, Sum(SalesAmt) TotalSales
From SalesTable
Where SalesDate >= '20040101'
Group By ProductID)
Where TotalSales < 10000
Order By TotalSales Desc
"Greg Toronto" wrote:

> Hey gang,
> I have a question that may be more Math related than SQL Server related.
> But maybe it's a simple thing & I'm missing it 'cause I'm trying too hard.
.
> Here's the scenario: I have 300 products in a table. I also have last
> year's sales figures, and this year's to-date sales figures. I want to
> figure out which group of 10 products have the highest year-to-date sales
AND
> last year's sales combined to less than 10,000.
> I have written a program that basically uses Where statements to iterate
> through the 300 items, 10 times, calculating the best sales for this year
> where last years sales are less than 10,000.
> The problem is that there are 300 * 299 * 298 * 297 * 296 * 295 * 294 * 29
3
> * 292 * 291 different combinations of products. That's a big freakin' num
ber!
> I've put in some minor checks to help speed things up slightly - things li
ke
> breaking where last years total is more than 10,000, etc. But she's still
> gonna take about 42,000 years to get my result.
> So what am I missing? What's the best way to accomplish what I'm trying t
o
> do?
> Any help would be appreciated...
> Thanks,
> Greg|||Greg,
Please post DDL, sample data and expected results, as documented in
http://www.aspfaq.com/etiquette.asp?id=5006
I will assume some simple DDL and sample data:
CREATE TABLE SalesTotals (
ProductID int,
TheYear smallint,
Total money NOT NULL,
PRIMARY KEY (TheYear, ProductID)
)
INSERT INTO SalesTotals VALUES (1, 2004, 1000)
INSERT INTO SalesTotals VALUES (2, 2004, 2000)
INSERT INTO SalesTotals VALUES (3, 2004, 3000)
INSERT INTO SalesTotals VALUES (4, 2004, 4000)
INSERT INTO SalesTotals VALUES (5, 2004, 5000)
INSERT INTO SalesTotals VALUES (6, 2004, 500)
INSERT INTO SalesTotals VALUES (7, 2004, 3500)
INSERT INTO SalesTotals VALUES (8, 2004, 4000)
INSERT INTO SalesTotals VALUES (9, 2004, 5500)
INSERT INTO SalesTotals VALUES (10,2004, 5000)
INSERT INTO SalesTotals VALUES (1, 2005,11000)
INSERT INTO SalesTotals VALUES (2, 2005, 7500)
INSERT INTO SalesTotals VALUES (3, 2005, 5000)
INSERT INTO SalesTotals VALUES (4, 2005,16000)
INSERT INTO SalesTotals VALUES (5, 2005,17000)
INSERT INTO SalesTotals VALUES (6, 2005, 2500)
INSERT INTO SalesTotals VALUES (7, 2005,23500)
INSERT INTO SalesTotals VALUES (8, 2005,25500)
INSERT INTO SalesTotals VALUES (9, 2005,11500)
INSERT INTO SalesTotals VALUES (10,2005,27500)
One solution is:
SELECT TOP 1 A1.ProductID, A2.ProductID, A3.ProductID,
A1.Total+A2.Total+A3.Total as Total2005,
B1.Total+B2.Total+B3.Total as Total2004
FROM SalesTotals A1, SalesTotals A2, SalesTotals A3,
SalesTotals B1, SalesTotals B2, SalesTotals B3
WHERE A1.ProductID=B1.ProductID AND A2.ProductID=B2.ProductID AND
A3.ProductID=B3.ProductID
AND A1.ProductID<A2.ProductID AND A2.ProductID<A3.ProductID
AND A1.TheYear=2005 AND A2.TheYear=2005 AND A3.TheYear=2005
AND B1.TheYear=2004 AND B2.TheYear=2004 AND B3.TheYear=2004
AND B1.Total+B2.Total+B3.Total<10000
ORDER BY Total2005 DESC
For 3 products out of 10, this works instantaneous, but I would not
dare to think how much time and memory it would require for 10 products
out of 300.
The way you put it, this is the kind of problem that cannot be computed
in a reasonable time (this class of problems is called "NP-complete",
and this problem is called "knapsack problem").
The best thing you could do, is change the requirements, for example:
a) say "last year's sales for each product are less than 2,000" instead
of "the combined last year's sales are less than 10,000"
b) say "among the highest" instead of "highest"
For the second option, one algorithm that works in a reasonable time
is:
DECLARE @.Temp1 TABLE (
RowNum int NOT NULL IDENTITY UNIQUE,
ProductID int PRIMARY KEY,
Value2004 money NOT NULL,
Value2005 money NOT NULL
)
DECLARE @.Temp2 TABLE (
ProductID int PRIMARY KEY
)
INSERT INTO @.Temp1 (ProductID, Value2004, Value2005)
SELECT A.ProductID, B.Total, A.Total
FROM SalesTotals A INNER JOIN SalesTotals B
ON A.ProductID=B.ProductID AND A.TheYear=2005 AND B.TheYear=2004
ORDER BY A.Total-B.Total DESC, B.Total ASC
DECLARE @.P int, @.N tinyint, @.S money
SET @.N=0
SET @.S=0
WHILE @.N<3 BEGIN
SELECT TOP 1 @.P=ProductID, @.N=@.N+1, @.S=@.S+Value2004
FROM @.Temp1 WHERE @.S+Value2004<10000
AND ProductID NOT IN (SELECT ProductID FROM @.Temp2)
ORDER BY RowNum
INSERT INTO @.Temp2 VALUES (@.P)
END
SELECT ProductID, Value2004, Value2005 FROM @.Temp1
WHERE ProductID IN (SELECT ProductID FROM @.Temp2)
For this sample data, this algorithm returns the set {6, 8, 10}, which
has a total of 55,500, but the best set would have been {1, 7, 10} with
a total of 62,000. However, for other input data, this algorithm may
happen to guess the best set.
Razvan
PS. Again, I have written non-portable code, using Microsoft-specific
extensions... I apologize for doing this and I invite other posters to
write these queries in ANSI-SQL. At this time, I could not find a
ANSI-SQL solution that has the same or better performance in SQL Server
2000 as this non-portable solution. I'm not saying that it doesn't
exist, I just wasn't able to find it now.|||Greg,
To add to what Razvan said, this also sounds to me like a variation of
a knapsack problem, and while such problems have no known efficient
algorithms, a technique that can be useful is called "dynamic programming".
One optimistic detail is that you want the highest year-to-date sales. If
the secondary condition were not in place, this would be easy (find the 10
products with the highest year-to-date sales). You might get lucky if you
do something like this, which tries finding the answer within groups of 12
products with high YTD sales but low last-year sales, adjusting what
"low" means until something is found.
declare @.criterion int
set @.criterion = 5000 -- something <= 10000 to start
while 1=1 begin
select top 1 subset of size 10 from (
select top 12 ProductID
where last-year-sales < @.criterion
order by YTD-sales desc
) as Lucky
having sum(last-year-sales) < 10000
order by sum(YTD-sales) desc
if @.@.rowcount = 0 set @.criterion = @.criterion*0.95
else return what you found
if @.criterion < 1 break
end
Steve Kass
Drew University
Greg Toronto wrote:

>Hey gang,
>I have a question that may be more Math related than SQL Server related.
>But maybe it's a simple thing & I'm missing it 'cause I'm trying too hard..
.
>Here's the scenario: I have 300 products in a table. I also have last
>year's sales figures, and this year's to-date sales figures. I want to
>figure out which group of 10 products have the highest year-to-date sales A
ND
>last year's sales combined to less than 10,000.
>I have written a program that basically uses Where statements to iterate
>through the 300 items, 10 times, calculating the best sales for this year
>where last years sales are less than 10,000.
>The problem is that there are 300 * 299 * 298 * 297 * 296 * 295 * 294 * 293
>* 292 * 291 different combinations of products. That's a big freakin' numb
er!
>I've put in some minor checks to help speed things up slightly - things lik
e
>breaking where last years total is more than 10,000, etc. But she's still
>gonna take about 42,000 years to get my result.
>So what am I missing? What's the best way to accomplish what I'm trying to
>do?
>Any help would be appreciated...
>Thanks,
>Greg
>|||If you want the Products with the Highest CurrentYear Sales, which had
PriorYearSales < 10000, then this should work:
Select Top 10 ProductID
From (Select ProductID,
Sum(Case When SalesDate >= '20040101'
And SalesDate < '20050101'
ThenSalesAmt End) As PriorYearSales,
Sum(Case When SalesDate >= '20050101'
ThenSalesAmt End) As CurrYearSales
From SalesTable
Where SalesDate >= '20040101'
Group By ProductID) As Sales
Where PriorYearSales < 10000
Order By CurrYearSales Desc
"Greg Toronto" wrote:

> Hey gang,
> I have a question that may be more Math related than SQL Server related.
> But maybe it's a simple thing & I'm missing it 'cause I'm trying too hard.
.
> Here's the scenario: I have 300 products in a table. I also have last
> year's sales figures, and this year's to-date sales figures. I want to
> figure out which group of 10 products have the highest year-to-date sales
AND
> last year's sales combined to less than 10,000.
> I have written a program that basically uses Where statements to iterate
> through the 300 items, 10 times, calculating the best sales for this year
> where last years sales are less than 10,000.
> The problem is that there are 300 * 299 * 298 * 297 * 296 * 295 * 294 * 29
3
> * 292 * 291 different combinations of products. That's a big freakin' num
ber!
> I've put in some minor checks to help speed things up slightly - things li
ke
> breaking where last years total is more than 10,000, etc. But she's still
> gonna take about 42,000 years to get my result.
> So what am I missing? What's the best way to accomplish what I'm trying t
o
> do?
> Any help would be appreciated...
> Thanks,
> Greg|||>> I have 300 products in a table. I also have last year's sales
figures, and this year's to-date sales figures. I want to figure out
which group of 10 products have the highest year-to-date sales and last
year's sales combined to less than 10,000. <<
I am little on this one. One way to read this is that each of
the products in the basket of ten items had less than $10,000 in sales
last year; the other way is that the entire basket had less than
$10,000 in sales last year.
First way is fairly easy:
SELECT X.product_id, X.this_yr_amt
FROM (SELECT product_id
SUM (CASE WHEN sales_date
BETWEEN '2004-01-01'AND '2004-12-31'
THEN sales_amt ELSE 0.00 END),
SUM (CASE WHEN sales_date > '2004-12-31'
THEN sales_amt ELSE 0.00 END)
FROM Sales
GROUP BY product_id)
AS X(product_id, last_yr_amt, this_yr_amt)
WHERE X.last_yr_amt < 10000.00;
The second way is a screaming pain because it is a "Bin Packing
Problem", which are known to be NP-complete. Google it. Then we can
have ties for two baskets to consider. Then we can have to handle
baskets with less than ten items. What if no such baskets exist in the
data? It gets ugly fast, and I do mean NP fast.
In SQL because it is a set-oriented language, you tend to get the set
of ALL answers, which mean you have to try ALL combinations. In the
real world, getting the first workable answer and quitting is often
good enough.
I would put my X() derived table into a VIEW, use a Cursor and a back
tracking algorithm in a procedural language.
1) Grab the top ten sellers for this year; if two products are tied,
then use the one with the lowest sale for last year. Make a Greedy
start.
2) Check the basket total for last year and stop if it is < $10000.
3) If not, replace one of the basket items with the item on the list
which has the highest this_year_amt and a last_year_amt lower than the
item it is replacing.
4) Repeat until all 300 items have been inspected or until you get a
halt condition.
This is probably not optimal and it depends on step #3, rule for
replacement.|||Steve, why not just create an aggregate SubQuery, that calculates the
PriorYears Sales amnd the current Years sales, grouped on Product ID,
Select ProductID,
Sum(Case When SalesDate >= '20040101'
And SalesDate < '20050101'
ThenSalesAmt End) As PriorYearSales,
Sum(Case When SalesDate >= '20050101'
ThenSalesAmt End) As CurrYearSales
From SalesTable
Where SalesDate >= '20040101' -- Not even necessary
Group By ProductID
And Then Select from this subquery the top 10 Records after filtering on
PriorYearSales < 10000, and ordering By CurrYearSales
-- *******************
Select Top 10 ProductID
From (Select ProductID,
Sum(Case When SalesDate >= '20040101'
And SalesDate < '20050101'
ThenSalesAmt End) As PriorYearSales,
Sum(Case When SalesDate >= '20050101'
ThenSalesAmt End) As CurrYearSales
From SalesTable
Where SalesDate >= '20040101'
Group By ProductID) As Sales
Where PriorYearSales < 10000
Order By CurrYearSales Desc
"Steve Kass" wrote:

> Greg,
> To add to what Razvan said, this also sounds to me like a variation of
> a knapsack problem, and while such problems have no known efficient
> algorithms, a technique that can be useful is called "dynamic programming"
.
> One optimistic detail is that you want the highest year-to-date sales.
If
> the secondary condition were not in place, this would be easy (find the 10
> products with the highest year-to-date sales). You might get lucky if you
> do something like this, which tries finding the answer within groups of 12
> products with high YTD sales but low last-year sales, adjusting what
> "low" means until something is found.
> declare @.criterion int
> set @.criterion = 5000 -- something <= 10000 to start
> while 1=1 begin
> select top 1 subset of size 10 from (
> select top 12 ProductID
> where last-year-sales < @.criterion
> order by YTD-sales desc
> ) as Lucky
> having sum(last-year-sales) < 10000
> order by sum(YTD-sales) desc
> if @.@.rowcount = 0 set @.criterion = @.criterion*0.95
> else return what you found
> if @.criterion < 1 break
> end
> Steve Kass
> Drew University
>
> Greg Toronto wrote:
>
>

No comments:

Post a Comment