How do I generate trade statistics for a CouchDB / Rails3 application?
My problem: I am trying to develop a web application for currency traders. The app allows traders to enter or upload information about their trades, and I want to calculate a wide range of statistics based on the user input.
Now, normally I would use a relational database for this, but I have two requirements that do not fit a relational database well, so I am trying couchdb. Those two requirements are: 1) I have a companion desktop application that will allow users to browse and replicate to the site using couchdb's excellent replication feature, and 2) I would like users to be able to define their own custom things to track about their trades and generate results based on what they enter. The schema-less nature of couch seems ideal here, but it may turn out to be more complicated than it sounds. (I already know that couch requires you to define views in advance, so I was planning to put all the custom attributes in an array, emit the array in the view, and process it further from there.)
What I am doing: Currently I just emit every trade entered for each user's system in a couch view and query with the system key to get an array of trades for each system. Simple. I am not currently using a reduce function to calculate any statistics because I couldn't figure out how to get everything I needed without getting a reduce overflow error.
Here is an example of the rows that are emitted from couch:
{"total_rows":134,"offset":0,"rows":[
  {"id":"5b1dcd47221e160d8721feee4ccc64be",
   "key":["80e40ba2fa43589d57ec3f1d19db41e6","2010/05/14 04:32:37 +0000"],
   "value":null,
   "doc":{
     "_id":"5b1dcd47221e160d8721feee4ccc64be",
     "_rev":"1-bc9fe763e2637694df47d6f5efb58e5b",
     "couchrest-type":"Trade",
     "system":"80e40ba2fa43589d57ec3f1d19db41e6",
     "pair":"EUR/USD",
     "direction":"Buy",
     "entry":12600,
     "exit":12700,
     "stop_loss":12500,
     "profit_target":12700,
     "status":"Closed",
     "slug":"101332132375",
     "custom_tracking":[{"name":"signal","value":"Pin Bar"}],
     "updated_at":"2010/05/14 04:32:37 +0000",
     "created_at":"2010/05/14 04:32:37 +0000",
     "result":100}}
]}
In my rails 3 controller, I basically just populate an array of trades like the one above, and then pull the relevant data into smaller arrays that I can use to calculate my stats.
Here is my show action for the page where I want to display statistics and all trades:
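def show
  # Collections for the stats; initialised here so the action stands alone.
  @winning_trades, @losing_trades, @breakeven_trades = [], [], []
  @long_trades, @short_trades, @custom_tracking = [], [], []

  # All trades for this system up to now (@system is assumed to be loaded before this point).
  @trades = Trade.by_system(:startkey => [@system.id], :endkey => [@system.id, Time.now])

  @trades.each do |trade|
    # Bucket by outcome.
    if trade.result > 0
      @winning_trades << trade.result
    elsif trade.result < 0
      @losing_trades << trade.result
    else
      @breakeven_trades << trade.result
    end

    # Bucket by direction.
    if trade.direction == "Buy"
      @long_trades << trade.result
    else
      @short_trades << trade.result
    end

    # Keep the user-defined tracking attributes next to the result.
    unless trade["custom_tracking"].nil?
      @custom_tracking << {"result" => trade.result, "variables" => trade["custom_tracking"]}
    end
  end
end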
I am omitting some other things that happen there, but this is the essence of what I am doing. Then I do some calculations in the view layer to get some results:
<% winning_long_trades = @long_trades.reject {|trade| trade <= 0 } %>
<% winning_short_trades = @short_trades.reject {|trade| trade <= 0 } %>
<ul>
  <li>Total Trades: <%= @trades.count %></li>
  <li>Winners: <%= @winning_trades.size %></li>
  <li>Biggest Winner (Pips): <%= @winning_trades.max %></li>
  <li>Average Win(Pips): <%= @winning_trades.sum/@winning_trades.size %></li>
  <li>Losers: <%= @losing_trades.size %></li>
  <li>Biggest Loser (Pips): <%= @losing_trades.min %></li>
  <li>Average Loss(Pips): <%= @losing_trades.sum/@losing_trades.size %></li>
  <li>Breakeven Trades: <%= @breakeven_trades.size %></li>
  <li>Long Trades: <%= @long_trades.size %></li>
  <li>Winning Long Trades: <%= winning_long_trades.size %></li>
  <li>Short Trades: <%= @short_trades.size %></li>
  <li>Winning Short Trades: <%= winning_short_trades.size %></li>
  <li>Total Pips: <%= @winning_trades.sum + @losing_trades.sum %></li>
  <li>Win Rate (%): <%= @winning_trades.size/@trades.count.to_f * 100 %></li>
</ul>
This leads to the following results, which, apart from a few things, are exactly what I want:
- Total Trades: 134
- Winners: 70
- Biggest Winner (Pips): 1488
- Average Win(Pips): 440
- Losers: 58
- Biggest Loser (Pips): -516
- Average Loss(Pips): -225
- Breakeven Trades: 6
- Long Trades: 125
- Winning Long Trades: 67
- Short Trades: 9
- Winning Short Trades: 3
- Total Pips: 17819
- Win Rate (%): 52.23880597014925
What I want to know: Finally, the actual questions. I am starting to be skeptical about how well this method will work when a user has 5,000 trades, not just 134 as in this example. I expect most users to have somewhere under 200 trades per year, but some users may have several thousand per year. Probably no more than 5,000 a year. Everything seems to work fine now, but page load times are already a bit slow for my taste. (About 800ms to generate the page according to the rails logs, with about 250ms spent in the view layer.) I will end up caching this page, I'm sure, but I still need to regenerate it every time a trade is updated, and I can't afford for that to be too slow. So:
- Is something like this possible to do directly with a couchdb reduce function? My guess is that handing this off to couch might help with larger datasets. I couldn't figure out how, but I guess that doesn't mean it's impossible. Any hints would be helpful.
- Could I use a list function if a reduce is not feasible because of the reduce restrictions? Are couchdb list functions suitable for this kind of calculation? Does anyone know how list functions perform? Any hints as to what one would look like for the type of calculation I'm trying to achieve?
- I thought about other options, such as running the calculations when each trade is saved, or nightly if I had to, and saving the results in a statistics doc that I could then query, so that all processing is done ahead of time. I would like this to be a last resort, because then I cannot filter trades by time period dynamically as I would like. (I want to have a slider that the user can move to show only the trades for that time period, using startkey and endkey in couchdb if I can.)
- If I need to keep performing the calculations inside the rails application while rendering the page, what can I do to improve my current implementation? I am new to rails, couch and programming in general, so I'm sure I could do something better here. Do I need to create an array for each stat, or is there a better way to do it?
I guess I would just really like some advice on how to solve this problem. I want the page generation time to be minimal, as I expect these to be some of the most heavily used pages. My gut feeling is that I will need to either offload the statistics calculation to couch, or run the statistics ahead of time before they are requested, but I'm not sure.
Finally: As I mentioned above, one of the main reasons for using couch is to allow users to define their own things to track for every trade. Getting the data into couch is no problem, but how can I take the custom_tracking array and find out how many winning trades there are for each named tracking attribute? If anyone can give me any hints on how to do this, that would be great.
Thanks a bunch. Any help would be really appreciated. Willing to fork out some $$$ if anyone wants to take on this problem for me. (Don't know if this is allowed on stack overflow or not.)
a source to share
First, if you want to provide statistics over all trades entered (by a user), then yes, reduce views can be great. Since the reduced result is actually stored on disk in the view index, this should be able to speed up your views significantly.
Second, the default reduce_limit in CouchDB is only a default. If you know what you are doing, you can simply disable it and have your reduce functions return more information. (Notably, and I think this is what you are running into, reduce_limit by default puts a fixed size bound on the returned JSON structure, but if the size of the returned structure is more or less constant in the number of rows processed, then the actual size is not really an issue, although it probably shouldn't be huge.)
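As a rough sketch of a reduce that returns more than a single number while staying constant in size: the names mapTrade and reduceStats and the fields of the summary object are made up for illustration, and the emit stub exists only so the code also runs outside CouchDB. In a design document the two functions would be stored as anonymous function(doc) {...} and function(keys, values, rereduce) {...} strings. It is untested against a real database.

```javascript
// Stand-in for CouchDB's emit(), only so this sketch runs outside CouchDB.
var emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// Map: one row per trade, keyed [system, created_at] like the by_system view,
// with the trade result (pips) as the value.
function mapTrade(doc) {
  if (doc["couchrest-type"] === "Trade" && typeof doc.result === "number") {
    emit([doc.system, doc.created_at], doc.result);
  }
}

// Reduce: folds results into one fixed-size summary object.
// On rereduce, `values` holds summaries produced by earlier reduce calls.
function reduceStats(keys, values, rereduce) {
  var out = { trades: 0, wins: 0, losses: 0, breakeven: 0,
              winPips: 0, lossPips: 0, biggestWin: 0, biggestLoss: 0 };
  for (var i = 0; i < values.length; i++) {
    var v = values[i];
    if (!rereduce) {
      // Turn a raw result into a one-trade summary first.
      v = { trades: 1,
            wins: v > 0 ? 1 : 0,
            losses: v < 0 ? 1 : 0,
            breakeven: v === 0 ? 1 : 0,
            winPips: v > 0 ? v : 0,
            lossPips: v < 0 ? v : 0,
            biggestWin: v > 0 ? v : 0,
            biggestLoss: v < 0 ? v : 0 };
    }
    out.trades += v.trades;
    out.wins += v.wins;
    out.losses += v.losses;
    out.breakeven += v.breakeven;
    out.winPips += v.winPips;
    out.lossPips += v.lossPips;
    out.biggestWin = Math.max(out.biggestWin, v.biggestWin);
    out.biggestLoss = Math.min(out.biggestLoss, v.biggestLoss);
  }
  return out;
}
```

Queried with the same startkey/endkey range as the existing view, this should give one summary object for the range, and averages such as winPips / wins can then be worked out in rails. The plain list of trades would presumably have to be requested with reduce=false once the view has a reduce function.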
I don't think using the list function will help here, but I don't have much experience with them.
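For the custom_tracking question, a second reduce view may be enough. A minimal sketch, assuming the document layout shown in the question: the name mapWinnersByTracking is hypothetical, the emit stub is only there so the code runs outside CouchDB, and the hand-written reduceSum is shown for illustration where the built-in "_sum" reduce could be used instead.

```javascript
// Stand-in for CouchDB's emit(), only so this sketch runs outside CouchDB.
var emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// Map: one row per custom_tracking entry of every winning trade.
// Key: [system, attribute name, attribute value]; value: 1 (to be summed).
function mapWinnersByTracking(doc) {
  if (doc["couchrest-type"] !== "Trade" || !(doc.result > 0)) {
    return;
  }
  var tracking = doc.custom_tracking || [];
  for (var i = 0; i < tracking.length; i++) {
    emit([doc.system, tracking[i].name, tracking[i].value], 1);
  }
}

// Reduce: adds up the 1s (or, on rereduce, the partial sums).
// In a design document the built-in "_sum" should do the same job.
function reduceSum(keys, values, rereduce) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
}
```

Queried with group=true and a key range from [system] to [system, {}], this should return one row per attribute name and value with the number of winning trades for it.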
I can probably help you further for money if you like, or you can just stop by #couchdb on freenode and ask a few questions. There is usually a fairly knowledgeable crowd there.