-
Notifications
You must be signed in to change notification settings - Fork 1.8k
feat(cubesql): Add XIRR
aggregate function
#9508
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Alex Qyoun-ae <4062971+MazterQyou@users.noreply.github.com>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #9508 +/- ##
==========================================
- Coverage 83.89% 80.52% -3.38%
==========================================
Files 229 383 +154
Lines 83569 96892 +13323
Branches 0 2223 +2223
==========================================
+ Hits 70111 78018 +7907
- Misses 13458 18559 +5101
- Partials 0 315 +315
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏻 LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, left some notes, nothing critical
}; | ||
let signature = Signature::one_of( | ||
type_signatures, | ||
Volatility::Volatile, // due to the usage of [`f64::powf`] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be Immutable
- given same inputs it should generate same output, so it is OK to lift that calculation. It's safe to keep it Volatile
, it should just deny some optimizations in DF.
Non-determinism in floating point precision does not turn functions into voltaile:
BuiltinScalarFunction::Log10
or BuiltinScalarFunction::Cos
are Volatility::Immutable
, but implemented with regular f64::log10
and f64::cos
, which are non-deterministic due to different host and target platforms, compiler version and what not. It's not like now
or random
, where results are expected to be different.
https://doc.rust-lang.org/std/primitive.f64.html#method.log10
date_type.clone(), | ||
])); | ||
// Signatures with `initial_guess` argument; only [`DataType::Float64`] is accepted | ||
const INITIAL_GUESS_TYPE: DataType = DataType::Float64; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit - I'd lift INITIAL_GUESS_TYPE
closer to NUMERIC_TYPES
} | ||
|
||
fn update_batch(&mut self, values: &[ArrayRef]) -> Result<()> { | ||
let payments = cast(&values[0], &DataType::Float64)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that implementation always casts payments to Float64
would not be enough to just declare first argument as Float64
in signature, and rely on DF to insert coercion in calls?
Same for dates and on_errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mcheshkov DF would throw errors when using non-specified types in this case. If, say, on_error
could only accept Float64
, then specifying argument as 0
instead of 0.0
would produce Int32
and would error on logical plan building. I wanted to avoid forcing the user to explicitly cast all the arguments, especially with dates, considering Cube's time
type is Timestamp
.
With signature, there doesn't seem to be any automatic coercion, but I might be wrong.
self.add_pair(payment, date)?; | ||
} | ||
let values_len = values.len(); | ||
if values_len < 3 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit - feels like it could be a bit simpler with values.get(2).map(...)
Arc::new(|| Ok(Box::new(XirrAccumulator::new()))); | ||
let state_type: StateTypeFunction = Arc::new(|_| { | ||
Ok(Arc::new(vec![ | ||
DataType::List(Box::new(Field::new("item", DataType::Float64, true))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use different field names here for different components of state
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The item
field name seems to be static for DataType::List
in DF. Looking through the code, there doesn't seem to be any List
where the field name would differ; I believe different field names are for maps or similar structures.
for (payment, date) in payments.into_iter().zip(dates) { | ||
self.add_pair(payment, date)?; | ||
} | ||
let states_len = states.len(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to check different states
length here, it should always be 4, as returned from Accumulator::state()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are correct. I also noticed this mistake when extending the function with optional arguments, and then forgot to fix it before publishing the PR. I'll get rid of those checks in some future chore.
const MAX_ITERATIONS: usize = 100; | ||
const TOLERANCE: f64 = 1e-6; | ||
const DEFAULT_INITIAL_GUESS: f64 = 0.1; | ||
let Some(min_date) = self.pairs.iter().map(|(_, date)| *date).min() else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit - IMO min_by_key()
looks nicer
} | ||
let rate_positive = 1.0 + rate_of_return; | ||
let denominator = rate_positive.powf(*year_difference); | ||
net_present_value += *payment / denominator; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, this can be more accurate (and, probably, performant) with FMA: net_present_value = payment.mul_add(denominator.recip(), net_present_value);
IDK if LLVM can do this for us, but I think it would not.
And same for derivative_value
, but I'm not sure how to handle multiple multiplication
if *payment == 0.0 { | ||
continue; | ||
} | ||
let rate_positive = 1.0 + rate_of_return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit - rate_positive
is same for all pairs, can be lifter out of inner loop
let rate_positive = 1.0 + rate_of_return; | ||
let denominator = rate_positive.powf(*year_difference); | ||
net_present_value += *payment / denominator; | ||
derivative_value -= *year_difference * *payment / denominator / rate_positive; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an observation - *year_difference * *payment
is same for pair, and does not change between iterations, so we can trade memory for CPU here
Check List
Description of Changes Made
This PR adds support for
XIRR
aggregate function. Related test is included.