Skip to content

feat(cubesql): Add XIRR aggregate function #9508

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 24, 2025
Merged

Conversation

MazterQyou
Copy link
Member

Check List

  • Tests have been run in packages where changes made if available
  • Linter has been run for changed code
  • Tests for the changes have been added if not covered yet
  • Docs have been added / updated if required

Description of Changes Made

This PR adds support for XIRR aggregate function. Related test is included.

Signed-off-by: Alex Qyoun-ae <4062971+MazterQyou@users.noreply.github.com>
@MazterQyou MazterQyou requested a review from a team as a code owner April 23, 2025 17:24
Copy link

codecov bot commented Apr 23, 2025

Codecov Report

Attention: Patch coverage is 75.78125% with 62 lines in your changes missing coverage. Please review.

Project coverage is 80.52%. Comparing base (1d15182) to head (bb5876d).
Report is 6 commits behind head on master.

Files with missing lines Patch % Lines
...l/cubesql/src/compile/engine/udf/extension/xirr.rs 72.56% 62 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9508      +/-   ##
==========================================
- Coverage   83.89%   80.52%   -3.38%     
==========================================
  Files         229      383     +154     
  Lines       83569    96892   +13323     
  Branches        0     2223    +2223     
==========================================
+ Hits        70111    78018    +7907     
- Misses      13458    18559    +5101     
- Partials        0      315     +315     
Flag Coverage Δ
cube-backend 59.02% <ø> (?)
cubesql 83.87% <75.78%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Member

@KSDaemon KSDaemon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻 LGTM!

@ovr ovr requested review from srh and mcheshkov April 24, 2025 11:30
@MazterQyou MazterQyou merged commit c7fb71b into master Apr 24, 2025
74 checks passed
@MazterQyou MazterQyou deleted the cubesql/xirr-udaf branch April 24, 2025 14:01
Copy link
Member

@mcheshkov mcheshkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, left some notes, nothing critical

};
let signature = Signature::one_of(
type_signatures,
Volatility::Volatile, // due to the usage of [`f64::powf`]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be Immutable - given same inputs it should generate same output, so it is OK to lift that calculation. It's safe to keep it Volatile, it should just deny some optimizations in DF.

Non-determinism in floating point precision does not turn functions into voltaile:
BuiltinScalarFunction::Log10 or BuiltinScalarFunction::Cos are Volatility::Immutable, but implemented with regular f64::log10 and f64::cos, which are non-deterministic due to different host and target platforms, compiler version and what not. It's not like now or random, where results are expected to be different.

https://github.com/apache/datafusion/blob/5eb0968fb4b110c75cb560837807a2dad026bed3/datafusion/functions/src/math/mod.rs#L178-L185

https://github.com/apache/datafusion/blob/5eb0968fb4b110c75cb560837807a2dad026bed3/datafusion/functions/src/macros.rs#L183

https://github.com/apache/datafusion/blob/5eb0968fb4b110c75cb560837807a2dad026bed3/datafusion/functions/src/macros.rs#L228-L232

https://doc.rust-lang.org/std/primitive.f64.html#method.log10

date_type.clone(),
]));
// Signatures with `initial_guess` argument; only [`DataType::Float64`] is accepted
const INITIAL_GUESS_TYPE: DataType = DataType::Float64;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit - I'd lift INITIAL_GUESS_TYPE closer to NUMERIC_TYPES

}

fn update_batch(&mut self, values: &[ArrayRef]) -> Result<()> {
let payments = cast(&values[0], &DataType::Float64)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that implementation always casts payments to Float64 would not be enough to just declare first argument as Float64 in signature, and rely on DF to insert coercion in calls?

Same for dates and on_errors

Copy link
Member Author

@MazterQyou MazterQyou Apr 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mcheshkov DF would throw errors when using non-specified types in this case. If, say, on_error could only accept Float64, then specifying argument as 0 instead of 0.0 would produce Int32 and would error on logical plan building. I wanted to avoid forcing the user to explicitly cast all the arguments, especially with dates, considering Cube's time type is Timestamp.
With signature, there doesn't seem to be any automatic coercion, but I might be wrong.

self.add_pair(payment, date)?;
}
let values_len = values.len();
if values_len < 3 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit - feels like it could be a bit simpler with values.get(2).map(...)

Arc::new(|| Ok(Box::new(XirrAccumulator::new())));
let state_type: StateTypeFunction = Arc::new(|_| {
Ok(Arc::new(vec![
DataType::List(Box::new(Field::new("item", DataType::Float64, true))),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use different field names here for different components of state

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The item field name seems to be static for DataType::List in DF. Looking through the code, there doesn't seem to be any List where the field name would differ; I believe different field names are for maps or similar structures.

for (payment, date) in payments.into_iter().zip(dates) {
self.add_pair(payment, date)?;
}
let states_len = states.len();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to check different states length here, it should always be 4, as returned from Accumulator::state()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are correct. I also noticed this mistake when extending the function with optional arguments, and then forgot to fix it before publishing the PR. I'll get rid of those checks in some future chore.

const MAX_ITERATIONS: usize = 100;
const TOLERANCE: f64 = 1e-6;
const DEFAULT_INITIAL_GUESS: f64 = 0.1;
let Some(min_date) = self.pairs.iter().map(|(_, date)| *date).min() else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit - IMO min_by_key() looks nicer

}
let rate_positive = 1.0 + rate_of_return;
let denominator = rate_positive.powf(*year_difference);
net_present_value += *payment / denominator;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, this can be more accurate (and, probably, performant) with FMA: net_present_value = payment.mul_add(denominator.recip(), net_present_value);
IDK if LLVM can do this for us, but I think it would not.

And same for derivative_value, but I'm not sure how to handle multiple multiplication

if *payment == 0.0 {
continue;
}
let rate_positive = 1.0 + rate_of_return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit - rate_positive is same for all pairs, can be lifter out of inner loop

let rate_positive = 1.0 + rate_of_return;
let denominator = rate_positive.powf(*year_difference);
net_present_value += *payment / denominator;
derivative_value -= *year_difference * *payment / denominator / rate_positive;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an observation - *year_difference * *payment is same for pair, and does not change between iterations, so we can trade memory for CPU here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants