added einops for embedding models and simplified accuracy description #4207

Open
dtrawins wants to merge 8 commits into main from CVS-186324

@dtrawins
Collaborator

🛠 Summary

CVS-186324

🧪 Checklist

  • Unit tests added.
  • The documentation updated.
  • Change follows security best practices.

Contributor

Copilot AI left a comment

Pull request overview

Updates demo documentation around accuracy evaluation and model export, and adds a missing Python dependency (einops) needed by some embedding/export workflows.

Changes:

  • Simplifies continuous batching accuracy demo instructions by linking to other deployment demos and updates the VLM evaluation command.
  • Adds einops to the export-models demo Python requirements.
  • Replaces a long CLI help “Expected Output” block in the export-models README with a short compatibility note about transformers versions.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| demos/continuous_batching/accuracy/README.md | Simplifies server startup guidance (links to other demos) and adjusts VLM eval command; retains example outputs. |
| demos/common/export_models/requirements.txt | Adds einops dependency to export-model requirements. |
| demos/common/export_models/README.md | Removes verbose help output and adds a note about potential transformers version requirements. |

Comment on lines 17 to +22
## Starting the model server

### With Docker
```bash
docker run -d --rm -p 8000:8000 -v $(pwd)/models:/workspace:ro openvino/model_server:latest --rest_port 8000 --config_path /workspace/config.json
```

### On Baremetal
```bash
ovms --rest_port 8000 --config_path ./models/config.json
```
Example of LLM and VLM models deployment is documented in other demos like
[Agentic usage for LLM models](../agentic_ai/README.md)
[Using VLM models](../vlm/README.md)
Comment on lines 70 to 74
python -m lmms_eval \
--model openai_compatible \
--model_args model_version=OpenGVLab/InternVL2_5-8B,max_retries=1 \
--model_args model_version=OpenVINO/InternVL2_5-8B_int4-ov,max_retries=1 \
--tasks mme,mmmu_val \
--batch_size 1 \
Comment thread demos/common/export_models/README.md
--enable_tool_guided_generation
Enables enforcing tool schema during generation. Requires setting tool_parser
```
> Note: Exporting some models might require different transformers version than specified in requirements.txt Check [supported models](https://openvinotoolkit.github.io/openvino.genai/docs/supported-models/). If custom transformers version is required, install it afterwards via `pip install transformers==<version>`
@@ -14,33 +14,17 @@ Install the framework via pip:
Collaborator

do we want to check pip install command if no other command is checked?

sentencepiece # Required by: transformers`
torchvision
requests
einops
Collaborator

Alibaba model still wasn't exported:
python3 export_model.py embeddings_ov --source_model Alibaba-NLP/gte-large-en-v1.5 --extra_quantization_params "--library sentence_transformers" --weight-format fp16 --config_file_path models/config_all.json

RuntimeError: Couldn't get TorchScript module by tracing.
Exception:
index 2314885530818453536 is out of bounds for dimension 0 with size 16
Please check correctness of provided 'example_input'. Sometimes models can be converted in scripted mode, please try running conversion without 'example_input'.
You can also provide TorchScript module that you obtained yourself, please refer to PyTorch documentation: https://pytorch.org/tutorials/beginner/Intro_to_TorchScript_tutorial.html.
Traceback (most recent call last):
File "/opt/home/k8sworker/ngroza/test/model_server/demos/common/export_models/export_model.py", line 687, in
export_embeddings_model_ov(args['model_repository_path'], args['source_model'], args['model_name'], args['precision'], template_parameters, args['config_file_path'], args['truncate'])
File "/opt/home/k8sworker/ngroza/test/model_server/demos/common/export_models/export_model.py", line 520, in export_embeddings_model_ov
raise ValueError("Failed to export embeddings model", source_model)
ValueError: ('Failed to export embeddings model', 'Alibaba-NLP/gte-large-en-v1.5')

Collaborator Author

that is one of the models that require transformers<5
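
A minimal sketch of how such a constraint could be checked before running the export. It assumes the bound is on the major version only, as in `transformers<5`; the helper names (`installed_version`, `major_of`, `satisfies_below_major`) are hypothetical and not part of `export_model.py`.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def major_of(version: str) -> int:
    """Extract the leading major number from a version such as '4.57.1'.

    Assumption: the version starts with a plain integer (no epoch prefix).
    """
    return int(version.split(".")[0])


def satisfies_below_major(version: str, upper_major: int) -> bool:
    """True when the version's major number is strictly below upper_major."""
    return major_of(version) < upper_major


if __name__ == "__main__":
    found = installed_version("transformers")
    if found is None:
        print("transformers is not installed")
    elif satisfies_below_major(found, 5):
        print(f"transformers {found} satisfies transformers<5")
    else:
        print(f"transformers {found} does not satisfy transformers<5")
```

If the check fails, the note in the export-models README applies: install the required version afterwards via `pip install transformers==<version>`.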

python -m lmms_eval \
--model openai_compatible \
--model_args model_version=OpenGVLab/InternVL2_5-8B,max_retries=1 \
--model_args model_version=OpenVINO/InternVL2_5-8B_int4-ov,max_retries=1 \
Collaborator

there is no such model in OV collection: https://huggingface.co/OpenVINO/models?search=intern

@dtrawins dtrawins requested review from ngrozae and pgladkows May 19, 2026 09:03
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@dtrawins dtrawins requested a review from mzegla May 20, 2026 09:48

Install the framework via pip:
```bash
```text
Collaborator

Won't this break some CI automation for accuracy checks? @pgladkows

Collaborator

no, accuracy checking will be disabled in demos

```text
export OPENAI_BASE_URL=http://localhost:8000/v3
bfcl generate --model ovms-model-stream --test-category simple_python,multiple --temperature 0.0 --num-threads 100 -o --result-dir model_name_dir
bfcl generate --model ovms-model-stream --test-category simple_python,multiple,multi_turn_base --temperature 0.0 --num-threads 10 -o --result-dir model_name_dir
Collaborator

Won't this be too much for a demo? Time-wise, it will take much longer to execute with multi-turn.
Also, you only add it for the streaming path. Shouldn't we align unary as well if we choose to go with multi-turn?

dest='dataset')
parser.add_argument('--embed_dim', type=int, default=None, help='Embedding dimension. Auto-detected if not provided.',
dest='embed_dim')
parser.add_argument('--max_tokens', type=int, default=999999, help='Max input tokens for truncation. default: 512',
Collaborator

default does not match help description
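
One way to keep the two from drifting apart is to let argparse substitute the default into the help text with `%(default)s`. This is a sketch only: the constant `DEFAULT_MAX_TOKENS`, its value of 512, and the `build_parser` helper are assumptions for illustration, since the thread does not say which of the two values is intended.

```python
import argparse

# Assumption: 512 is used here only for illustration; the intended default
# (512 or 999999) is not settled in this thread.
DEFAULT_MAX_TOKENS = 512


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose help text always reports the real default."""
    parser = argparse.ArgumentParser(description="Hypothetical example parser.")
    parser.add_argument(
        '--max_tokens',
        type=int,
        default=DEFAULT_MAX_TOKENS,
        # %(default)s is expanded by argparse when the help is rendered,
        # so the description cannot go stale when the default changes.
        help='Max input tokens for truncation. default: %(default)s',
        dest='max_tokens',
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.max_tokens)
```

With this pattern, changing `DEFAULT_MAX_TOKENS` updates both the parsed value and the `--help` output in one place.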

Comment thread demos/embeddings/README.md Outdated
Comment thread demos/embeddings/README.md Outdated
Co-authored-by: Trawinski, Dariusz <dariusz.trawinski@intel.com>
6 participants