docs(search): add note about re-indexing when enabling Tika #2285
|
| opencloud search index --all-spaces | ||
| ``` | ||
|
|
||
| > **Note:** The re-index command skips files whose modification time has not changed since they were last indexed. If you changed the extractor type (e.g., from `basic` to `tika`), you need to delete the existing search index first to force a full content re-extraction: |
@ScharfViktor Is that true? I think we need to verify this first.
Yes, that is true. I can reproduce it.
| > **Note:** The re-index command skips files whose modification time has not changed since they were last indexed. If you changed the extractor type (e.g., from `basic` to `tika`), you need to delete the existing search index first to force a full content re-extraction: | ||
| > | ||
| > ```shell | ||
| > rm -rf $OC_BASE_DATA_PATH/search # default: /var/lib/opencloud/search |
@micbar Maybe it is a bug? I would expect a re-index to work without deleting /search.
@aduffeck Can you clarify? You know the implementation.
That's the current behavior, yes. I consider that a bug.
A re-index should, in my opinion, unconditionally rebuild the index for the space or for all spaces. It might also be helpful to have a command or flag for just "syncing" the index, i.e. picking up changes that haven't been indexed yet (the current behavior), but that shouldn't be the default behavior of the index command.
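To make the distinction in this thread concrete, here is a minimal Go sketch of a "sync" pass versus an unconditional rebuild. It is not the code from `services/search/pkg/search/service.go`; the names `File`, `Entry`, `Extractor`, `IndexFiles`, and the `rebuild` parameter are hypothetical and exist only for illustration. The two extractors are toy stand-ins for `basic` and `tika`.

```go
package main

import (
	"fmt"
	"time"
)

// File is a hypothetical stand-in for a file in a space.
type File struct {
	Path  string
	Mtime time.Time
	Body  string
}

// Entry is a hypothetical index record: the mtime seen at indexing time
// plus whatever content the extractor produced.
type Entry struct {
	Mtime   time.Time
	Content string
}

// Extractor turns a file into indexable content.
type Extractor func(File) string

// IndexFiles updates index for the given files and returns how many files
// were actually passed to the extractor. With rebuild=false it models the
// mtime-based skip discussed above ("sync"); with rebuild=true every file
// is re-extracted regardless of its mtime.
func IndexFiles(index map[string]Entry, files []File, extract Extractor, rebuild bool) int {
	extracted := 0
	for _, f := range files {
		if old, ok := index[f.Path]; ok && !rebuild && old.Mtime.Equal(f.Mtime) {
			continue // unchanged mtime: the extractor is never called
		}
		index[f.Path] = Entry{Mtime: f.Mtime, Content: extract(f)}
		extracted++
	}
	return extracted
}

func main() {
	// Toy extractors (assumptions): "basic" yields no content, "tika" yields the body.
	basic := func(f File) string { return "" }
	tika := func(f File) string { return f.Body }

	files := []File{{Path: "report.pdf", Mtime: time.Unix(1700000000, 0), Body: "quarterly numbers"}}
	index := map[string]Entry{}

	fmt.Println(IndexFiles(index, files, basic, false)) // initial indexing with the basic extractor
	fmt.Println(IndexFiles(index, files, tika, false))  // extractor switched, but the file is skipped
	fmt.Println(IndexFiles(index, files, tika, true))   // rebuild re-extracts with the new extractor
	fmt.Printf("%q\n", index["report.pdf"].Content)
}
```

In this model the second pass extracts nothing because the stored mtime still matches, which mirrors why switching the extractor alone does not refresh already indexed content. Deleting the index first, as the README note suggests, has the same effect as starting the sync pass from an empty `index` map.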



Description
Add notes to the search service README clarifying that:
- the `opencloud search index --all-spaces` command skips files with unchanged modification time

Related Issue
Motivation and Context
When users enable Tika on an existing instance, they expect full-text search to work for all files. However, `opencloud search index --all-spaces` skips files already in the index (mtime-based check in `services/search/pkg/search/service.go`), so the Tika extractor is never called for previously indexed files. This is undocumented and confusing.

How Has This Been Tested?
- `services/search/pkg/search/service.go` (`IndexSpace` method, mtime skip logic at line ~495)
- `--force` flag exists in the CLI or protobuf definition (`IndexSpaceRequest`)

Types of changes
Checklist: